Using Loop and Sparse File for EXT on NTFS File System

I’ve used ‘loopback’ file systems before. I’ve used file systems based in a ‘sparse file’ before. Many Virtual Machines put their data in a file system built inside a sparse file, for example. I just never bothered to ‘roll my own’ before.

A sparse file is one that looks like it has a big size, but only actually has data blocks allocated for the bits with real data in them. So you can say it will be 5 GB, but if you only put 1 GB in it, it will only use 1 GB of disk. So for a file system in a file container, you can build all the inodes (information nodes) and all the metadata structures as though the file really were 5 GB of space, and have them ‘scattered’ through that space, while all the ‘data blocks’ stay empty and unallocated until some real data shows up.
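
A quick way to see that behavior in action, if you want to play along, is a throwaway file (a sketch; ‘demo_sparse’ is just an example name):

truncate -s 1G demo_sparse
ls -l demo_sparse     # shows the 'apparent' size: about 1 GB
ls -s demo_sparse     # shows blocks actually allocated: 0
rm demo_sparse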

Turns out, it can be a feature…

For one thing, you can have your files in a file system that is encrypted, inside this file that looks like it is giant, but is only as big as it needs to be. Such a ‘file container with encryption’ is one of the benefits of things like TrueCrypt. Yes, to be “NSA proof” you would need the whole OS hardened and encrypting, but for “Barney Fife” secure, it’s way more than enough.

I’m going to walk through the entire process of adding the disk, making the sparse file, and then mounting that as a newly made EXT3 file system. I already had an entry in my /etc/fstab for mounting the Seagate drive.

/dev/sdc1       /SG             ntfs-3g defaults          0       1

So remember that you need to have the drive mounted somehow: either put an entry in /etc/fstab for it and issue the mount command, as I did, or mount it by hand.

root@RaPiM2:/WD# mount /SG
root@RaPiM2:/WD# df
Filesystem     1K-blocks      Used Available Use% Mounted on
rootfs          59805812  55000212   1744560  97% /
[...]
/dev/sdc1      488384000 330276856 158107144  68% /SG

That rootfs / is the 64 GB card on the RaPiM2 and is rapidly filling up with the GHCN data. Now over 50 GB, and with only 1.7 GB left before it locks up the whole system by filling the ‘root disk’. Before then, I need to re-point that wget download to a different location. I’d rather not swap it from expecting an EXT3 file system to NTFS. I don’t think anything would break, but with a couple of days invested in this, I’d rather not ‘risk it’ on a pause / restart of the process. Besides, EXT is the native Linux file system and I just like it better. ;-) And I have to think Linux likes it better too.

But what I have is an NTFS file system on a disk that’s a pain to resize… So I’m going to stuff that EXT file system inside a ‘bag of bits’ handed to me by the NTFS driver: the NTFS disk can easily hold a single large TB-scale file, while I avoid the NTFS way of handling metadata for all those Linux / Unix world files.

You can see from the df output that I now have 158 GB of NTFS space available on /SG. So let’s ‘go there’ and make a container. I could just use a program like ‘dd’ to make a 150 GB file full of blanks and use it, but then again, I don’t really know how big GHCN is going to get. It would be a bit stupid to use 150 GB if the process is going to use 10 more and be done. So I’m going to make a ‘sparse file’ instead. This too can be done with ‘dd’, but I’m going to use the ‘truncate’ command. It can shorten a file, or make it bigger than it seems…
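
For reference, the ‘dd’ way of making the same sparse container would be something along these lines (a sketch; it seeks 150 GB past the start and writes nothing, leaving one big hole):

dd if=/dev/zero of=GHCN_filesys bs=1 count=0 seek=150G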

root@RaPiM2:/WD# cd /SG

root@RaPiM2:/SG# truncate -s 150G GHCN_filesys
root@RaPiM2:/SG# ls -l

drwxrwxrwx 1 root root            0 Sep  2  2010 Administrator_Backup
drwxrwxrwx 1 root root        20480 Jun 28 09:02 Evo
-rwxrwxrwx 1 root root 161061273600 Sep  7 17:25 GHCN_filesys
drwxrwxrwx 1 root root            0 May 13  2011 _Memeo
drwxrwxrwx 1 root root            0 May  3  2012 $RECYCLE.BIN
drwxrwxrwx 1 root root            0 Oct 21  2010 RECYCLER
drwxrwxrwx 1 root root         4096 May  3  2012 System Volume Information
root@RaPiM2:/SG# df .
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sdc1      488384000 330276856 158107144  68% /SG

I’ve chopped a few bits out of the ls listing. You can see here that it says GHCN_filesys is 161 GB – 161,061,273,600 bytes, which is exactly 150 GiB. (Note that since a KB is really 1024 bytes, the meaning of a MB can wander between 1,000,000 bytes and 1024 x 1024 bytes depending on user and context – base ten or binary. Don’t let that bother you…)

Yet our ‘used space’ hasn’t changed per the ‘df’ command. Neat. (Just be aware that some programs are not so bright about this and using things like ‘tar’ might end up mysteriously creating 150 GB after a move / copy…)

root@RaPiM2:/SG# du -h --apparent-size GHCN_filesys 
150G	GHCN_filesys

root@RaPiM2:/SG# du -h GHCN_filesys 
0	GHCN_filesys

So it looks like 150 GB apparent per the ‘du’ disk usage program, but it really is empty.
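
If you ever do need to copy or move a file like this, the common tools have flags to keep it sparse rather than writing out all those empty blocks; a sketch, assuming GNU versions of the tools (destinations are just placeholders):

cp --sparse=always GHCN_filesys /some/other/place/
rsync --sparse GHCN_filesys otherbox:/some/place/
tar -cSf container.tar GHCN_filesys     # -S records the holes in the archive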

Now let’s make an EXT3 Linux native journaling file system inside that container. Since I know a lot of the files in that GHCN copy are large compressed files, I’m going to give it a large ‘block size’ to keep the overhead of tracking blocks down just a little. That -b 4096 says to make 4k sized blocks. If I had a use with a gazillion tiny 100 to 500 byte files, I’d make it a 1k block size instead (the smallest mkfs.ext3 allows) and save myself wasting (4096 – 1024) bytes per block.

Note that on the 3rd line down mkfs.ext3 notices that I have not given it a real disk on a real ‘block special device’ and asks me to say ‘y’ before it goes on. It then complains that there isn’t a real disk geometry, so it can’t get that data; which we already knew, so we can ignore it. I’ve bolded the question.

root@RaPiM2:/SG# mkfs.ext3 -b 4096 GHCN_filesys 
mke2fs 1.42.5 (29-Jul-2012)
GHCN_filesys is not a block special device.
Proceed anyway? (y,n) y
warning: Unable to get device geometry for GHCN_filesys
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
9830400 inodes, 39321600 blocks
1966080 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
1200 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done     

It took a few minutes to do the actual creation as EXT3 preallocates and writes all the ‘inodes’ or information nodes that hold all the information about data blocks and the metadata about files. That stuff you see in ‘ls’ listings like name, permissions, size…

Doing the mkfs.ext3 took about 88% of one core for the ntfs-3g driver during the “Writing inode tables” part. You still get all the lovely inefficiency and CPU usage of NTFS, but you avoid all the various other bits of how NTFS keeps metadata and all that for your Linux file system that now is built inside that NTFS box. All NTFS has to do is glue on more blocks as the container grows and asks for them.

So here, in about 6 minutes all told, I went from a very large NTFS file system on a disk with left over free space, to a usable mounted (if less efficient) EXT3 file system. Far far faster than that 3 day ntfsresize… And, since the use of this ‘partition’ is limited to the speed of the internet download for most uses, and since I have an idle CPU core almost all the time, the inefficiency just doesn’t matter much.

Had we built one of the file system types with dynamic inodes, this space would not be allocated until data is written, so more space efficient, but with more writes later. IIRC, ReiserFS, XFS and btrfs are that way (EXT4 still preallocates fixed inode tables, though it can defer initializing them). I’m a bit more fond of EXT3 as it is compatible with EXT2, so I can swap to a non-journaling (lower write load) file system if desired. Eventually I’ll get around to using all the complicated features of things like btrfs, but that’s way overkill for this immediate need.
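
If you want to see just how many inodes got baked in, something like this ought to show it (a sketch; tune2fs can read the superblock straight out of the container file):

tune2fs -l GHCN_filesys | grep -i inode     # inode count fixed at mkfs time
df -i /GHCN                                 # live inode usage, once it is mounted (below)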

So if we inspect file size again, we see:

root@RaPiM2:/SG# ls -l GHCN_filesys 
-rwxrwxrwx 1 root root 161061273600 Sep  7 17:54 GHCN_filesys

root@RaPiM2:/SG# du -h GHCN_filesys 
2.5G	GHCN_filesys

root@RaPiM2:/SG# df .
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sdc1      488384000 332880068 155503932  69% /SG

root@RaPiM2:/SG# du -h --apparent-size GHCN_filesys 
150G	GHCN_filesys

It really is using 2.5 GB for all those inodes and such. It looks like 150 GB to both an ‘ls’ and a ‘du’ with --apparent-size set.

Next we mount it into the ‘name space’ so it looks like a real file system. I’ll start by making the /GHCN directory as a mount point. Then the “loop” option is used in the mount command. This lets us use the loop device driver to get at the file system inside the file.

root@RaPiM2:/SG# mkdir /GHCN
root@RaPiM2:/SG# mount -o loop GHCN_filesys /GHCN

Now what do we see on a ‘df’ listing?

root@RaPiM2:/SG# df 
Filesystem     1K-blocks      Used Available Use% Mounted on
rootfs          59805812  55097532   1647240  98% /
/dev/root       59805812  55097532   1647240  98% /
devtmpfs          470416         0    470416   0% /dev
tmpfs              94944       396     94548   1% /run
tmpfs               5120         0      5120   0% /run/lock
tmpfs             189880         0    189880   0% /run/shm
/dev/mmcblk0p6     61302     57554      3748  94% /boot
/dev/mmcblk0p5    499656       676    462284   1% /media/data
/dev/mmcblk0p3     27633       444     24896   2% /media/SETTINGS
[...]
/dev/sdc1      488384000 332880068 155503932  69% /SG
/dev/loop0     154687468     60996 146762152   1% /GHCN

A somewhat fuller rootfs or “/” (as that wget is still running, filling it up…), the original /SG mount with 155 GB still free, and what sure looks like 146 GB of free space on a file system mounted at /GHCN via /dev/loop0.
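
If I wanted that mount to come back on its own after a reboot, an /etc/fstab line something like this ought to do it (a sketch; the ‘loop’ option has mount set up the loop device itself, and it needs /SG mounted first, so the ordering matters):

/SG/GHCN_filesys  /GHCN           ext3    loop,defaults     0       0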

At this point, I can ‘pause’ my GHCN wget, move the data off the SD card to /GHCN, put a symbolic link from the old /MIRRORS location to /GHCN, and then ‘fg’ bring the job back to running in the foreground and “move on”.
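
In shell terms the plan is roughly this (a sketch only; paths and the job number are illustrative, and since the working directory moves out from under the paused wget, restarting it from the new location may be the safer variant):

kill -STOP %1                      # pause the backgrounded wget job (Ctrl-Z if it were foreground)
mv /MIRRORS/ghcn /GHCN/ghcn        # move the data off the SD card (can take a while)
ln -s /GHCN/ghcn /MIRRORS/ghcn     # leave a symbolic link at the old location
fg %1                              # bring the job back to the foreground; fg also resumes it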

FWIW, I also have the 140 GB (after resize / format / etc) of EXT3 file system built on the older Toshiba drive mounted and the CDIAC wget running against it, too.

/dev/sdd3      140745608  43953180  89636308  33% /Temps

So I’m back to normal running, more or less, with nearly 300 GB more space, half as a real EXT3 disk partition that took 3 days to get up and attached, the other half as EXT3 inside an NTFS disk file container that took about 10 minutes including reading web page HowTo…

Later I’m going to do some speed tests to see what kind of penalty there is to this method, but right now the whole I/O system is saturated by the wget, so any numbers would be contaminated and not very informative.
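
When the wget is out of the way, a quick and dirty comparison could be as simple as timing a big sequential write and read on each file system; a sketch (conv=fdatasync makes dd wait until the data actually hits the disk):

dd if=/dev/zero of=/GHCN/testfile bs=1M count=1024 conv=fdatasync    # write onto the EXT3-in-NTFS container
dd if=/dev/zero of=/Temps/testfile bs=1M count=1024 conv=fdatasync   # same write onto the native EXT3 partition
echo 3 > /proc/sys/vm/drop_caches                                    # flush the page cache so the read test is honest
dd if=/GHCN/testfile of=/dev/null bs=1M                              # read it back
rm /GHCN/testfile /Temps/testfile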

In Conclusion

I searched several web pages on “how to” do this. The model I chose to follow was this very well written one from the Arch Linux folks. It also covers some of the details on how to copy / move such a file without hitting the ‘sudden size’ issue.

https://wiki.archlinux.org/index.php/Sparse_file

This link is similar, and has information on setting it up as an encrypted file system and using ACLs (Access Control Lists).

http://linuxgazette.net/109/chirico.html

It has more details on using “dd” to build the file along with how to set it up as an encrypted container. I’ll likely do that as a test ‘some other day’ when not in a race condition with wget… But I think it is pretty clear that just putting your ‘stuff’ in a non-PC file system hidden as an encrypted file system in a big bag of bits with a name like “failed_binary_image” on an NTFS drive would get it past all but the more careful of forensics folks.
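
These days the usual way to get the encrypted container would be cryptsetup / LUKS rather than the older losetup-with-encryption approach in that article. A sketch, with placeholder names (recent cryptsetup attaches a loop device for the file on its own; older versions need a losetup first, with cryptsetup then pointed at the loop device):

truncate -s 10G /SG/secret_container
cryptsetup luksFormat /SG/secret_container        # sets a passphrase and writes the LUKS header
cryptsetup luksOpen /SG/secret_container secret   # maps it to /dev/mapper/secret
mkfs.ext3 /dev/mapper/secret
mkdir /secret
mount /dev/mapper/secret /secret
umount /secret && cryptsetup luksClose secret     # when done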

For now, though, I’m down to 1.3 GB on my root partition on the SD card and need to get busy moving 50+ GB to my new space… Performance testing and ACLs / encryption can be for another day.

UPDATE: About 5 minutes later…

Well, no sooner was I done posting, and ready to do the “pause / move / restart” whenever free space in rootfs fell under 1 GB… than the “GHCN” wget finished. Note that this is the ‘restart’ run after the original one had gone a couple of days already… I’ve clipped a bit from the bottom of the listing and bolded the stats portion:

--2015-09-07 19:43:11--  ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/techreports/Technical%20Report%20NCDC%20No12-02-3.2.0-29Aug12.pdf
           => `ftp.ncdc.noaa.gov/pub/data/ghcn/v3/techreports/Technical Report NCDC No12-02-3.2.0-29Aug12.pdf'
==> CWD not required.
==> PASV ... done.    ==> RETR Technical Report NCDC No12-02-3.2.0-29Aug12.pdf ... done.
Length: 2343758 (2.2M)

100%[================================================================================>] 2,343,758    132K/s   in 14s     

2015-09-07 19:43:25 (169 KB/s) - `ftp.ncdc.noaa.gov/pub/data/ghcn/v3/techreports/Technical Report NCDC No12-02-3.2.0-29Aug12.pdf' saved [2343758]

FINISHED --2015-09-07 19:43:25--
Total wall clock time: 1d 18h 14m 28s
Downloaded: 30334 files, 30G in 1d 13h 10m 51s (237 KB/s)

Never has “FINISHED” looked quite so good…

All told, the download (both parts) amounted to about 51 GB of ‘stuff’:

pi@RaPiM2 ~/ftp.ncdc.noaa.gov/pub/data $ du -ks *
51182296	ghcn

So now I can put it somewhere a bit more permanent than the SD card, and never need worry about going back to the NOAA well again unless I want to do an “update”. Most likely then it would still be only a few GB of “daily data” that changed and would be re-downloaded.

Inside the ghcn directory, here are the sizes:

pi@RaPiM2 ~/ftp.ncdc.noaa.gov/pub/data/ghcn $ du -ks *
12	alaska-temperature-anomalies.txt
4	alaska-temperature-means.txt
2452	anom
117412	blended
48933468	daily
30116	forts
14604	grid_gpcp_1979-2002.dat
3796	Lawrimore-ISTI-30Nov11.ppt
1492	snow
3584	v1
62224	v2
2013128	v3

Clearly it is that ‘daily’ archive that is the largest bit, and that is likely to be in chunks where only some copies change. Like the “by_year” directory, where only recent years ought to change.

pi@RaPiM2 ~/ftp.ncdc.noaa.gov/pub/data/ghcn/daily $ du -ks *
25522240	all
13950108	by_year
36	COOPDaily_announcement_042011.doc
124	COOPDaily_announcement_042011.pdf
68	COOPDaily_announcement_042011.rtf
7872	figures
326224	ghcnd_all.tar.gz
4	ghcnd-countries.txt
139556	ghcnd_gsn.tar.gz
281244	ghcnd_hcn.tar.gz
25676	ghcnd-inventory.txt
4	ghcnd-states.txt
8236	ghcnd-stations.txt
4	ghcnd-version.txt
5327224	grid
885188	gsn
2451972	hcn
7628	papers
24	readme.txt
32	status.txt

But all that kind of ‘size listings’ and inventory of just what all is in here really belongs in its own posting… that will come on ‘another day’…


12 Responses to Using Loop and Sparse File for EXT on NTFS File System

  1. E.M.Smith says:

    Well, as a first “performance” data point on the file system, the NTFS part is deadly slow.

    I’ve been ‘restoring’ a 30 GB compressed ‘tar.gz’ file of the NOAA wget up to the point where I stuffed it aside from disk shortage issues. I started at about “noon-thirty”. It is now about 4 hours later and I’ve gotten all of 2.6 GB restored. At this rate, it will be a couple of days…

    So “what’s the problem”? Well, it isn’t the EXT3 file system. That’s quite fast. It isn’t the uncompressing nor anything else. No, the problem is that mount.ntfs-3g (the NTFS file system driver) is pegging one CPU (core) at full tilt boogie 99.5% to 100% all the time.

    Now I know why the partition resize took 3 days. The NTFS file system driver sucks. Probably written by some guy who was trained to use Object Oriented coding with ‘methods’ and such who has never learned that OO means FAT and SLOW. Real Code ™ is written in assembler or C, not C++ or worse C#… Oh Well… at least it is ‘free’…

    Well, OK, 2 days is less than 3 or 4… but still…

    Once the restore is complete, it will likely be adequate for continuing the download; but for any ‘real use’ NTFS on Linux is a real tar pit.

    Note To Self: As soon as reasonable and practical, format one of the Toshiba USB drives to native EXT3 and just dump everything you can into it. Once the other drives are emptied reformat them, too. Leave one as NTFS just in case you need ‘share’ things with some archaic Windoz box, but otherwise, just “Run Away!!!!”…

  2. pg sharrow says:

    just “Run Away!!!!”…
    Sounds like a good long term plan…pg

  3. E.M.Smith says:

    Well, I found out how to speed up the NTFS disk from glacially slow to just slow. Mount it with the “big_writes” option. Seems that by default the ntfs-3g driver uses 4k writes. Each block, written one at a time. The “big_writes” option lets it run up to 128k writes, or 32 x more per write.

    Then, it seems, a sparse file system causes enough delay between writes that you end up waiting for disk rotation and / or seeks before the next write. This sort of makes sense since each write will have an ext3 journal entry (causing an NTFS write) then a data block write elsewhere on the disk (causing another NTFS write) etc etc. Oh, and NTFS being journalling, each NTFS write really has a journal write, data write, journal write…

    So when you ‘add a block’ in the sparse file system, it’s taking about 9 actual NTFS writes to get one 4k data block onto disk… Going to pre-allocated takes out a bunch of them (as the block already exists so all those writes are gone) and using big_writes lets you slam in 128k for each journal entry.

    The /etc/fstab entry looks like this:

    /dev/sdc1       /SG             ntfs  	rw,noatime,big_writes,uid=1000  0       0
    

    The “uid=1000” just makes all the stuff in the NTFS partition ‘owned’ by user “pi”. Since the NTFS owner is mapped to whoever mounted the drive, I decided to make that pi instead of root. The key bits are the ‘noatime’ (so each file access doesn’t cause an update of ‘last access time’, taking out a bunch more writes of inode – information node, or metadata – blocks) and the big_writes discussed above.
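
    To try those options without a reboot, you can also just remount by hand with the same options (a sketch; ntfs-3g takes them straight off the mount command line):

    umount /SG
    mount -t ntfs-3g -o rw,noatime,big_writes,uid=1000 /dev/sdc1 /SG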

    I then made a 66 GB file of actual zeros, and built an ext3 file system on it.

    As of now, the “zcat NOAA.tarfile.gz | (cd /NTFSext/filesystem/directory; tar -xkf -)” command is flying along, having restored as much in 5 minutes as the prior one had done in a day… with mount.ntfs-3g using about 75% to 80% of one core, not pegged at 100%.

    I don’t know why mount.ntfs-3g has so much CPU suckage from a change of allocation / writes, but it does.

    Oh, one other fun bit… Want to drive your SWAP usage to insane levels? Use a “dd” command like this to write giant blocks of data:

    dd if=/dev/zero of=/target/file bs=1G count=66
    

    Seems like it makes a giant buffer of that ‘zero’ input stream and assembles the 1 GB of it in memory… that spills to swap, then hands that over to be written and starts making another, which is faster than writing the first one… so goes to swap…

    I’d launched this and then took a nap… came back to find a system with about 3 GB of swap in use… on the same disk as was getting the NTFS writes… and the whole thing sporadically pausing for long periods of time on SWAP contention issues. Pausing the dd let me add swap on another disk and things got much better…

    But, if you ever want to exercise swap, now you know how to test it….
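
    If you want the pre-allocation without the swap abuse, a smaller block size with a sync at the end keeps the memory footprint sane; a sketch (4 MB x 16896 blocks is the same 66 GiB):

    dd if=/dev/zero of=/target/file bs=4M count=16896 conv=fsync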

    In some ways I enjoyed doing QA a lot. Over time you collect a giant set of “machine abuse” commands like that which had some unexpected behaviour on one release or another. Though it is called a “regression test suite” and not a “machine abuse collection” ;-)

    FWIW, while playing with this file system behaviour, I’ve had the downloads running against a restore into that 150 GB real ext3 file system. It still has 38 GB free (most of the used space being not the NOAA data), and I’m ever hopeful that the download will finish before that one partition fills up; but if it gets ‘close’, I’ll have a well working place to put it. IFF it finishes ‘soon’, I’ll likely just bleed it into the sparse file system. But if it ends up taking a big chunk of that space, then it will go into the large pre-allocated filesystem. (Basically, if significantly under 66 GB, it will go to the sparse one and I’ll just let it take time to fill. If it goes near 66, it will take the pre-allocated one. If over 66, I’ll move the other thing on that partition ;-)

    BTW, this is a fairly ordinary example of how systems-admin works. You have a problem. Learn some new tricks that might solve it (after using up old bag of tricks), have the “Oh Boy, This Time For Sure!” hubris moment. Try it, and things look good, until they don’t. Go dumpster diving in the manuals and on web pages and formulate a modified strategy. Test THAT. Repeat until something works or you have run out of schedule… Then re-fix it (or explain need for more time, budget, or coffee…) and move on.

    So now my “bag of tricks” has grown by “sparse files” (including how they abuse I/O on NTFS with the present drivers), mounting “loop” file systems, making a file system of one kind inside a file of another kind, the tuning of NTFS file system mount options (many I’ve not mentioned in this comment but have learned anyway – the number of ‘permissions’ related ones is way too large ;-) and performance testing of all of the same. Oh, and a fun way to drive swap usage to insane levels if desired… On ‘some other day’ I’ll test what happens as you approach full swap (but since I had 6 GB configured on my 1 GB real memory machine I didn’t get close to that…) and what happens in small swap settings. i.e. will dd just die with a 4GB block size and 1 GB memory (no swap) or what?

    It is running into these “strange and unexpected things” that both keep the job interesting and make it very frustrating at times. (Especially when a client has asked “how soon done?” and you have made a ‘good guess’ of “about 9 hours” then run into one of these “plus unexpected 4 days from software sloth / bug” issues…)

    Oh well… Good thing I enjoy this kind of stuff ;-)

  4. beng135 says:

    Thanks, EM — I’ll mess around w/your NTFS big-write option. I see you’re using noatime, so you’ve already improved the speed w/that option.

  5. Ben says:

    I have an off topic request.

    I do not know R.

    I know that R can do what I am looking to do.

    I am looking to Monte Carlo a series of variables to get an array of history matches. Output would be a grid of R^2 values and other accuracy measures, and a series of plots of the matches over some threshold of quality.

    These “quality” match parameters would then be used to history match other samples to eventually work down to a sub set of 3 or 4 that are the best fit in all cases.

    Anyone who has an R package close to this that could be adapted or has some guidance…..

  6. Pingback: Well That Was Fun, sort of… | Musings from the Chiefio

  7. E.M.Smith says:

    @Ben:

    What is the input? Nice to know what output you desire, but the other half is turning the input into something usable…

  8. Ben says:

    Input data is pseudo steady state natural gas production from a shale reservoir. Variables are permeability, porosity, fracture half-length, etc.

    By degree I am an environmental geologist. But have been doing reservoir work for e&p’s for the last few years.

  9. E.M.Smith says:

    So a data set like:
    Reservoir Name & ID#, production in thousand cu ft per unit time, perm, fracture, etc. over time

    then search variations of the parameter set to find what makes matching curves over time, with R^2 of fit and related fit measures, then plot the best matches against the real data for eyeball QA.

    hmmmmmm… interesting, but likely needs an R programmer. I’ve just started to learn R so not good enough yet. I’d likely start with just eyeballing some graphs and looking for patterns. Maybe group by quartile first.

  10. Ben says:

    Thanks E.M.

    Yep we feed in hydrocarbon rates and pressures and then we use an equation that should be able to model the rates and pressures using the reservoir characteristics like perm.

    The problem is they are all unknowns beyond order-of-magnitude ranges; non-unique solutions abound.

    The idea is to treat different intervals of time across different wells as separate samples and try to find a sub set that satisfies all samples.

  11. E.M.Smith says:

    @Ben:

    So, by “Thanks” do I assume “that was enough” or “keep on looking” ???

    I’m happy to help with what I can, but need to have direction…
