Some Brief Opinions On FileSystems On Linux

The Setup & Context

What with my USB Stick dying, I’m recovering that particular “home directory” from backups. I’d been in the middle of splitting it anyway, making what had been about 20 GB of “portable home directory” into 3 smaller bits. Anything that was “just storing files” got moved to general disk storage (i.e. file server or archive). That cut things down to about 3 GB of active “stuff”. That consisted mostly of a few recent files (that is, things downloaded in the last week or two and not moved to real disk), the browser cache (that IIRC I’d set to about 1 GB to minimize internet traffic and lag times), and my mail archive (since Thunderbird et al. insist on stuffing it into a hidden database in a “dot” directory).

So I’d moved all the “just files” off, mostly. My “environment” of personalized commands remained (but is pretty consistent machine to machine anyway) and I’d split the browser image from the email reading image. Then some stuff came up and the project sat at that point for a few months. Then the Monster USB Stick went a bit flaky with a couple of errors, so I made 2 rapid copies and the stick died in the middle of the second one.

Now the fun begins…

Just which image had been the browser one and which was the email target? As I’d not run the email program, and it had been a few months, I sort of lost track (expecting it to be self prompting based on what was on the stick…). Then there’s the fact that the sudden upgrade forced by the kernel being shown to have a security flaw (and a couple of other bits) has my current desktop not showing Thunderbird at all anyway… Sigh.

So tonight I went through and counted up 9 major images of my home directory in storage… There’s at least 2 from before the split, about 9 months ago. A couple from the emergency dump, one short. Another from a few months back, and then a couple from about last December to this February. Most of them from after the split. So tomorrow I get to start sorting out who’s a what. Is the first of the emergency copies good? Is the biggest latest post-split copy from about 5 months back “new enough”? Do I “unscramble the split” before I proceed, or try to just recover the one post-split part that failed? Fairly trivial for the simple files. The browser cache I don’t care about. The more tricky bit is the email archive, as it lives in hidden database files that you can’t easily inspect.

The “Good news” is that I’d not cleared email from my ISP since before the split (just read it via the web page), so I’ve not lost anything. “Worst case” is I just restore the last pre-split image and then go through the same “split” process, but as I go to move, say, a pdf file to the pdf folder in archives, I discover it’s a duplicate. Repeat for 15 GB…

Hopefully the first emergency copy is complete and a comparison with the prior images will confirm it for the email database.

But Along The Way – File Systems!

So I had to find those images and herd them together. A few dozen GB scattered over a few disks (I’d just used whatever was fast for the ‘quick copies’). I noticed that a couple of the disks were ext4. Some folks might remember my blue streak when I discovered that “new” ext4 would automagically transition your “old” ext4 disk to the new journalling formats and methods, but rendered the disk unusable and un-fsck-able on older systems using the “old” ext4 file system. That is just “so wrong”. They ought to have called the “new” ext4 by the name ext5 and given you a choice not to be “converted”. Folks like me who move disks between systems get “converted and stuck”.
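
For anyone in the same boat: you can at least see which feature flags the newer tools have turned on (the ones an older e2fsprogs chokes on). A minimal check, with a made-up device name; look for things like metadata_csum or 64bit in the output:

    # Show the feature flags set on an ext2/3/4 file system (hypothetical device)
    tune2fs -l /dev/sdb1 | grep -i features

    # Or the same information via dumpe2fs
    dumpe2fs -h /dev/sdb1 | grep -i features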

So back then I’d converted many of my disks to ext3 (still journalled, but not as fancy, and it is stable). So in the last couple of days, while looking for where my USB Stick copies got stored, I converted about 6 TB of misc. disk partitions from EXT4 to EXT3; that meant making a copy of the data “somewhere else”, changing the format, and copying back (or consolidating on another disk).
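
For what it’s worth, the per-partition dance looked roughly like this. A minimal sketch; the device name, label, mount points, and scratch path are made up for illustration:

    # Copy out, reformat as ext3, copy back (hypothetical names throughout)
    mount /dev/sdb1 /mnt/old
    rsync -aHAX /mnt/old/ /mnt/scratch/sdb1-copy/    # preserve owners, perms, links
    umount /mnt/old

    mkfs.ext3 -L archive1 /dev/sdb1                  # journalled, but stable everywhere

    mount /dev/sdb1 /mnt/old
    rsync -aHAX /mnt/scratch/sdb1-copy/ /mnt/old/
    umount /mnt/old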

You start to appreciate the meaning of a 4 TB disk on USB 2.0 after the first few hours… But as of now my only remaining EXT4 file systems are the LVM disk on my NFS server and a 4 TB offline archive disk. The LVM group doesn’t get moved and the archive disk plugs into that system also; so neither is at risk of an “ext4” incompatibility surprise.

But all this moving and formatting and copying and such had me thinking about File Systems. Since whoever was working on ext4 has shown themselves a bit of an idiot by causing this problem of “2 mutually incompatible ext4 formats”, was it time to move on to something else?

I’ve used LVM and I’m not that keen on it. Yes, it does what it says it will do, but at the expense of arcane commands and great sloth when you change disk configurations. Not interested in more of it (and “someday” I will migrate the 8 TB of temperature data and site scrapes off of it, then turn it back into 2 x 4 TB of some other file system).
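
When that day comes, the teardown itself is only a handful of those arcane commands once the data is safely copied elsewhere. A sketch with hypothetical volume group, volume, and device names:

    umount /mnt/lvmdata
    lvremove /dev/vgdata/lvdata     # remove the logical volume
    vgremove vgdata                 # remove the volume group
    pvremove /dev/sdb1 /dev/sdc1    # clear the physical volume labels
    mkfs.ext3 /dev/sdb1             # then each disk gets a plain file system of its own
    mkfs.ext3 /dev/sdc1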

Other File Systems

Linux seems to sprout new File System choices faster than I can keep up. Many are specialized. Some are best suited to data centers with disk farms of 100+ disks and 24 x 7 staffing. Others are just not quite working yet…

First I eliminated the “special use” file systems. While I am going to use squashfs for some things, it is really just using a compressed file image to make a read-only file system. Similarly, f2fs is intended for lower wear on flash based things like SD cards and USB Sticks. Useful IF I were going back to a USB stick, but not for general disk. After you eliminate that kind of thing, and on Debian / Devuan, the remaining list of “general use” file systems is fairly narrow. I also left out “swap” as it is only for use as a swap partition.
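
If you want to see what a given box actually supports before whittling the list, a quick look (this is just what I’d check on Debian / Devuan; nothing exotic):

    cat /proc/filesystems                      # what the running kernel knows about
    ls /sbin/mkfs.*                            # what userland can actually format
    ls /lib/modules/$(uname -r)/kernel/fs/     # modules available but not yet loaded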

EXT2, EXT3, EXT4 = The “native” Linux file systems. EXT2 is not journaled and is mostly used for boot partitions. EXT3 is a solid workhorse with somewhat smaller limits than EXT4. As I’ve mentioned, EXT4 has “versions” and I’m abandoning it.

FAT16, FAT32, VFAT, NTFS = These are all Microsoft file formats with “issues” that make them a pain for use with Linux / Unix. Mostly they don’t handle all the normal cases and characters allowed in *Nix file names, and date stamping is also a bit daft.

HFS+ = The Macintosh file system. Works well, but read only on Linux at this time (or at least the last time I checked).

UFS = Used in Unix and BSD, but read only on Linux at this time.

Reiserfs, Reiser4 = An early advanced file system that supposedly works well, but the main developer (Mr. Reiser) was convicted of the murder of his wife. Aside from just not wanting to be reminded of a murderer every day, that will put a serious dent in future support and enhancement…

So at that point I arrived at the short list of “other file systems” of:

ZFS
XFS
btrfs

The Short list

Well, ZFS is pretty complicated. It does about everything, but it is not supported on many of the Linux varieties in my set of chips. It is intended for major data center type work. Originally developed by Sun Microsystems, it works well, but has their usual complexity issues.

BTRFS (‘butter fs’) “has issues”

https://btrfs.wiki.kernel.org/index.php/Gotchas

Gotchas

This page lists some problems one might face when trying btrfs, some of these are not really bugs, but rather inconveniences about things not yet implemented, or yet undocumented design decisions.

The page references issues relevant for a few stable kernel releases that seem to be in use. This currently contains 4.14, 4.9 and 4.4. The list will be updated once a new stable is announced.

Please note this just to document known issues, ask your kernel vendor for backporting fixes or support.

Please note that most of this page was not written by the btrfs developer community and may be entirely inaccurate.

Affecting all versions
Block-level copies of devices

Do NOT

make a block-level copy of a Btrfs filesystem to another block device…
use LVM snapshots, or any other kind of block level snapshots…
turn a copy of a filesystem that is stored in a file into a block device with the loopback driver…

… and then try to mount either the original or the snapshot while both are visible to the same kernel.
[…]
Fragmentation

Files with a lot of random writes can become heavily fragmented (10000+ extents) causing thrashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or large amount of RAM.
On servers and workstations this affects databases and virtual machine images.
The nodatacow mount option may be of use here, with associated gotchas.
On desktops this primarily affects application databases (including Firefox and Chromium profiles, GNOME Zeitgeist, Ubuntu Desktop Couch, Banshee, and Evolution’s datastore.)
Workarounds include manually defragmenting your home directory using btrfs fi defragment. Auto-defragment (mount option autodefrag) should solve this problem in 3.0.
Symptoms include btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes, possibly triggered by syncs). You can use filefrag to locate heavily fragmented files (may not work correctly with compression).
[…]
Parity RAID

Currently raid5 and raid6 profiles have flaws that make it strongly not recommended as per the Status page.
In less recent releases the parity of resynchronized blocks was not calculated correctly, this has been fixed in recent releases (TBD).
If a crash happens while a raid5/raid6 volume is being written this can result in a “transid” mismatch as in transid verify failed.
The resulting corruption cannot be currently fixed.
[…]
For stable kernel versions 4.14.x, 4.9.x, 4.4.x
Having many subvolumes can be very slow

The cost of several operations, including currently balance, device delete and fs resize (shrinking), is proportional to the number of subvolumes, including snapshots, and (slightly super-linearly) the number of extents in the subvolumes.
[…]
Conversion from ext4 may not be undoable

In kernels 4.0+: the empty block groups are reclaimed automatically that can affect the following:
a converted filesystem may not be able to do a rollback because of the removed block groups

Then in some forums folks were telling stories of having unrecoverable data loss without explanation, along with other failure modes. This after several years of development and then several more in production / enhancement.

In short, it’s a complicated and still somewhat unstable beast with “issues”.

Oh, and Red Hat is dropping support for it from 7.x forward. Too much maintenance work and the guy who was doing it quit.

So not going with BTRFS…

Which leaves us with XFS.

XFS came out of SGI in the ’90s. It is relatively mature, not too complicated, and has few bugs. The features are more than enough and it is designed for speed and efficiency. I don’t know how much of that will apply to a few slow USB disks, though…

In Conclusion

So I’m going to continue to use EXT3 for everything I can; but “going forward” will be getting my feet wet in XFS. Some Linux vendors are using it as the default in their new releases and it does look to be gaining acceptance for wide use.

EXT3 is quite enough for just about everything I do, so I don’t feel a big loss by dropping EXT4 and using it.

XFS may be more complicated to set up than I like, or maybe the defaults will be fine. I’ll find out in a few weeks. IF I like it, I’ll make a trial build of a system using it and then live on it for a few weeks. Do note that some Linux releases can NOT boot from an XFS file system as their boot loader can’t read it. Red Hat is making theirs able to read a native XFS file system, but for now, keep your /boot as EXT2 or EXT3.
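
So a trial install would look something like this in /etc/fstab terms. Purely a hypothetical layout with illustrative device names; the point is just that /boot stays on something every boot loader can read:

    # /etc/fstab sketch for an XFS trial system
    /dev/sda1   /boot   ext2   defaults   0  2
    /dev/sda2   /       xfs    defaults   0  1
    /dev/sda3   none    swap   sw         0  0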

There are a whole lot more file systems you can put on Linux, but most of them don’t have much to offer, or are highly specialized (often for giant disk farms). There’s a list of file systems and details here (Not all on Linux):

https://en.wikipedia.org/wiki/Comparison_of_file_systems

As of now, I’ve moved all the TB that needed moving; just about everything except the NFS server is EXT3 and compatible between any of the Linux versions I’ve got running. I’m going to make a test case XFS file system and compare speed and complexity on the Raspberry Pi. I’m in no hurry to change, though, so this is purely a “slow day play” operation.
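
The test itself needn’t be fancy. Something like this is what I have in mind; the device name and source directory are placeholders:

    mkfs.xfs -L xfstest /dev/sda3          # make the trial XFS file system
    mkdir -p /mnt/xfstest
    mount /dev/sda3 /mnt/xfstest

    time cp -a /Backups/home_image /mnt/xfstest/    # the same bulk copy I’d time onto ext3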

Other than that, the rest of the “general purpose” file systems will no longer require my notice…

For a description of XFS and its features, this is a pretty readable writeup:

https://www.linux.org/threads/xfs-file-system.8823/

[…]
The file system consistency is guaranteed by the use of Journaling. The Journal size is calculated by the partition size used to make the XFS file system. If a crash occurs, the Journal can be used to ‘redo’ the transactions. When the file system is mounted and a previous crash is detected, the recovery of the items in the Journal is automatic.

For support of faster throughput, Allocation Groups are used. Allocation Groups provide for simultaneous I/O by multiple application threads at once. The ability allows systems with multiple processors or multi-core processors to provide better throughput with the file system. These benefits are better when the XFS file system spans multiple devices.

For multiple devices to be used within the same XFS file system, RAID 0 can be implemented. The more devices used in the striped file system, the higher the throughput can be achieved.

To also provide higher throughput, XFS uses Direct I/O to allow data retrieved from the XFS file system to go directly to the application memory space. Since the data is bypassing cache and the processor, the retrieval and writing of data is faster.
[…]
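
To make the allocation group and striping bits concrete: they show up as options at mkfs time. The device names and geometry below are made up, and mkfs.xfs picks sensible defaults on its own, so this is only to show where the knobs live:

    mkfs.xfs -d agcount=8 /dev/sdc1         # more allocation groups = more parallel allocation

    # On a RAID 0 stripe (say an md set of two disks with a 64 KB chunk),
    # telling XFS the stripe unit (su) and width (sw) aligns I/O with the stripe:
    mkfs.xfs -d su=64k,sw=2 /dev/md0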

I’ll likely start with a 1 TB stand alone disk used for scratch space and “copy out / copy back” file moves and unpacking. Like, oh, unpacking the backups of my USB Stick and sorting through them ;-)

12 Responses to Some Brief Opinions On FileSystems On Linux

  1. ossqss says:

    I never had a memory stick die on me before. I did have a few with dirty/corroded contacts, but they worked after some attempted clean up. I primarily use an adapter with micro memory chips inserted nowadays, however. What kind of stick was it EM?

  2. E.M.Smith says:

    @ossqss:

    I’ve killed a few of them in my time. Doing things like installing a LINUX on one (with things like swap on the stick…) and / or putting a highly used web cache on them and then browsing a lot. Just a whole lot of churn of the bits, and eventually the wear rate uses them up.

    This one lasted longer than most in that sort of use. It is “Monster” brand and I quite liked it. I had a cheap one of no-name brand start throwing errors about 3 weeks after I installed a running Linux on it. Basically constantly wearing the storage ;-) “It’s not abuse! I’m doing QA Testing! -E.M.Smith” ;-)

    So this one went several years. Not sure exactly how many. 3? 4? As my “daily driver” home directory that moved from system to system with me. Plus I’d use it to move GB lumps of data, so every so often I’d fill it with a blob on one system then move it to another and unload. Assuring every bit got used…

    The Mac I’m using right now had the SSD die. We’re talking a several hundred $$ high end thing. But the Mac OS is very “chatty” to “disk”, and the wear rate catches up. It’s about 7? years old, so about 6 when the SSD “wore out”. With more constant use it could go in 1/2 that. (The spouse was not on it all that much per day… Light to moderate user).

    That’s the big thing nobody talks about with NAND / Flash based systems. Bit wear WILL eventually consume your SSD, Stick, SD card, whatever. Just a question of how fast you write it.

    Used to be about 10,000 cycles (when I killed one in a couple of weeks). Now it’s been pushed out to about 100,000 cycles. But it is NEVER infinite cycles… So a new “Monster” brand stick will last longer than the last one. And yes, I’m likely to buy one. Why? The minimal size and metal exterior. Besides, it comes in colors ;-)

  3. E.M.Smith says:

    I probably ought to mention that the problem of flash NAND “wear” is so high that the f2fs file system was created just to reduce it on flash media… I’ve not used it yet, but that is the purpose. I’ve been using journalling file systems that double the writes (once to the journal then once to commit…).

    New Install of Devuan 2.0 “ASCII” release:

    Well, I’ve completed a successful rolling forward onto Devuan 2.0 ASCII release. It was not as simple as it ought to be. I’ll put up a posting some time tomorrow about it; right now it’s late and I’m tired…

    THE basic big issue was that the “copy .img to SD card then grow the file system” approach fails on ext4 being the “new one” that is not compatible with the “old” fsck; so gparted (which does an fsck of sorts) croaks on it. Would not be that bad, but for the fact that the system is under 2 GB, so too small to install a GUI and a gparted that does understand the filesystem. Your choices are either to resize long hand via CLI or “do something else”…

    Basically, once you have installed the “new one” you can expand the file system size so that you can install the new one… Aaaarggghh!

    I got around it by tarring off the installed stuff, removing the ext4 partition, making a new bigger ext3 partition, then tarring the stuff back in; all done on an older Jessie release. Now that I have a working 2.0 ASCII release with the newer gparted I can just use IT to grow the file system on additional new installs…
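
    Roughly, the dance was something like this (a sketch; the device names and paths are illustrative, not the exact ones I used):

        tar -C /mnt/sdcard -cpf /mnt/bigdisk/ascii-root.tar .   # copy the installed system off
        # ...delete the ext4 partition and make a bigger ext3 one (fdisk / gparted on the Jessie box)...
        mkfs.ext3 /dev/sdX2
        mount /dev/sdX2 /mnt/sdcard
        tar -C /mnt/sdcard -xpf /mnt/bigdisk/ascii-root.tar     # and put it all back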

    I decided to try the 64 bit version again. It still has the same issues. As the instructions are 64 bit instead of 32 (and likely the data structures too) it uses more memory. I’ve got 3 tabs open (one showing my router status, not a big page, another just a small “how to” on making a repository, and this one) and it’s used the whole Gig of memory and put 11.2 MB on swap already. Sheesh.

    Generally, it also seems slower (as you wait longer for the larger word volume to be read in from memory and such) and it still has the “video compositing” issue where dragging a window uses the CPU, not the video processor, so it is slow, jittery, and full of artifacts until you let it stop and rest a minute to catch up. So, OK, I need to do this all over again but for the armhf 32 bit version and use it instead. But at least for now I can use this one to resize the next ones…

    Maybe tomorrow…

  4. philjourdan says:

    Hmmmm – using it as the swap! That is a good idea (but keep it for swap space only). It is faster (with USB 3) and the sticks are cheap!

    I like that! Thanks for the idea!

  5. Steve C says:

    How do the various memory card/stick formats compare in terms of lifespan? I’ve had a circuit lying around for years showing how to wire a CF-card socket direct to a 40-pin header to make a DIY SSD, kept with the vague intention of finding a CF socket and sprucing up one of my older machines. (There would also be interesting possibilities for some PIC experimentation later, as ATA is easy to talk to.)

    I know all the formats can be made to look like a hard drive with a bit of driver software, the appeal of the CF conversion being that no special software is needed, and only a minimum of hardware. But if the thing is likely to fall over after a few months, I’d rather know in advance & save the money, or try and re-think in terms of MMC, or SD, or whatever. All tales of particularly impressive or appalling experiences read with interest!

  6. E.M.Smith says:

    @SteveC:

    You are asking for stories about crossing several moving rivers twice….

    First off, NAND is advancing and lifetimes constantly get longer. Any story from 5 or 10 years ago just is not applicable to today.

    Second, the different kinds of flash storage are designed for different goals and even different makers will have different results (and all of those will change over time as models evolve).

    So take the PNY SD chip I tested. Every SD card has a tiny little processor in it. 4 bit IIRC. It runs Linux. To spice up their write performance spec, they added a chunk of fast buffer to it. BUT, write a big enough chunk of data to it, that fills up and the slow spill to backing store flash shows up as slower write times. The write speed of a 10 MB file is vastly different from that of a 1 GB file. Other storage doesn’t have that “feature”.

    SSDs are designed with MUCH more parallel write ability and smaller write units (so less wear over time). USB sticks and SD cards are designed to have larger blocks, so if you write one byte, it will read some large chunk (call it a MB) into buffer, change the byte, and write the MB back out. Higher wear rate with small transactions and much slower write times… They are fundamentally different beasts.

    I don’t know how CF behaves internally. IF you can avoid that “re-write large chunks for a byte” problem it could be OK.

    The theoretical aspect of each design likely matters more than old war stories. (That said, old war stories can still be fun…)

  7. ossqss says:

    Steve, there are many adapter variants for the CF cards: CF to IDE or SATA, etc. I use a USB multi-memory card adapter that actually has a CF slot on it. It is old and probably USB 1.1, so it might not be very good for your use.

    Here is an IDE unit

  8. Steve C says:

    Thanks for the info! As I suspected, it looks more a case of not being able to step into the same river once! I’m well aware of how things have changed; my old Nascom-1 memory consisted of eight 1K x 1 bit SRAM chips, my largest USB memory stick (64GB) is also the physically smallest, to the extent that it’s a bit of a PITA to extract. Fortunately (?) the passage of time will probably see any of my proposed projects moving slowly towards the “back burner” – it seems that most of my practical abilities in recent years have involved not much more than making up cables …

  9. Thanks for the tip. I tend to be a bit more organized than that; I always have the same Linux operating system running on different computers so I don’t have problems with transferring files. But I’ve never had a problem with transferring files between the ext4 filesystem of Linux and the NTFS filesystem of Windows. Great article.

  10. E.M.Smith says:

    An interesting and unexpected result of testing ext3 vs xfs file systems on an SD card (same card, two file systems).

    Copy of an archived home directory from the NFS file server to ext3:

    real	19m0.845s
    user	0m7.167s
    sys	2m3.899s
    

    Then the one I thought would be slower is faster; the same copy, but to an xfs partition:

    real	12m27.851s
    user	0m8.058s
    sys	1m49.433s
    

    Uses less system compute time and finishes 7 minutes sooner…

  11. AwethentiQ says:

    I had 8 Sandisks die on me in just one year and a brand new Strontium 32GB that caused me tremendous heartache. In its first day of use during a once-in-a-lifetime journey.

  12. E.M.Smith says:

    Sandisk has a known “mis-feature”: if they detect an “error” they blow their brains out and go to ‘read only’. I had a brand new 128 GB USB Thumb Drive do that on the 2nd or 3rd hour of use. I did a reformat to a different file system and something in the process convinced it it was a Bad Bad Boy, so it ceased to function. Would have been marginally OK had there been something on it I wanted to archive; but an empty new file system that is locked is essentially worthless.

    After a bit of Google-Foo found plenty of other horror stories and a link to the SanDisk web page that called it a feature…

    I no longer buy SanDisk. Samsung works great and seems faster to me too. Zero failures in a couple of years use, so far.

    I’ve reformatted the Sony and have a working OS on it. It would seem that the failure to have the MBR / boot blocks come out right is in fact a failure of the SD image as posted. My interpreting it as (variously) the SD card or the adapter is likely in error (though the one adapter may still be partly an issue; I’ve had it alternately work, then maybe not, in testing).

    FWIW, I’ve got something like a dozen mini-SD cards from SanDisk and not one has failed yet. I don’t know if the SD has a different behaviour than the USB stick; but I’m not taking any chances on new ones.
