The Setup & Context
What with my USB Stick dying, I’m recovering that particular “home directory” from backups. I’d been in the middle of splitting it anyway, turning what had been about 20 GB of “portable home directory” into 3 smaller bits. Anything that was “just storing files” got moved to general disk storage (i.e. file server or archive). That cut things down to about 3 GB of active “stuff”: a few recent files (things downloaded in the last week or two and not yet moved to real disk), the browser cache (which IIRC I’d set to about 1 GB to minimize internet traffic and lag times), and my mail archive (since Thunderbird et al. insist on stuffing it into a database hidden in a dot-directory).
So I’d moved all the “just files” off, mostly. My “environment” of personalized commands remained (but is pretty consistent machine to machine anyway) and I’d split the browser image from the email reading image. Then some stuff came up and the project sat at that point for a few months. Then the Monster USB Stick went a bit flaky with a couple of errors, so I made 2 rapid copies and the stick died in the middle of the second one.
Now the fun begins…
Just which image had been the browser one, and which the email target? As I’d not run the email program, and it had been a few months, I’d sort of lost track (expecting it to be self-prompting based on what was on the stick…). Then there’s the fact that the sudden upgrade, forced when the kernel was shown to have a security flaw (plus a couple of other bits), has my current desktop not showing Thunderbird at all anyway… Sigh.
So tonight I went through and counted up 9 major images of my home directory in storage… There are at least 2 from before the split, about 9 months ago. A couple from the emergency dump, one cut short. Another from a few months back, and then a couple from about last December to this February. Most of them are from after the split. So tomorrow I get to start sorting out who’s a what. Is the first of the emergency copies good? Is the biggest, latest post-split copy from about 5 months back “new enough”? Do I “unscramble the split” before I proceed, or try to just recover the one post-split part that failed? Fairly trivial for the simple files. The browser cache I don’t care about. The trickier bit is the email archive, since it lives in hidden database files that you can’t easily inspect.
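Before trusting any one image, a quick structural comparison narrows things down. A sketch only — the paths here are stand-ins, not my real backups, and the demo fabricates two tiny trees so the commands can run anywhere. `diff -rq` names the files that differ or exist in only one copy:

```shell
# IMG_A and IMG_B stand in for two of the real backup image directories.
IMG_A=$(mktemp -d); IMG_B=$(mktemp -d)
mkdir -p "$IMG_A/Mail" "$IMG_B/Mail"
echo "From alice" > "$IMG_A/Mail/Inbox"
echo "From alice" > "$IMG_B/Mail/Inbox"
echo "notes"      > "$IMG_A/todo.txt"     # present only in copy A

# -r recurse, -q only name files that differ or are missing
diff -rq "$IMG_A" "$IMG_B" | sort
```

Pointed at two of the real mounted images, that report alone usually tells you which copy is the superset of the other.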
The “good news” is that I’d not cleared email from my ISP since before the split (I’d just read it via the web page), so I’ve not lost anything. “Worst case” is I just restore the last pre-split image and then go through the same “split” process, except that as I go to move, say, a PDF file to the PDF folder in the archives, I discover it’s a duplicate. Repeat for 15 GB…
Hopefully the first emergency copy is complete and a comparison with the prior images will confirm it for the email database.
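One way to confirm that comparison, as a hedged sketch (the paths are made up): checksum every file under each copy’s mail directory and compare the manifests. Identical manifests mean the email database survived the copy byte for byte, without having to interpret the hidden database format at all.

```shell
# COPY1 and COPY2 stand in for two backup images' mail directories;
# tiny stand-in files let the commands run anywhere.
COPY1=$(mktemp -d); COPY2=$(mktemp -d)
printf 'mail data' > "$COPY1/Inbox"
printf 'mail data' > "$COPY2/Inbox"

# Checksum every file, in a stable order, relative to each directory root.
manifest() ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 md5sum; )
manifest "$COPY1" > /tmp/manifest1
manifest "$COPY2" > /tmp/manifest2
diff /tmp/manifest1 /tmp/manifest2 && echo "email database copies match"
```

Any line that shows up in the diff is a file that differs between the two copies, which is exactly the short list worth investigating.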
But Along The Way – File Systems!
So I had to find those images and herd them together. A few dozen GB scattered over a few disks (I’d just used whatever was fast for the ‘quick copies’). I noticed that a couple of the disks were ext4. Some folks might remember my blue streak when I discovered that “new” ext4 would automagically transition your “old” ext4 disk to the new journalling formats and methods, rendering the disk unusable and un-fsck-able on older systems using the “old” ext4 file system. That is just “so wrong”. They ought to have called the “new” ext4 by the name ext5 and given you a choice not to be “converted”. Folks like me who move disks between systems get “converted and stuck”.
So back then I’d converted many of my disks to ext3 (still journalled, but not as fancy, and it is stable). So in the last couple of days, while looking for where my USB Stick copies got stored, I converted about 6 TB of misc. disk partitions from EXT4 to EXT3. That meant making a copy of the data “somewhere else”, changing the format, and copying back (or consolidating on another disk).
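The out-and-back dance is mechanical enough to script. This is a sketch only — the function name, device arguments, and mount point are all hypothetical, and nothing runs until you call it against a real partition (as root):

```shell
# Hypothetical helper: copy a partition's data off, reformat ext4 -> ext3,
# copy the data back.  Destructive to DEV when invoked; needs root.
convert_ext4_to_ext3() {
    DEV="$1"; STAGING="$2"; MNT="${3:-/mnt/convert}"
    if [ -z "$DEV" ] || [ -z "$STAGING" ]; then
        echo "usage: convert_ext4_to_ext3 /dev/sdXN /staging/dir [mountpoint]" >&2
        return 1
    fi
    mkdir -p "$MNT"                   || return 1
    mount "$DEV" "$MNT"               || return 1
    rsync -aHX "$MNT/" "$STAGING/"    || return 1  # copy the data "somewhere else"
    umount "$MNT"                     || return 1
    mkfs.ext3 "$DEV"                  || return 1  # wipes everything on DEV
    mount "$DEV" "$MNT"               || return 1
    rsync -aHX "$STAGING/" "$MNT/"                 # copy it back
    umount "$MNT"
}
```

Called as, say, `convert_ext4_to_ext3 /dev/sdb1 /bigdisk/staging`, it does the copy-out / reformat / copy-back in one go; check the rsync results before trusting it with the only copy of anything.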
You start to appreciate the meaning of a 4 TB disk on USB 2.0 after the first few hours… But as of now my only remaining EXT4 disks are the LVM disk on my nfs server and a 4 TB offline archive disk. The LVM group doesn’t get moved, and the archive disk plugs into that system too; so neither is at risk of an “ext4” incompatibility surprise.
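That sloth is easy to put numbers on. A back-of-envelope sketch, where the ~30 MB/s sustained figure for USB 2.0 is an assumption (real sticks and bridge chips vary a lot):

```shell
# Rough time to move 4 TB of data over USB 2.0 at an assumed ~30 MB/s.
tb=4
mb=$((tb * 1000 * 1000))    # 4 TB expressed in MB (decimal units)
rate=30                     # assumed sustained MB/s for USB 2.0
hours=$((mb / rate / 3600))
echo "about ${hours} hours" # about 37 hours
```

And that is for one pass; the copy-out-then-copy-back conversion doubles it.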
But all this moving and formatting and copying and such had me thinking about File Systems. Since whoever was working on ext4 has shown themselves a bit of an idiot by causing this problem of “2 mutually incompatible ext4 formats”, was it time to move on to something else?
I’ve used LVM and I’m not that keen on it. Yes, it does what it says it will do, but at the expense of arcane commands and great sloth when you change disk configurations. Not interested in more of it (and “someday” I will migrate the 8 TB of temperature data and site scrapes off of it, then turn it back into 2 x 4 TB of some other file system).
Other File Systems
Linux seems to sprout new File System choices faster than I can keep up. Many are specialized. Some are best suited to data centers with disk farms of 100+ disks and 24 x 7 staffing. Others are just not quite working yet…
First I eliminated the “special use” file systems. While I am going to use squashfs for some things, it is really just a compressed file image used as a read-only file system. Similarly, f2fs is intended for lower wear on flash-based things like SD cards and USB Sticks. Useful IF I were going back to a USB stick, but not for general disk. After you eliminate that kind of thing, on Debian / Devuan the remaining list of “general use” file systems is fairly narrow. I also left out “swap” as it is only for use as a swap partition.
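A quick way to see what any given box actually offers, before narrowing the list (output varies per system, so take the exact names as examples, not a promise):

```shell
# Filesystems the running kernel knows about.  The "nodev" entries are
# virtual ones (proc, tmpfs, ...) rather than on-disk formats.
cat /proc/filesystems

# mkfs helpers actually installed on this box (locations vary by distro).
ls /sbin/mkfs.* /usr/sbin/mkfs.* 2>/dev/null || true
```

Anything you want to use but that isn’t in either list means loading a kernel module or installing the matching tools package first.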
EXT2, EXT3, EXT4 = The “native” Linux file systems. EXT2 is not journaled and is mostly used for boot partitions. EXT3 is a solid workhorse with somewhat smaller limits than EXT4. As I’ve mentioned, EXT4 has “versions” and I’m abandoning it.
FAT16, FAT32, VFAT, NTFS = These are all Microsoft file formats with “issues” that make them a pain for use with Linux / Unix. Mostly they don’t handle all the normal cases and characters allowed in *Nix file names, but the date stamping is also a bit daft.
HFS+ = The Macintosh file system. Works well, but read-only on Linux at this time (or at least it was last time I checked).
UFS = Used in Unix and BSD, but read only on Linux at this time.
Reiserfs, Reiser4 = An early advanced file system that supposedly works well, but the main developer (Mr. Reiser) was convicted of the murder of his wife. Aside from just not wanting to be reminded of a murderer every day, that will put a serious dent in future support and enhancement…
So at that point I arrived at the short list of “other file systems”: ZFS, BTRFS, and XFS.
The Short list
Well, ZFS is pretty complicated. It does about everything, but isn’t supported on many of the Linux varieties for my chip sets. It is intended for major data center type work. Originally developed by / for Sun Microsystems, it works well, but has their usual complexity issues.
BTRFS (‘butter fs’) “has issues”
This page lists some problems one might face when trying btrfs. Some of these are not really bugs, but rather inconveniences about things not yet implemented, or as-yet-undocumented design decisions.
The page references issues relevant for a few stable kernel releases that seem to be in use. This currently contains 4.14, 4.9 and 4.4. The list will be updated once a new stable is announced.
Please note this is just to document known issues; ask your kernel vendor for backporting fixes or support.
Please note that most of this page was not written by the btrfs developer community and may be entirely inaccurate.
Affecting all versions
Block-level copies of devices

It is not safe to:

make a block-level copy of a Btrfs filesystem to another block device…

use LVM snapshots, or any other kind of block level snapshots…

turn a copy of a filesystem that is stored in a file into a block device with the loopback driver…

… and then try to mount either the original or the snapshot while both are visible to the same kernel.
Files with a lot of random writes can become heavily fragmented (10000+ extents), causing thrashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or a large amount of RAM.
On servers and workstations this affects databases and virtual machine images.
The nodatacow mount option may be of use here, with associated gotchas.
On desktops this primarily affects application databases (including Firefox and Chromium profiles, GNOME Zeitgeist, Ubuntu Desktop Couch, Banshee, and Evolution’s datastore.)
Workarounds include manually defragmenting your home directory using btrfs fi defragment. Auto-defragment (mount option autodefrag) should solve this problem in 3.0.
Symptoms include btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes, possibly triggered by syncs). You can use filefrag to locate heavily fragmented files (may not work correctly with compression).
Currently the raid5 and raid6 profiles have flaws that make them strongly not recommended, as per the Status page.
In less recent releases the parity of resynchronized blocks was not calculated correctly; this has been fixed in recent releases (TBD).
If a crash happens while a raid5/raid6 volume is being written this can result in a “transid” mismatch as in transid verify failed.
The resulting corruption cannot be currently fixed.
For stable kernel versions 4.14.x, 4.9.x, 4.4.x
Having many subvolumes can be very slow
The cost of several operations, including currently balance, device delete and fs resize (shrinking), is proportional to the number of subvolumes, including snapshots, and (slightly super-linearly) the number of extents in the subvolumes.
Conversion from ext4 may not be undoable
In kernels 4.0+, empty block groups are reclaimed automatically, which can affect the following:
a converted filesystem may not be able to do a rollback because of the removed block groups
Then, in some forums, folks were telling stories of having unrecoverable data loss without explanation, plus other failure modes. This after several years of development and then several more in production / enhancement.
In short, it’s a complicated and still somewhat unstable beast with “issues”.
Oh, and Red Hat is dropping support for it from 7.x forward. Too much maintenance work and the guy who was doing it quit.
So not going with BTRFS…
Which leaves us at XFS.
XFS came out of SGI in the ’90s. It is relatively mature, not too complicated, and has few bugs. The features are more than enough and it is designed for speed and efficiency. I don’t know how much of that will apply to a few slow USB disks, though…
So I’m going to continue to use EXT3 for everything I can; but “going forward” will be getting my feet wet in XFS. Some Linux vendors are using it as the default in their new releases and it does look to be gaining acceptance for wide use.
EXT3 is quite enough for just about everything I do, so I don’t feel a big loss by dropping EXT4 and using it.
XFS may be more complicated to set up than I like, or maybe the defaults will be fine. I’ll find out in a few weeks. IF I like it, I’ll make a trial build of a system using it and then live on it for a few weeks. Do note that some Linux releases can NOT boot from an XFS file system as their boot loader can’t read it. Red Hat is making theirs able to read a native XFS file system, but for now, keep your /boot as EXT2 or EXT3.
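Checking what /boot currently is on a given box is one command. A sketch: if /boot isn’t a separate partition, the fallback just reports the root filesystem instead.

```shell
# Show the filesystem type backing /boot, or / if /boot isn't its own mount.
df -T /boot 2>/dev/null || df -T /
```

If the Type column already says ext2 or ext3, the bootloader caveat doesn’t bite you; if it says xfs, your bootloader evidently copes.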
There are a whole lot more file systems you can put on Linux, but most of them don’t have much to offer, or are highly specialized (often for giant disk farms). There’s a list of file systems and details here (Not all on Linux):
As of now, I’ve moved all the TB that needed moving; everything on just about every system but the NFS server is EXT3, and compatible between any Linux versions I’ve got running. I’m going to make a test case XFS file system and compare speed and complexity on the Raspberry Pi. I’m in no hurry to change, though, so this is purely a “slow day play” operation.
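For the test case, a file-backed image is enough to exercise XFS creation without risking a real disk. A sketch — the size and path are arbitrary, it assumes xfsprogs is installed, and actually mounting the result (`mount -o loop`) would need root:

```shell
# Make a sparse 512 MB file and put an XFS filesystem on it.
IMG=$(mktemp /tmp/xfs-test.XXXXXX)
truncate -s 512M "$IMG"           # sparse: takes almost no real space yet
if command -v mkfs.xfs >/dev/null 2>&1; then
    mkfs.xfs -f "$IMG"            # -f: it's a file, not a block device
else
    echo "xfsprogs not installed; skipping"
fi
```

Once it builds cleanly, loop-mounting it gives a place to time some copies against the same data on an EXT3 partition.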
Other than that, the rest of the “general purpose” file systems will no longer require my notice…
For a description of XFS and its features, this is a pretty readable writeup:
The file system consistency is guaranteed by the use of Journaling. The Journal size is calculated from the partition size used to make the XFS file system. If a crash occurs, the Journal can be used to ‘redo’ the transactions. When the file system is mounted and a previous crash is detected, recovery of the items in the Journal is automatic.
For support of faster throughput, Allocation Groups are used. Allocation Groups provide for simultaneous I/O by multiple application threads at once. This ability allows systems with multiple processors or multi-core processors to get better throughput from the file system. These benefits are greater when the XFS file system spans multiple devices.
For multiple devices to be used within the same XFS file system, RAID 0 can be implemented. The more devices used in the striped file system, the higher the throughput that can be achieved.
To also provide higher throughput, XFS uses Direct I/O to allow data retrieved from the XFS file system to go directly to the application memory space. Since the data bypasses the cache, the retrieval and writing of data are faster.
I’ll likely start with a 1 TB standalone disk used for scratch space and “copy out / copy back” file moves and unpacking. Like, oh, unpacking the backups of my USB Stick and sorting through them ;-)