What to do if you have a stack of “modest” sized disks, say a couple of TB each, but you need a single directory of about 6 TB?
I suppose you could go out and buy a new 8 TB disk (some space is lost to formatting and such). Or move some of the files to another disk (and put symbolic links in the original location – I’m running a wget, so if the files are just gone, they would be downloaded again). But the first option is expensive and requires moving a lot of data up front. The second means an ongoing need to move data around, and making sure the wget is structured so that it really doesn’t try to download all that stuff again. Both are kludges.
There are alternatives.
The first one most folks think of is a RAID group: Redundant Array of Inexpensive Disks. This is most often used to make a group of disks where any one disk can fail and be replaced, and you lose no data. There are a bunch of RAID levels. Mirrors (2 sets of disks, each with one copy of the data). Striped groups (where each file has blocks on each disk, usually done to increase read and write speed, as you can have a block buffered and read or written on each disk). And higher RAID types. Most often this is RAID 5, where blocks are spread over several disks, along with enough parity data to reconstruct the data blocks on any one disk, were it to crash.
More on RAID levels:
In computer storage, the standard RAID levels comprise a basic set of RAID (redundant array of independent disks) configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (HDDs). The most common types are RAID 0 (striping), RAID 1 and its variants (mirroring), RAID 5 (distributed parity), and RAID 6 (dual parity).
RAID levels cover things like gluing together a set of disks, but there is often a large time cost in building, and changing, the structure. When you add or remove a disk, the RAID does a “rebuild”, and that can take a long time, especially on slow hardware like the Pi.
A striped group gives performance improvement as reads / writes are spread over several disk spindles and heads.
A mirror group gives data security, but at a high cost in duplicated disks and reads / writes.
RAID 3 and 4 are fairly specialized combinations of bit or byte striping and parity (on a dedicated disk for RAID 4).
RAID 5 has the parity distributed over all the disks, and RAID 6 adds a second, independent parity block so that you can lose 2 disks and survive.
All that parity has a large cost in computes, especially when the compute engine is small. Thus the very long rebuild times. Even adding a new empty disk involves a ‘rebuild’ as the data and parity get spread over that new disk and recomputed.
I built a RAID as my first cut at this problem, and then found that the ‘rebuild’ when I added a third disk was going to take a day. During that time, the RAID array is at risk. Every time I added or removed a disk, that same process would happen. Furthermore, one disk is lost to parity, so for 3 disks, you get 2 disks of storage. Each added disk improves the ratio, so more smaller disks beat 2 giant disks. My USB hub has only 4 slots, so at best I could get 3 disks worth of space usable. For 6 TB that would mean using 4 x 2 TB disks, and that would be “close” on total space. When it ran out, I’d be basically stuck. Adding another hub and more disks would start to get pricey, and then there would be the rebuild time.
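For anyone wanting to see how far along a rebuild is, the kernel reports progress in /proc/mdstat. A minimal sketch (the array name /dev/md0 is an assumption from the examples below):

```shell
# Show rebuild/resync progress: mdstat prints a progress bar and an ETA.
cat /proc/mdstat

# Per-member detail, including "Rebuild Status" while a rebuild is running.
sudo mdadm --detail /dev/md0

# Re-run the check every 5 seconds without retyping:
watch -n 5 cat /proc/mdstat
```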
Oh, and since for RAID 5, the basis is a striped group:
“RAID 5 consists of block-level striping with distributed parity.”
Each disk (or partition) must be of the same size. Well, some can be bigger than others, but the only space used will be the size of the smallest disk or partition. So if you have 4 x 1 TB disks, but one of them has a 100 GB partition set aside for something else, you will get 4 chunks of 900 GB each used, and only 3 x 900 GB available after parity. Spending 4 TB to get 2.7 TB starts to bite pretty quickly, especially when after formatting you are closer to 2.5 TB.
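That arithmetic can be sketched as a quick shell calculation (numbers from the example above; the sizes are illustrative):

```shell
# RAID 5 usable space = (members - 1) x size of the smallest member.
members=4        # four disks in the array
smallest_gb=900  # smallest usable chunk, in GB (the 1 TB disk minus its 100 GB partition)

raw_gb=$(( members * smallest_gb ))
usable_gb=$(( (members - 1) * smallest_gb ))

echo "raw: ${raw_gb} GB, usable after parity: ${usable_gb} GB"
# raw: 3600 GB, usable after parity: 2700 GB
```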
For anyone wanting to play with making a RAID, pretty good directions are here:
The very abbreviated form is:
If your Debian / Devuan has been a while since the last update, bring it up to date:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
Personally, I’d skip the dist-upgrade, especially since it can screw up your Devuan on Pi in some cases (replacing the kernel on BerryBoot seems to kill it).
The program that implements RAID on the Debian family is “mdadm” (multi-disk admin?), so install it:
apt-get install mdadm
Then you plug in your disks and create your RAID. Quoting the article:
mdadm -Cv /dev/md0 -l0 -n2 /dev/sd[ab]1
(configure mdadm and create a RAID array at /dev/md0 using RAID 0 with 2 disks: sda1 and sdb1. To create a RAID 1, change the line to read: mdadm -Cv /dev/md0 -l1 -n2 /dev/sd[ab]1)
Clearly for RAID 5 you would use -l5 instead. Also note that you can list the disks explicitly without the wildcard [ab] bit. So like:
mdadm -Cv /dev/md0 -l5 -n3 /dev/sda1 /dev/sdb1 /dev/sdc3
I’ve not tested that command, but think I have the syntax right and no typos… one hopes. IIRC, that’s what I did with my test case. Note that you can use different partitions on different disks and your particular disk partition names will vary. Note that you now have a RAID group on /dev/md0 but not a file system. So make one:
mkfs -t ext4 /dev/md0
You can now mount /dev/md0 like any other disk. I mounted it as /RAID for my testing.
For a fair time I searched for how to keep straight which disks were in the RAID. They get marked with a magic number on the disk itself and assembled at boot time. Removing one can be a challenge…
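For the record, tearing an array back down goes roughly like this. A sketch, assuming the /dev/md0 and member names from the examples above (double-check your device names before zeroing anything):

```shell
# Stop the running array first.
sudo mdadm --stop /dev/md0

# Erase the RAID "magic number" (superblock) from each former member,
# so it stops being auto-assembled at boot.
sudo mdadm --zero-superblock /dev/sda1
sudo mdadm --zero-superblock /dev/sdb1

# If the array was recorded in /etc/mdadm/mdadm.conf, remove that
# ARRAY line too, or boot-time assembly may still go looking for it.
```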
Is there something less complicated, that takes less computes, and is more efficient with the disks?
LVM, an easier way
Logical Volume Manager.
The purpose of LVM is different from that of RAID. RAID is to handle data protection and performance, while LVM is for the purpose of making volume management easy.
Before anyone asks, yes, you can use the two together ( IFF you are prone to loving hyper-complex environments and enough levels of indirection to cause your eyes to glaze… but folks have used RAID to build the underlying data vault, then used LVM on top of it to make administering the disks easier).
With LVM you can “glue together” a gaggle of disks so that they look like one giant disk to the world. Or break up one gaggle of disks into a different gaggle of logical disks.
I just used it to create what looked like one giant disk by gluing together a 4 TB disk, a 2 TB disk, and a 1.5 TB disk. Notice that volume sizes can be anything, and we’re not talking about data preservation or speed of access here. Just one BIG file system made out of several different disk bits.
The LVM Wiki is pretty good:
First you do the usual “upgrade / update” of the system. Then you install the LVM code and start the service:
sudo apt-get install lvm2
sudo service lvm2 start
The wiki has you install a graphical management bit, but I didn’t bother.
apt-get install system-config-lvm
Now there is a 3 level set of “stuff” to keep track of during the rest. Physical disks or disk partitions. Groups of “volumes” (called volume groups). And logical volumes created inside a volume group. There are commands to create, inspect, and manage things at each level. (So you can see how adding RAID above or below and adding a couple of more levels can be a bit confusing…)
OK, at the physical level we need to assign disks or disk partitions to the Volume Group. You can pretty much mix and match bits of disks at this level, though the pages encourage slugging in whole disks as simpler to manage. I built mine out of partitions and put a swap partition as slice ‘b’ on each disk. Why? Because I’m an old school surly curmudgeon who doesn’t like the idea of running swap onto a splotch of disk on a LVM volume in an LVM Group on a gaggle of physical disk partitions… but you can put swap on an LVM volume if you like, then just slug in whole disks for space. So instead of using /dev/sda1 for disk space and /dev/sda2 for swap (and partitioning accordingly) you can just add /dev/sda to the LVM group and parcel it out as desired to files or swap.
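For those who do want swap on LVM, the shape of it is roughly this. A sketch, with assumed names (the Volume Group “TemperatureData” from later in this post, a hypothetical 2 GB volume called “swap”):

```shell
# Carve a 2 GB Logical Volume out of the Volume Group for swap.
sudo lvcreate -n swap -L 2g TemperatureData

# Put a swap signature on it and turn it on.
sudo mkswap /dev/TemperatureData/swap
sudo swapon /dev/TemperatureData/swap

# Confirm it is active.
swapon -s
```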
So once the LVM service is installed, how do you hand it disks or partitions?
As usual for all things systems admin, you either put a “sudo” in front of commands or run them as root. Just a reminder… So what is that command?
pvcreate /dev/sda1
This marks that partition as part of the LVM batch. If you used “pvcreate /dev/sda” you would assign the whole disk.
There are a bunch of physical volume commands, but I’ve not found one that tells you how much real data is on any given physical disk.
PV commands list
pvchange — Change attributes of a Physical Volume.
pvck — Check Physical Volume metadata.
pvcreate — Initialize a disk or partition for use by LVM.
pvdisplay — Display attributes of a Physical Volume.
pvmove — Move Physical Extents.
pvremove — Remove a Physical Volume.
pvresize — Resize a disk or partition in use by LVM2.
pvs — Report information about Physical Volumes.
pvscan — Scan all disks for Physical Volumes.
You would think pvs would tell you how much of each physical volume had data on it. It doesn’t. It tells you how much has a file system built on it:
root@orangepione:~# df /LVM
Filesystem                                 1K-blocks       Used  Available Use% Mounted on
/dev/mapper/TemperatureData-NoaaCdiacData 7207579544 2688218088 4189744724  40% /LVM
root@orangepione:~# pvs
  PV         VG              Fmt  Attr PSize PFree
  /dev/sda1  TemperatureData lvm2 a--  3.64t    0
  /dev/sdb1  TemperatureData lvm2 a--  1.82t    0
  /dev/sdc1  TemperatureData lvm2 a--  1.36t    0
So with 60% empty, pvs shows nothing free. OK… It makes a certain kind of sense in that I can’t add a new Logical Volume as the space is committed to the /LVM mount point (made from the Volume Group “TemperatureData” and the Logical Volume “NoaaCdiacData” – and yes, I wish I’d used shorter names ;-)
As I understand it, unless you make it a striped group, then files are allotted in order from first disk to last disk, so I can assume that the 2.6 TB used is all on that first /dev/sda1 physical volume at this point… but I’d really like a command that let me know for sure…
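The closest things I know of are the extent-mapping reports. They won’t show file-level usage either, but they do show which physical disk backs which stretch of the Logical Volume, which with in-order allocation tells you where the data is landing. A sketch:

```shell
# Per-Physical-Volume map: which ranges of physical extents on each
# disk belong to which Logical Volume.
sudo pvdisplay -m

# Per-segment view from the Logical Volume side: each segment's
# backing device and its physical extent range.
sudo lvs -o +devices,seg_pe_ranges
```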
OK, you have handed over some disk or partition to the physical volume list. Now how to do that Volume Group and Logical Volume stuff?
Create your Volume Group. I used TemperatureData as the name and wish I’d used TGroup…
vgcreate myVirtualGroup1 /dev/sda2
Then add another disk or partition to it with:
vgextend myVirtualGroup1 /dev/sda3
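To confirm the group actually grew, the reporting commands show total size and free space (group name from the example above):

```shell
# Full detail: VG Size, "Free  PE / Size", number of Physical Volumes.
sudo vgdisplay myVirtualGroup1

# One-line summary per Volume Group.
sudo vgs
```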
There are lots of things you can do with Volume Groups:
VG commands list
vgcfgbackup — Backup Volume Group descriptor area.
vgcfgrestore — Restore Volume Group descriptor area.
vgchange — Change attributes of a Volume Group.
vgck — Check Volume Group metadata.
vgconvert — Convert Volume Group metadata format.
vgcreate — Create a Volume Group.
vgdisplay — Display attributes of Volume Groups.
vgexport — Make volume Groups unknown to the system.
vgextend — Add Physical Volumes to a Volume Group.
vgimport — Make exported Volume Groups known to the system.
vgimportclone — Import and rename duplicated Volume Group (e.g. a hardware snapshot).
vgmerge — Merge two Volume Groups.
vgmknodes — Recreate Volume Group directory and Logical Volume special files
vgreduce — Reduce a Volume Group by removing one or more Physical Volumes.
vgremove — Remove a Volume Group.
vgrename — Rename a Volume Group.
vgs — Report information about Volume Groups.
vgscan — Scan all disks for Volume Groups and rebuild caches.
vgsplit — Split a Volume Group into two, moving any logical volumes from one Volume Group to another by moving entire Physical Volumes.
I’ve not explored most of those commands…
OK, you have a nice big volume group, now what? How to split out what looks like a disk to the system and mount it? Create a Logical Volume.
lvcreate -n myLogicalVolume1 -L 10g myVirtualGroup1
Now I used NoaaCdiacData for my logical volume name and wish I’d used NCData…
lvcreate -n NCData -L 100g Tgroup
Then format it to ext4 (or something else if you have good reason to).
mkfs -t ext4 /dev/Tgroup/NCData
You could now do a mount on /test to see if it worked:
mount /dev/Tgroup/NCData /test
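To have it mount itself at each boot, an /etc/fstab entry does it. A sketch using the names from the example (the same device also shows up as /dev/mapper/Tgroup-NCData; either name works):

```shell
# Append to /etc/fstab:
# device              mount   type  options   dump  pass
/dev/Tgroup/NCData    /test   ext4  defaults  0     2
```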
There are lots of Logical Volume commands too:
LV commands list
lvchange — Change attributes of a Logical Volume.
lvconvert — Convert a Logical Volume from linear to mirror or snapshot.
lvcreate — Create a Logical Volume in an existing Volume Group.
lvdisplay — Display the attributes of a Logical Volume.
lvextend — Extend the size of a Logical Volume.
lvreduce — Reduce the size of a Logical Volume.
lvremove — Remove a Logical Volume.
lvrename — Rename a Logical Volume.
lvresize — Resize a Logical Volume.
lvs — Report information about Logical Volumes.
lvscan — Scan (all disks) for Logical Volumes.
I had to add disks to my Volume Group after it was built, and then extend the size of the file system to include those other disks. The “lvextend” command does that, and then the “resize2fs” command expands the file system to fill that extended space.
Essentially that’s it. If folks want examples of the lvextend and resize2fs commands, let me know and I’ll add them, but it is fairly simple.
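For the curious, the growth sequence looks roughly like this. A sketch assuming the names from the examples above and a hypothetical new partition /dev/sdd1:

```shell
# Hand the new partition over to LVM.
sudo pvcreate /dev/sdd1

# Add it to the existing Volume Group.
sudo vgextend Tgroup /dev/sdd1

# Grow the Logical Volume into all the newly free space.
sudo lvextend -l +100%FREE /dev/Tgroup/NCData

# Grow the ext4 file system to match the enlarged volume.
sudo resize2fs /dev/Tgroup/NCData
```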
So that’s where I’m at on scraping the approximately 6 TB of “superghcnd” that looks like it is hourly data for a selection of GHCN sites. About 10 GB / day… I chose to use LVM just to avoid several days worth of “rebuild” on RAID volumes, and because I could glue together a gaggle of different disks into one logical volume image. I risk that any disk loss can cause all of it to be lost, but since it is a duplicate of an online server, I’m able to reload it if needed (as long as NOAA keeps it up).
Sometime after I have a full copy, should I desire more security, I could make a RAID volume (and maybe put LVM on top of it), then gradually grow the RAID as I copied over data and shrink the LVM group… Or just toss a couple of $Hundred more at a couple of added 4 TB disks. It’s a full 3 weeks until the download is finished, and I’ve got plenty of raw space at the moment, so lots of time to think about the next step. At the moment, I’m happy to just have it all download and then leave the disks turned off 90% of the time. Turned off disks in a drawer have a long MTBF (Mean Time Between Failure).
At present I have about 2 TB additional empty disk, beyond the 4 TB free in the LVM Logical Volume at the moment, so things are fine for now. I think I’m going to need about 2.5 of that 4 TB to have the download finish. Then I’ll decide on “safe in a drawer” or “move to a RAID”. I already have a simple copy of the data that does not include the “superGHCNd” mammoth chunk, so the only bit at risk is that huge chunk of unclear value. I think just moving one day of that data and the ‘diff’ files to a duplicate is enough “protection”.
So there you have it. The “joys” of slugging TB of data around and how to do it.