RAID, LVM, Gaggle Of Disks…

What to do if you have a stack of “modest” sized disks, say a couple of TB each, but you need a single directory of about 6 TB?

I suppose you could go out and buy a new 8 TB disk (some space is lost to formatting and such). Or move some of the files to another disk (and put symbolic links in the original location – I’m running a wget, so if the files are just gone, they would be downloaded again). But the first option is expensive and requires moving a lot of data up front. The second means an ongoing need to move data around, plus assuring that the wget is structured so that it really doesn’t try to download all that stuff again. Either way, it is a kludge.

There are alternatives.

RAID

The first one most folks think of is a RAID group: Redundant Array of Inexpensive Disks. This is most often used to make a group of disks where any one disk can fail and be replaced, and you lose no data. There are a bunch of RAID levels. Mirrors (2 sets of disks, each with one copy of the data). Striped groups (where each file has blocks on each disk, usually done to increase read and write speed, as you can have a block buffered and read or written on each disk). And higher RAID types. Most often this is RAID 5, where data blocks are spread over several disks, along with enough parity data to reconstruct the blocks of any one disk, were it to crash.

More on RAID levels:

https://en.wikipedia.org/wiki/Standard_RAID_levels

In computer storage, the standard RAID levels comprise a basic set of RAID (redundant array of independent disks) configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (HDDs). The most common types are RAID 0 (striping), RAID 1 and its variants (mirroring), RAID 5 (distributed parity), and RAID 6 (dual parity).

RAID levels cover things like glueing together a set of disks, but building or changing the structure often has a large time cost. When you add or remove a disk, the RAID does a “rebuild” and it can take a long time, especially on slow hardware like the Pi.

A striped group gives performance improvement as reads / writes are spread over several disk spindles and heads.

A mirror group gives data security, but at a high cost in duplicated disks and duplicated writes.

RAID 3 and 4 are fairly specialized combinations of byte-level or block-level striping, with the parity on a dedicated disk.

RAID 5 has the parity distributed over all the disks, and RAID 6 adds a second, independently computed parity block so that you can lose 2 disks and survive.

All that parity has a large cost in computes, especially when the compute engine is small. Thus the very long rebuild times. Even adding a new empty disk involves a ‘rebuild’ as the data and parity get spread over that new disk and recomputed.

I built a RAID as my first cut at this problem, and then found that the ‘rebuild’ when I added a third disk was going to take a day. During that time, the RAID array is at risk. Every time I would add or remove a disk, that same process would happen. Furthermore, one disk is lost to parity, so for 3 disks, you get 2 disks of storage. Each added disk improves that efficiency, so more smaller disks are better than 2 giant disks. My USB hub has only 4 slots, so at best I could get 3 disks worth of space usable. For 6 TB that would mean using 4 x 2 TB disks, and that would be “close” on total space. When it ran out, I’d be basically stuck. Adding another hub and more disks would start to get pricey, and then there would be the rebuild time.
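As an aside, you can at least watch that rebuild grind along. The kernel exposes the array status in /proc/mdstat, including a resync percentage and a time estimate. A quick sketch (standard on any Linux running software RAID):

cat /proc/mdstat                # array status, member disks, resync % and ETA
watch -n 60 cat /proc/mdstat    # re-display it once a minute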

Oh, and since for RAID 5, the basis is a striped group:

“RAID 5 consists of block-level striping with distributed parity.”

Each disk (or partition) must be of the same size. Well, some can be bigger than others, but the only space used will be the size of the smallest disk or partition. So if you have 4 x 1 TB disks, but one of them has a 100 GB partition set aside for something else, you will get 4 chunks of 900 GB each used, and only 3 x 900 GB available after parity. Spending 4 TB to get 2.7 TB starts to bite pretty quickly, especially when after formatting you are closer to 2.5 TB.

For anyone wanting to play with making a RAID, pretty good directions are here:

http://projpi.com/diy-home-projects-with-a-raspberry-pi/raspberry-pi-raid-array-with-usb-hdds/

The very abbreviated form is:

If it has been a while since your Debian / Devuan was last updated, bring it up to date:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade

Personally, I’d skip the dist-upgrade, especially since it can screw up your Devuan on Pi in some cases (replacing the kernel on BerryBoot seems to kill it).

The program that implements RAID on the Debian family is “mdadm” (multiple device admin), so install it.

apt-get install mdadm

Then you plug in your disks and create your RAID. Quoting the article:

mdadm -Cv /dev/md0 -l0 -n2 /dev/sd[ab]1 ( configure mdadm and create a raid array at /dev/md0 using raid0 with 2 disks ; sda1 and sdb1. To create a raid1, replace the line to read mdadm -Cv /dev/md0 -l1 -n2 /dev/sd[ab]1 )

Clearly for RAID 5 you would use -l5 instead. Also note that you can list the disks explicitly without the wildcard [ab] bit. So like:

mdadm -Cv /dev/md0 -l5 -n3 /dev/sda1 /dev/sdb1 /dev/sdc3

I’ve not tested that command, but think I have the syntax right and no typos… one hopes. IIRC, that’s what I did with my test case. Note that you can use different partitions on different disks and your particular disk partition names will vary. Note that you now have a RAID group on /dev/md0 but not a file system. So make one:

mkfs -t ext4 /dev/md0

You can now mount /dev/md0 like any other disk. I mounted it as /RAID for my testing.
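A quick sketch of that (the mount point name is whatever you like):

mkdir /RAID             # make a mount point
mount /dev/md0 /RAID    # mount the array’s file system there
df -h /RAID             # sanity check the size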

For a fair time I searched for how to keep straight which disks were in the RAID. They get marked with a magic number on the disk itself and assembled at boot time. Removing a disk from the array can be a challenge.
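mdadm itself can show that magic number (the RAID superblock) and which disks carry it. These are standard invocations, though treat the last one with care since it destroys array membership:

mdadm --detail /dev/md0       # list the member disks of an assembled array
mdadm --examine /dev/sda1     # read the RAID superblock on one partition
mdadm --zero-superblock /dev/sda1   # wipe the superblock, only AFTER the disk is failed/removed from the array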

Is there something less complicated, that takes less computes, and is more efficient with the disks?

LVM, an easier way

Logical Volume Manager.

The purpose of LVM is different from that of RAID. RAID handles data protection and performance, while LVM exists to make volume management easy.

Before anyone asks, yes, you can use the two together ( IFF you are prone to loving hyper-complex environments and enough levels of indirection to cause your eyes to glaze… but folks have used RAID to build the underlying data vault, then used LVM on top of it to make administering the disks easier).

With LVM you can “glue together” a gaggle of disks so that they look like one giant disk to the world. Or break up one gaggle of disks into a different gaggle of logical disks.

I just used it to create what looked like one giant disk by glueing together a 4 TB disk, a 2 TB disk, and a 1.5 TB disk. Notice that volume sizes can be anything and we’re not talking about data preservation or speed of access here. Just one BIG file system made out of several different disk bits.

The LVM Wiki is pretty good:

https://wiki.debian.org/LVM

First you do the usual “upgrade / update” of the system. Then you install the LVM code and start the service:

sudo apt-get install lvm2
sudo service lvm2 start

The wiki has you install a graphical management bit, but I didn’t bother.

apt-get install system-config-lvm

Now there is a 3 level set of “stuff” to keep track of during the rest. Physical disks or disk partitions. Groups of “volumes” (called Volume Groups). And Logical Volumes created inside a Volume Group. There are commands to create, inspect, and manage things at each level. (So you can see how adding RAID above or below, and adding a couple more levels, can be a bit confusing…)

OK, at the physical level we need to assign disks or disk partitions to the Volume Group. You can pretty much mix and match bits of disks at this level, though the pages encourage slugging in whole disks as simpler to manage. I built mine out of partitions and put a swap partition as slice ‘b’ on each disk. Why? Because I’m an old school surly curmudgeon who doesn’t like the idea of running swap onto a splotch of disk on an LVM volume in an LVM Group on a gaggle of physical disk partitions… but you can put swap on an LVM volume if you like, then just slug in whole disks for space. So instead of using /dev/sda1 for disk space and /dev/sda2 for swap (and partitioning accordingly), you can just add /dev/sda to the LVM group and parcel it out as desired to files or swap.
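If you do want swap on an LVM volume, the rough shape of it is below. These are standard commands, but the group name is hypothetical and it assumes the Volume Group (covered shortly) already exists:

lvcreate -n swap -L 2g myVirtualGroup1   # carve out a 2 GB logical volume for swap
mkswap /dev/myVirtualGroup1/swap         # put a swap signature on it
swapon /dev/myVirtualGroup1/swap         # start swapping to it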

So once the LVM service is installed, how do you hand it disks or partitions?

As usual for all things systems admin, you either put a “sudo” in front of commands or run them as root. Just a reminder… So what is that command?

pvcreate /dev/sda2

This marks that partition as part of the LVM batch. If you used “pvcreate /dev/sda” you would assign the whole disk.
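You can verify LVM took it with the standard reporting commands (more on these below):

pvs          # one line per Physical Volume: size, group membership, free space
pvdisplay    # the verbose per-volume version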

There are a bunch of physical volume commands, but I’ve not found one that tells you how much real data is on any given physical disk.

PV commands list

pvchange — Change attributes of a Physical Volume.
pvck — Check Physical Volume metadata.
pvcreate — Initialize a disk or partition for use by LVM.
pvdisplay — Display attributes of a Physical Volume.
pvmove — Move Physical Extents.
pvremove — Remove a Physical Volume.
pvresize — Resize a disk or partition in use by LVM2.
pvs — Report information about Physical Volumes.
pvscan — Scan all disks for Physical Volumes.

You would think pvs would tell you how much of each physical volume had data on it. It doesn’t. It tells you how much has a file system built on it:

root@orangepione:~# df /LVM
Filesystem                                 1K-blocks       Used  Available Use% Mounted on
/dev/mapper/TemperatureData-NoaaCdiacData 7207579544 2688218088 4189744724  40% /LVM

root@orangepione:~# pvs
  PV         VG              Fmt  Attr PSize PFree
  /dev/sda1  TemperatureData lvm2 a--  3.64t    0 
  /dev/sdb1  TemperatureData lvm2 a--  1.82t    0 
  /dev/sdc1  TemperatureData lvm2 a--  1.36t    0 

So with 60% empty, pvs shows nothing free. OK… It makes a certain kind of sense in that I can’t add a new Logical Volume as the space is committed to the /LVM mount point (made from the Volume Group “TemperatureData” and the Logical Volume “NoaaCdiacData” – and yes, I wish I’d used shorter names ;-)

As I understand it, unless you make it a striped group, files are allotted in order from first disk to last disk, so I can assume that the 2.6 TB used is all on that first /dev/sda1 physical volume at this point… but I’d really like a command that lets me know for sure…
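There is a partial answer in the LVM reporting options: you can at least see which Physical Volumes (and which extent ranges) back each segment of the Logical Volume. A sketch, using standard lvm report fields, though note it shows extent allocation, not how much actual file data the file system has written where:

lvs -o +devices,seg_pe_ranges TemperatureData   # which PVs (and extent ranges) each LV segment sits on
pvdisplay -m /dev/sda1                          # map the allocated extents on one PV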

OK, you have handed over some disk or partition to the physical volume list. Now how to do that Volume Group and Logical Volume stuff?

Create your Volume Group. I used TemperatureData as the name and wish I’d used Tgroup…

vgcreate myVirtualGroup1 /dev/sda2

Then add another disk or partition to it with:

vgextend myVirtualGroup1 /dev/sda3
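And confirm the group actually grew, with the standard reporting commands:

vgs myVirtualGroup1         # total size and free space in the group
vgdisplay myVirtualGroup1   # verbose detail, including extent counts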

There are lots of things you can do with Volume Groups:

VG commands list

vgcfgbackup — Backup Volume Group descriptor area.
vgcfgrestore — Restore Volume Group descriptor area.
vgchange — Change attributes of a Volume Group.
vgck — Check Volume Group metadata.
vgconvert — Convert Volume Group metadata format.
vgcreate — Create a Volume Group.
vgdisplay — Display attributes of Volume Groups.
vgexport — Make volume Groups unknown to the system.
vgextend — Add Physical Volumes to a Volume Group.
vgimport — Make exported Volume Groups known to the system.
vgimportclone — Import and rename duplicated Volume Group (e.g. a hardware snapshot).
vgmerge — Merge two Volume Groups.
vgmknodes — Recreate Volume Group directory and Logical Volume special files
vgreduce — Reduce a Volume Group by removing one or more Physical Volumes.
vgremove — Remove a Volume Group.
vgrename — Rename a Volume Group.
vgs — Report information about Volume Groups.
vgscan — Scan all disks for Volume Groups and rebuild caches.
vgsplit — Split a Volume Group into two, moving any logical volumes from one Volume Group to another by moving entire Physical Volumes.

I’ve not explored most of those commands…

OK, you have a nice big volume group, now what? How to split out what looks like a disk to the system and mount it? Create a Logical Volume.

lvcreate -n myLogicalVolume1 -L 10g myVirtualGroup1

Now I used NoaaCdiacData for my logical volume name and wish I’d used NCData…

lvcreate -n NCData -L 100g Tgroup

Then format it to ext4 (or something else if you have good reason to).

mkfs -t ext4 /dev/Tgroup/NCData

You could now do a mount on /test to see if it worked:

mount /dev/Tgroup/NCData /test
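If the test mount works and you want it to survive a reboot, make a permanent mount point:

mkdir /LVM

and then add an /etc/fstab entry roughly like this (matching my names; adjust to yours):

/dev/Tgroup/NCData  /LVM  ext4  defaults  0  2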

There are lots of Logical Volume commands too:

LV commands

lvchange — Change attributes of a Logical Volume.
lvconvert — Convert a Logical Volume from linear to mirror or snapshot.
lvcreate — Create a Logical Volume in an existing Volume Group.
lvdisplay — Display the attributes of a Logical Volume.
lvextend — Extend the size of a Logical Volume.
lvreduce — Reduce the size of a Logical Volume.
lvremove — Remove a Logical Volume.
lvrename — Rename a Logical Volume.
lvresize — Resize a Logical Volume.
lvs — Report information about Logical Volumes.
lvscan — Scan (all disks) for Logical Volumes.

I had to add disks to my Volume Group after it was built, and then extend the size of the file system to include those other disks. The “lvextend” command grows the Logical Volume, and then the resize2fs command expands the file system to fill that extended space.

Essentially that’s it. The lvextend and resize2fs commands are fairly simple; a minimal sketch of the whole sequence follows.
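Here is the rough shape of the whole add-a-disk sequence, untested as written, using my (too long) names and a hypothetical new partition /dev/sdd1; run as root or with sudo:

pvcreate /dev/sdd1                   # mark the new disk’s partition for LVM
vgextend TemperatureData /dev/sdd1   # add it to the Volume Group
lvextend -l +100%FREE /dev/TemperatureData/NoaaCdiacData   # grow the Logical Volume over all the free space
resize2fs /dev/TemperatureData/NoaaCdiacData               # grow the ext4 file system to match

resize2fs can grow a mounted ext4 file system in place, so no unmount is needed for that last step.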

In Conclusion

So that’s where I’m at on scraping the approximately 6 TB of “superghcnd” that looks like it is hourly data for a selection of GHCN sites. About 10 GB / day… I chose to use LVM just to avoid several days worth of “rebuild” on RAID volumes, and because I could glue together a gaggle of different disks into one logical volume image. I risk that any disk loss can cause all of it to be lost, but since it is a duplicate of an online server, I’m able to reload it if needed (as long as NOAA keeps it up).

Sometime after I have a full copy, should I desire more security, I could make a RAID volume (and maybe put LVM on top of it), then gradually grow the RAID as I copied over data and shrink the LVM group… Or just toss a couple of $Hundred more at a couple of added 4 TB disks. It’s a full 3 weeks until the download is finished, and I’ve got plenty of raw space at the moment, so lots of time to think about the next step. At the moment, I’m happy to just have it all download and then leave the disks turned off 90% of the time. Turned-off disks in a drawer have a long MTBF (Mean Time Between Failures).

At present I have about 2 TB of additional empty disk, beyond the 4 TB free in the LVM Logical Volume at the moment, so things are fine for now. I think I’m going to need about 2.5 of that 4 TB to have the download finish. Then I’ll decide on “safe in a drawer” or “move to a RAID”. I already have a simple copy of the data that does not include the “superGHCNd” mammoth chunk, so the only bit at risk is that huge chunk of unclear value. I think just moving one day of that data and the ‘diff’ files to a duplicate is enough “protection”.

So there you have it. The “joys” of slugging TB of data around and how to do it.


10 Responses to RAID, LVM, Gaggle Of Disks…

  1. catweazle666 says:

    ” Or just toss a couple of $Hundred more at a couple of added 4 TB disks.”

    Scary…

    Somewhere amongst my huge collection of hi-tech garbage I have an eight inch Fujitsu 20MB HDD.
    When it was new in 1978, it cost around $10,000…

  2. E.M.Smith says:

    @CatWeazel666:

Hey, I bought disks for a Cray… They were something like a $Million… but about 1/10th the needed storage, so we bought a set of 2 tape robots about the size of a small office each. IIRC, something like $250,000 each for a total of $1/2 Million. Total storage? It was something like 2 TB … and access was slow as the tape had to load…

  3. Chuckles says:

Use ZFS to control and manage them? It’s apparently running on the Pi these days.

  4. E.M.Smith says:

    Well, I can get to GISS through a UK VPN site, but drop the VPN and the same page won’t load via local network.

Well, this opens all kinds of fun. Visit local libraries and college campuses in places known to be warmer biased, and add them to the blocked list with one 15 second command. Do the same at local governments (in Silicon Valley some cities provide free wifi… and are usually very lefty…) The opportunities are huge. Heck, if NASA has public visitor WiFi in their lobby, it could be done there too!

Meanwhile, I just take the old laptop and a USB disk to the local Starbucks and scrape away while working on my web pages…

    BTW, the download of my site works nicely from local disk. Comments look to be included too. Some links look like they might have downloaded (like some images in pages) but video links to youtube stay just links. Some more tuning for just what links get copied may be in order (or closer inspection may show it fine). Looks like about 1.5 GB of stuff. Might try some other sites later. I can think of a few to archive and some to read offline…

  5. E.M.Smith says:

    @Chuckles:

I looked at zfs. It is interesting but had some issues I didn’t like. Don’t remember exactly… IIRC, it doesn’t work on all the OSs I run. So a chip swap loses your disk access. Then there was another issue… but I’ve lost the plot on that one (as I’d decided to not go there… so why remember the details). I think it was that the licence is incompatible with Linux, so it is implemented in user space (FUSE). A compute loaded parity creating file system in user space on a Pi seemed likely to have issues. Then add that disk auto detect and even reading would fail on any system chip not rebuilt to use it… so easy to forget that “detected as empty” disk really had valuable data on it…

    Just looked like a bit more rope pushing than I wanted.

    But in a Solaris shop, sure.

  6. Chuckles says:

Yup, that makes two of us, E.M. I also backed away from it initially, even though it’s a brilliant piece of design, for pretty much the same reasons as you.
A native ZFS for Linux saw the light of day in 2013, getting away from the FUSE approach, and while there are still a couple of ??? on the licensing, everyone and their dog are now including it or supporting it.

    https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=19062

  7. catweazle666 says:

    A Cray eh?

    Nah, they’re for sissies.

    I’ve still got one of these!

    With 256 whole bytes of RAM!

    After several hours of messing around entering hex on the undebounced keyboard, it was possible to get it to scroll your name around the seven segment LED display.

  8. I have a Mk14 too ;) expanded to a full 512 bytes of memory. One started by entering a short program in hex allowing writing to tape or reading from tape. This was the first assembly language for me … Writing programs in assembly and manually converting to hex code.
    I was a poor student so I could only dream of a real Imsai or similar. Nice toy!

  9. E.M.Smith says:

Don’t make me get out my Radio Shack Dual Processor 4 bit computer that ran BASIC… It had one processor just to do the I/O and the other to do any actual operations. A single line of display of about 40? characters, and a built in keyboard.

    My first personal dual core machine… even if they were dog slow 4 bit cores… and it could only do one line of text at a time…
