64 bit vs Raspberry Pi Model 2, a surprising thing

As those who have been “following along” know, I recently did a compression of a very large file system using the Raspberry Pi Model 2 with 4 ARM cores. It took a couple of days…

So I’m now looking at both repeating that (on a slightly larger version of CDIAC data now that it is completed) AND doing it on the NOAA data that is significantly larger ( 228 GB and rising vs 135 GB).

So I thought to myself: Self, you are ‘compressing in place’ so taking long seeks on the disk drive (yes, I know (said I to self) it’s CPU limited… but… maybe…) AND you are using 4 tiny little 32 bit patches of dinky cores that aren’t even Intel(tm) Inside(c)…

HOW, oh HOW can you think this is The Very Best Way?

The Experiment

So I had a Very Bright Idea ™!!!

I’ll use the RPiM2 just as a file server. It will read blocks off of disk, and shove them over the network via NFS (Network File System – the Linux / Unix equivalent of Samba… though some of us think it much superior…) to my 64 Bit x86_64 box and have it do the compression, then write the result to local disk at SATA speed. Surely having the heads floating over just one patch of disk on both read and write disks, and having the compression done on a 64 bit CPU (that is not a RISC – Reduced Instruction Set Computer; but a CISC – Complex Instruction Set Computer) with LOADS of memory and fancy pipelined instructions and all sorts of Koool Stuff! would have to be faster!?!

As I’m about to do a 2 day “mksquashfs” compression of one data set and a nearly 3 day compression of another, it was worth a test to figure out which is faster…

So I launched a compression on the Raspberry Pi Model 2 of a local data set. As before, it was CPU limited. No Disk I/O Need Apply, as the CPU was “at the wall” the whole time and there was NO “wait state” showing in the “top” command display. Eventually it finished.

Then I did the same thing on my Antek / ASUS box (that cost almost exactly the same… but was bought used from the local castoff computer store – Weird Stuff [an iconic Silicon Valley place to buy everything from old PCs to an original Apple 1 { I saw one there in the back room! } to PCs one generation back from the newest to just about anything else…]).

The results were a bit of a surprise.

Full Disclosure

Over the fence from my home is a very nice neighbor. We’ve had a ‘relationship’ for about 2 decades. My bunnies tended to ‘tunnel’ (as they were free range) and sometimes came up in their yard. “We’ve met”…

They were prone to being vegetarians. Several in my family are vegetarians and I often cook vegetarian meals. (My daughter and her beau in particular, along with my spousal twin and her daughter who are very vegan, and…)

I had bought a “Vegetarian BBQ Cookbook” and, well, I’m sorry, but Pork Ribs and Chicken have nothing to fear… so I gave it to them about a decade back. It’s a very nice “coffee table” sized book that was not particularly cheap, but, well, worth more in their hands than in mine.

We watched our kids grow up together.

So I’m at Weird Stuff buying an old used PC (that Antek / ASUS 64 bit box) and I’m thinking I have a question about Linux and the box… so I ask for The Manager, please…

Turns out it’s my neighbor.

So I “got a deal” on the box.

Turns out his {older brother? some relative… damn I wish I paid more attention to times when folks spend 10 minutes on things that don’t matter much to me ;-) but some relative} ran the place then {what was that? a… something bad and the brother…} happened and now he’s the manager. Surprise!

So I’ve been shopping at Weird Stuff for way longer than he’s been the Manager In Charge, but we ARE neighbors… even if that didn’t change much more than about $10 of discount (my guess) on a way old PC with a single AMD processor in it…

But I thought you ought to know…

So yes, you ought to visit Weird Stuff, but not because my neighbor is an OK guy when my bunny tunnels into his tomatoes… but because they have interesting stuff at low prices.

Here’s their official page:


Oh, and in complete full disclosure, I’m making this posting on that very box I bought from them since that is where the last test case ran.

It is an ANTEK box using an ASUS motherboard with an AMD Sempron 3200 @1.8 GHz CPU (64 bit) in it and with a SATA disk drive. It came with Windows XP installed and that was one of my major reasons for buying it, as I’d not yet recovered the W-XP on the EVO, so this was my backup-to-the-backup recovery plan. (Little did I know that putting Linux on it was going to be a Royal PITA as the motherboard needed special drivers for the video… but I’ve covered that elsewhere…)

So that’s the end of the ‘back story’….

And it has no bearing on anything other than my sense of guilt if I didn’t disclose that I actually know someone else who works and lives in Silicon Valley…

Back At The Test

So the part that really matters. The Test.

“IF you don’t test it, it ain’t for shit. -E.M.Smith”

I was QA manager at a compiler company for a while… you test things or you ship shit. So I test things.

First I ran the “base case” on the Raspberry Pi Model 2. Then I did the same thing on the Antek / ASUS box. Files were fed via NFS (Network File System) and watching the process, it ran about 50 Mb/second on a 100 Mb/second network. At all times the process was CPU limited with near zero “wait state” and a lot of User time with some System time, but no idle time.

This is the end report:

[root@CentosBox TempsArc]# ls
1_DU_mb_out	CDIAC_wget_log	   GHCN.sqsh   NOAA_NCDC		  Temps
CDIAC		ftp.ncdc.noaa.gov  lost+found  Temperature_Data
cdiac.ornl.gov	GHCN		   mountsq     Temperature_Data_from_Mac
[root@CentosBox TempsArc]# time sqitlocal GHCN /tmp/GHCN 
Parallel mksquashfs: Using 1 processor
Creating 4.0 filesystem on /tmp/GHCN.sqsh, block size 65536.
[=========================================================/] 111267/111267 100%
Exportable Squashfs 4.0 filesystem, data block size 65536
	compressed data, compressed metadata, compressed fragments
	duplicates are removed
Filesystem size 3625581.86 Kbytes (3540.61 Mbytes)
	51.13% of uncompressed filesystem size (7090962.03 Kbytes)
Inode table size 212891 bytes (207.90 Kbytes)
	45.81% of uncompressed inode table size (464698 bytes)
Directory table size 6391 bytes (6.24 Kbytes)
	43.90% of uncompressed directory table size (14558 bytes)
Number of duplicate files found 108
Number of inodes 666
Number of files 627
Number of fragments 67
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 39
Number of ids (unique uids + gids) 3
Number of uids 2
	chiefio (500)
	ems (1000)
Number of gids 3
	chiefio (500)
	ems (1000)
	root (0)

real	60m39.016s
user	44m28.409s
sys	3m4.100s

The command executed was this:

[root@CentosBox TempsArc]# cat /usr/bin/sqitlocal 
mksquashfs ${1-/tmp} ${2-/tmp/$1}.sqsh -b 65536

Basically the same as the ‘sqit’ command but letting me put the output somewhere more interesting, like on the local disk as /tmp/GHCN.sqsh.
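For anyone who wants to reuse the wrapper, here is a commented sketch of the same one-liner with its arguments quoted, so paths containing spaces survive. The helper name "sqit_out_path" is hypothetical (it is not part of the original script); the defaults and the 64 KiB "-b 65536" block size are taken from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical commented version of the one-line "sqitlocal" wrapper.
# Usage: sqitlocal SOURCE_DIR [OUTPUT_STEM]
#   SOURCE_DIR   defaults to /tmp (same as ${1-/tmp} in the original)
#   OUTPUT_STEM  defaults to /tmp/SOURCE_DIR; ".sqsh" is always appended

# Print the image path exactly as ${2-/tmp/$1}.sqsh would expand it.
sqit_out_path() {
    printf '%s.sqsh\n' "${2-/tmp/$1}"
}

sqitlocal() {
    # Quoting the expansions keeps paths with spaces in one piece.
    # -b 65536 selects the 64 KiB block size seen in the logs.
    mksquashfs "${1-/tmp}" "$(sqit_out_path "$@")" -b 65536
}
```

Called as "sqitlocal GHCN /tmp/GHCN" it should write /tmp/GHCN.sqsh, matching the run above.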

[root@CentosBox TempsArc]# ls -l /tmp/GHCN.sqsh 
-rwx------. 1 root root 3712598016 Oct 14 17:44 /tmp/GHCN.sqsh
[root@CentosBox TempsArc]# du -ms GHCN
6926	GHCN

So a 6.9 GB directory tree got reduced to a 3.7 GB image using about an hour of ‘wall time’ in that “real 60m” line and about 44 minutes of User CPU time along with a nearly irrelevant 3 minutes of ‘system’ CPU time.
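The size reduction can be checked from the two listings above: "ls -l" gives the image in bytes and "du -ms" gives the source tree in MiB. A minimal sketch, with a hypothetical helper name; since du rounds up and counts allocated blocks, this lands near, not exactly on, the 51.13% that mksquashfs itself reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compressed size as a percentage of the original.
#   $1 = compressed image size in bytes (from ls -l)
#   $2 = original tree size in MiB (from du -ms; 1 MiB = 1048576 bytes)
ratio_pct() {
    awk -v c="$1" -v m="$2" 'BEGIN { printf "%.1f\n", 100 * c / (m * 1048576) }'
}

ratio_pct 3712598016 6926   # GHCN.sqsh vs. the GHCN directory: prints 51.1
```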

Now that’s all well and good, but how about the RPiM2 with long seeks on the local disk (that didn’t matter as it was ‘balls to the wall’ CPU pegged at 100% the whole time) and using everything it had?

What does it do locally? With those tiny little 32 bit RISC cores?

Here’s the stats:

root@RaPiM2:/TempsArc# time sqit GHCN 
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on GHCN.sqsh, block size 65536.
[======================================================================================\] 111267/111267 100%
Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 65536
	compressed data, compressed metadata, compressed fragments, compressed xattrs
	duplicates are removed
Filesystem size 3625581.88 Kbytes (3540.61 Mbytes)
	51.13% of uncompressed filesystem size (7090968.31 Kbytes)
Inode table size 212893 bytes (207.90 Kbytes)
	45.81% of uncompressed inode table size (464698 bytes)
Directory table size 6391 bytes (6.24 Kbytes)
	43.90% of uncompressed directory table size (14558 bytes)
Number of duplicate files found 108
Number of inodes 666
Number of files 627
Number of fragments 67
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 39
Number of ids (unique uids + gids) 3
Number of uids 2
	chiefio (500)
	pi (1000)
Number of gids 3
	chiefio (500)
	pi (1000)
	root (0)

real	45m49.139s
user	173m37.960s
sys	4m7.400s

Yes, only 45 minutes ‘wall time’. The user time is 173 minutes but you must divide by 4 as there are 4 processors.

So on an actual wall time basis, the RPiM2 is 4/3 the speed of that AMD 64 bit CPU. Golly.
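The "4/3" figure is just the ratio of the two "real" times, and the per-core user time is the Pi’s user time divided by 4. A small sketch of that arithmetic, assuming the bash "time" output format shown above (XmY.YYYs); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a bash `time` field such as "45m49.139s" into decimal minutes.
to_min() {
    awk -v t="$1" 'BEGIN { split(t, p, /[ms]/); printf "%.3f\n", p[1] + p[2] / 60 }'
}

# Ratio of the first (slower) wall time to the second (faster) one.
speedup() {
    awk -v slow="$(to_min "$1")" -v fast="$(to_min "$2")" \
        'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup 60m39.016s 45m49.139s    # AMD over NFS vs. RPiM2 local: prints 1.32
# Pi user time spread over its 4 cores: prints 43.408 (minutes per core)
awk -v u="$(to_min 173m37.960s)" 'BEGIN { printf "%.3f\n", u / 4 }'
```

A ratio of 1.32 is close enough to 4/3 for the claim above.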

Now you didn’t notice, but I swapped over to the RPiM2 to paste in the stats from the terminal window there. I can state without reservation that the Antek / ASUS / AMD 64 bit CPU machine has a much more responsive and “liquid” feel when editing WordPress pages. Likely that is because the browser gets one whole fast CPU there, while on the Pi it can use only one of the four slower cores. But once this code is made to “run parallel”, then the RPiM2 will beat it here, too.

Not at all what I expected.

In Conclusion

For the next couple of days the Raspberry Pi Model 2 will be busy doing “mksquashfs” file system builds on various chunks of the file system name space that are never going to be written again. I’m making fast, efficient “squashfs” file systems out of those chunks and storing them off on my archive space.

BTW, it isn’t a free ride to compress… if some bits get corrupted you can lose the whole 200 GB instead of just the one file with the corrupted bits… but as ‘make a RAID’ file system on the pi is next up, and much of this is duplicated at NOAA and CDIAC and can be downloaded again; I’m OK with that small risk for a short time.
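One cheap hedge against that silent-corruption risk is to record a checksum for each image as soon as it is built and re-check it before trusting a copy (or deleting the source). A minimal sketch using the coreutils "sha256sum" tool; the function names are hypothetical, and this only detects damage, it does not repair it.

```shell
#!/usr/bin/env bash
# Record a SHA-256 checksum beside an image, e.g. GHCN.sqsh -> GHCN.sqsh.sha256.
# The cd keeps a relative file name in the .sha256 file, so the pair can be
# copied together to another disk and still verify there.
record_sum() {
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Re-check an image against its recorded checksum.
# Exit status 0 means the bits still match; non-zero means something changed.
verify_sum() {
    ( cd "$(dirname "$1")" && sha256sum --status -c "$(basename "$1").sha256" )
}
```

For example, "record_sum /tmp/GHCN.sqsh" right after the build, then "verify_sum" on each copy before relying on it.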

It does kind of bother me a little bit in that the R.Pi is silent, and the Antek makes a lot of fan noise, so I really like the ‘Silence Of The Pi’ better… but for a while at least it will be busy squashing and I’ll be browsing on The Whirring Monster… (and loving every fast minute of it ;-)


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present “hot buttons” are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits.

12 Responses to 64 bit vs Raspberry Pi Model 2, a surprising thing

  1. LG says:

    Am I reading this correctly :

    “So a 6.9 MB file got reduced to 3.7 GB using about an hour of ‘wall time’ ”

  2. Nick Fiekowsky says:

    If you can spare the time, would be interesting to have the RPiM2 read via NFS from a separate system. Then you’re comparing apples to apples. NFS has CPU overhead, network latency is one or two orders of magnitude higher than local disk.

    Early this year we had performance problems with a few database servers that migrated from physical, with several internal spindles, to virtual. Virtual server took the database server’s native disk I/O and turned it into NFS sent over GigE connection to disk array. The NFS performance penalty suddenly made the database suffer from inefficient queries that had been harmless in the physical environment.

    Right now it looks like you’re comparing apples to apple pie.

    Discussion of relative NFS and iSCSI performance penalties.

  3. E.M.Smith says:


    You are reading it correctly but that doesn’t mean I typed it correctly…

    I have a ‘bad habit’. I type and post first, then do my final Edit Read.

    It was MB for about 2 minutes and then I corrected it to GB.

    Unfortunately, that was enough time for you to read and post a comment…

  4. E.M.Smith says:


    I suppose that’s worth the time to do, but…

    The reality is that I have ONE copy of the data on the disk where it lives, so I would need to add the NFS copy time to the numbers above for what I really want to do… but as an intellectual exercise, for some unknown future problem, yeah, worth it I suppose.

    Also, since in both instances the process was CPU limited AND the network was not limited, AND the CPU load of NFS seems to land more on the server (R.Pi) than the client, I’m not expecting a whole lot of difference.

    But I could be wrong…

    So (likely sometime about 2 AM when I’ve found the Scotch bottle and it sounds more like a good idea 8-) I’ll see about the other permutations of the exercise…

  5. E.M.Smith says:

    Well, I “did the experiment”.

    One thing to realize is that this particular machine (the Antek / ASUS box) does horridly with the USB ports. I’ve not put the time into it to figure out why ( Linux driver or hardware or?…) as it is just ‘fast enough’ to be acceptable for small file moves and that’s all I use them for on this box. My major file server is now the Raspberry Pi (which, even though it is only USB 2.0, runs faster…)

    So on some machine with a real, fast, USB port and especially if I had one that was 3.0 then I’d just plug the USB external drive into it and avoid moving the data from disk to disk.

    Also note that this box has a 160 GB SATA disk in it. Nice and fast, but not going to hold the 220+ GB of temp data that I’m compressing… so it either comes via NFS or via USB to get to this processor and have the 64 bit AMD Sempron work on it.

    All of which just means that the below test is entirely useless for the actual thing I need to do.

    But it does answer the speed question.

    The All SATA All 64 Bit Test

    First off, I had to copy the data over to the SATA drive. I did this via NFS (just as in the remote mounted file compression), but this let me measure what resources it took just for the move. I started by making a directory for the testing in the part of the disk with the most free space. This system is running LVM, so the one large partition is a volume group divided into a logical volume for ‘home’ and one for everything else. The first 1/2 of the disk is formatted NTFS and used by Windows XP should I ever need it, so all head seeks and I/Os are, at most, 1/2 of a ‘long seek’ and most likely isolated into the last 1/7th of the disk diameter that is ‘free’, as all the other stuff was pretty much written once in order and not fragmented after that. That is, the disk speed ought to be close to optimal even with the read / write seeks during the compression phase.

    [root@CentosBox ext]# cd /
    [root@CentosBox /]# mkdir Testing
    [root@CentosBox /]# cd Testing
    [root@CentosBox Testing]# ls
    [root@CentosBox Testing]# df .
    Filesystem           1K-blocks      Used Available Use% Mounted on
                          51609340  27856652  23228456  55% /

    Then I do the actual copy from the NFS mounted partition to the real disk. In this first phase there ought to be no head seeks as the data is just written out in a long stream.

    [root@CentosBox Testing]# time cp -a /TempsArc/GHCN GHCN
    real	10m53.127s
    user	0m1.064s
    sys	1m28.049s
    [root@CentosBox Testing]# 

    So the entire CPU hit for the copy was about 1 1/2 minutes. Given that the prior NFS compression run used about 47 1/2 minutes of CPU (44m28s user plus 3m04s sys), that’s about 3% overhead for the entire NFS traffic / move / write process (and the prior effort would not have had those disk writes accounted to the NFS portion, so the actual NFS load will be even less; this is a ‘worst case’ that included the write to disk CPU / overhead).
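    The 3% figure is the copy’s user+sys CPU divided by the compression run’s user+sys CPU. A sketch of that arithmetic from the raw "time" fields, assuming the XmY.YYYs format above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Seconds in a bash `time` field such as "1m28.049s".
to_sec() {
    awk -v t="$1" 'BEGIN { split(t, p, /[ms]/); printf "%.3f\n", p[1] * 60 + p[2] }'
}

# CPU (user+sys) of an extra step as a percentage of a job's CPU (user+sys).
# Arguments: step_user step_sys job_user job_sys
overhead_pct() {
    awk -v su="$(to_sec "$1")" -v ss="$(to_sec "$2")" \
        -v ju="$(to_sec "$3")" -v js="$(to_sec "$4")" \
        'BEGIN { printf "%.1f\n", 100 * (su + ss) / (ju + js) }'
}

# NFS copy vs. the NFS compression run: prints 3.1
overhead_pct 0m1.064s 1m28.049s 44m28.409s 3m4.100s
```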

    Given that, the ‘cost’ of the NFS is likely minimal. Note, though, that it took nearly 11 minutes of ‘wall time’ to do the copy. For an actual ‘move and compress’ that wall time would be real time needed to copy the data over and ought to count against the final elapsed-time total.
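    The copy’s wall time can also be turned into an average network rate as a sanity check. A minimal sketch, assuming the 6926 MiB from "du -ms" (approximate, since du counts allocated blocks) and the 10m53.127s "real" time; the helper name is hypothetical. It works out to roughly 89 Mb/s, which suggests the copy was running close to the limit of the 100 Mb/s network.

```shell
#!/usr/bin/env bash
# Average throughput in megabits per second (1 Mb = 10^6 bits).
#   $1 = data size in MiB (as reported by du -ms)
#   $2 = elapsed seconds (the "real" figure from `time`)
avg_mbps() {
    awk -v mib="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mib * 1048576 * 8 / s / 1e6 }'
}

avg_mbps 6926 653.127   # GHCN copy, real 10m53.127s = 653.127 s: prints 89.0
```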

    But what happens to the times in the local compress? This will be ‘read from local disk, compress, write to local disk’ and the SATA disk ought to be faster. (In both cases the CPU was effectively pegged at about 89% or so for the mksquashfs though there will have been minor variations based on what else happened – like screen refreshes and such).

    Here’s a “top” text grab taken while I was moving the mouse around and causing Xorg to do more work to get the image…

    [chiefio@CentosBox /]$ top
    top - 10:58:45 up  1:02,  5 users,  load average: 1.30, 1.35, 1.19
    Tasks: 195 total,   1 running, 194 sleeping,   0 stopped,   0 zombie
    Cpu(s): 92.8%us,  4.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  2.3%hi,  0.3%si,  0.0%st
    Mem:    954676k total,   889824k used,    64852k free,    13040k buffers
    Swap:  3008496k total,        8k used,  3008488k free,   397712k cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
     3761 root      20   0  389m 100m  700 S 77.1 10.8  14:02.39 mksquashfs         
     2477 root      20   0  237m  41m 6736 S 11.9  4.4   8:03.95 Xorg               
     3027 chiefio   20   0  295m  13m 9016 S  5.3  1.4   0:12.99 gnome-terminal     
     3025 chiefio   20   0  323m  13m 9128 S  4.0  1.4   4:17.04 gnome-system-mo    
     2785 chiefio   20   0  327m  11m 8584 S  0.7  1.2   0:09.62 wnck-applet        
     3043 chiefio   20   0 15036 1224  856 R  0.3  0.1   0:23.77 top                
        1 root      20   0 19364 1360 1052 S  0.0  0.1   0:01.69 init               
        2 root      20   0     0    0    0 S  0.0  0.0   0:00.03 kthreadd           
        3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0        
        4 root      20   0     0    0    0 S  0.0  0.0   0:00.97 ksoftirqd/0   

    So even with Xorg sucking up nearly 12%, mksquashfs was still getting 77%. When I was not moving the mouse to mark / grab this text, the Xorg window driver took much less and the percentage for mksquashfs ran from about 88% to 92% with minor variations. For the bulk of the run I was doing nothing at the keyboard or screen / mouse combo so it was about 89% / 90% average I’d guess (based on what numbers I saw most often).

    Here is the actual run. I started by making sure the directory size was the same, then did the actual timed run.

    [root@CentosBox Testing]# du -ms *
    6926	GHCN
    [root@CentosBox Testing]# time sqit GHCN 
    Parallel mksquashfs: Using 1 processor
    Creating 4.0 filesystem on GHCN.sqsh, block size 65536.
    [=========================================================|] 111267/111267 100%
    Exportable Squashfs 4.0 filesystem, data block size 65536
    	compressed data, compressed metadata, compressed fragments
    	duplicates are removed
    Filesystem size 3625581.88 Kbytes (3540.61 Mbytes)
    	51.13% of uncompressed filesystem size (7090962.03 Kbytes)
    Inode table size 212903 bytes (207.91 Kbytes)
    	45.82% of uncompressed inode table size (464698 bytes)
    Directory table size 6406 bytes (6.26 Kbytes)
    	44.00% of uncompressed directory table size (14558 bytes)
    Number of duplicate files found 108
    Number of inodes 666
    Number of files 627
    Number of fragments 67
    Number of symbolic links  0
    Number of device nodes 0
    Number of fifo nodes 0
    Number of socket nodes 0
    Number of directories 39
    Number of ids (unique uids + gids) 3
    Number of uids 2
    	chiefio (500)
    	ems (1000)
    Number of gids 3
    	chiefio (500)
    	ems (1000)
    	root (0)
    real	53m44.308s
    user	43m13.933s
    sys	1m0.397s
    [root@CentosBox Testing]# 

    So from the NFS run above:

    real	60m39.016s
    user	44m28.409s
    sys	3m4.100s

    We find about 1 minute more “user” CPU time ( 1 minute 14.476 seconds ) and about 2.1 minutes more system CPU time ( 2 min 3.703 seconds) for the NFS run. The NFS run was longer in “wall time” by about 7 minutes.
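    The differences in this paragraph, and the 7 min 55.169 seconds margin quoted for the Pi a little further on, are plain subtractions of "time" fields. A sketch that reproduces them, assuming the XmY.YYYs format and that the first argument is the larger time; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Difference between two bash `time` fields, printed in the same XmY.YYYs form.
# Assumes the first argument is the larger of the two (no negative results).
tdiff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, /[ms]/); split(b, y, /[ms]/)
        d = (x[1] * 60 + x[2]) - (y[1] * 60 + y[2])   # difference in seconds
        m = int(d / 60)                                # whole minutes
        printf "%dm%.3fs\n", m, d - m * 60
    }'
}

tdiff 44m28.409s 43m13.933s   # user, NFS minus local: 1m14.476s
tdiff 3m4.100s 1m0.397s       # sys,  NFS minus local: 2m3.703s
tdiff 60m39.016s 53m44.308s   # real, NFS minus local: 6m54.708s
tdiff 53m44.308s 45m49.139s   # real, AMD local minus RPiM2: 7m55.169s
```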

    But, in the end, the Raspberry Pi Model 2 still beat the local compression on the “big box”:

    real	45m49.139s
    user	173m37.960s
    sys	4m7.400s

    by about 8 minutes elapsed time ( 7 min 55.169 seconds).

    Dividing CPU time by 4 for the four cores gives 43.408 CPU minutes per core for “user” and 1.03 minutes for “system”. That beats the AMD Sempron’s 44.47 user minutes in the NFS run by a little (about 2.4%), but is essentially level with its 43.23 user minutes in the local SATA run. When User and System are added together, the totals are 44.239 CPU minutes for the AMD (local run) and 44.439 CPU minutes per core for the Pi, or essentially a ‘wash’.
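    The per-core totals come from adding user and sys time and dividing by the number of cores that did the work. A sketch of that calculation, assuming the XmY.YYYs "time" format and that the work was spread evenly over the cores; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Total CPU minutes (user + sys) per core, from bash `time` fields.
# Arguments: user sys cores
cpu_per_core() {
    awk -v u="$1" -v s="$2" -v n="$3" 'BEGIN {
        split(u, a, /[ms]/); split(s, b, /[ms]/)
        total = (a[1] * 60 + a[2]) + (b[1] * 60 + b[2])   # seconds
        printf "%.3f\n", total / 60 / n
    }'
}

cpu_per_core 43m13.933s 1m0.397s 1    # AMD, local SATA run: 44.239
cpu_per_core 173m37.960s 4m7.400s 4   # RPiM2, 4 cores:      44.439
```

    The two results differ by under half a percent, which is why the comparison comes out as a wash.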

    (That also leads me to believe that much of the roughly 15% difference in elapsed time comes from the R.Pi being able to keep 3 cores dedicated to the work while only one of them takes the interrupts for handling ‘whatever else is going on’, and does that more efficiently.)

    Realize this is NOT a pure benchmark and that slightly (significantly?) different code is running on each machine. The OS is very similar, but one (Centos) is more Red Hat industrial and a bit older while the other (Raspbian) is a Debian derivative and a very young port. Libraries will be quite different and optimizing settings during the compilation of OS, Libraries and Applications (like mksquashfs) will be different. In short: YMMV and any other real benchmark will yield different results.

    For me, the “bottom line” is one I’ve often seen. Any time you get into the business of shoveling GB of data around, you lose on time to completion. Coupled with “new cheap hardware is faster than old expensive hardware” at about the rate of Moore’s Law.

    Oh, and of course a more modern AMD 64 bit CPU with, say, 4 cores and 4 GHz instead of 1.8 would go a LOT faster… Like one of these:

    But I don’t have one, so we end up back at the ‘reality constraints’ that led to my posting in the first place… Given what I’ve got, it’s faster to just use the RPiM2 and USB 2.0 port with the USB disk and let it run a couple of days…

    But now you know for sure, complete with stats and all ;-)

  6. p.g.sharrow says:

    @EMSmith; I am pleased to see that the “Beer Can” computer vision is being proved by you and delighted to hear you are having fun with this “Toy”. The massive data manipulation is a good test of the hardware, software and your versatile abilities to steer all of the parts together. I doubt there is another that could accomplish that feat. I look forward to your solution for long term memory storage.
    The ability to survive disaster and replace inexpensive parts quickly, as needed, in a secure manner is an important part for future computer aided needs…pg

  7. E.M.Smith says:


    I do tend to make my hardware work for its keep ;-)

    And there’s nothing like shoving a few hundred GB of encrypted data around to find out what it does… and remind you what ‘patience’ is all about ;-)

    FWIW, I have another Quad Core ARM device that is Just Dandy for things like videos and has a slick fast user interface. I think it sets the processor bar at the right place for “way more than toy” performance. As my eventual final desktop solution, I expect to get something with about that chip speed. ( IIRC the CubieTruck board is about this speed and with SATA connection and USB 3.0 is all around the likely winner… but not until I’m done working these boards over and seeing if next year brings another price / performance step…)

    That device is the Samsung Note.


    1.4GHz Exynos Quad-Core Processor


    On 29 September 2011, Samsung introduced Exynos 4212 as a successor to the 4210; it features a higher clock frequency and “50 percent higher 3D graphics performance over the previous processor generation”. Built with a 32 nm High-K Metal Gate (HKMG) low-power process; it promises a “30 percent lower power-level over the previous process generation.”

    It’s that 50% higher clock that makes it just fine, where the single core performance of the PiM2 is marginal on some things. Of course, better parallel processor performance for things like browsers could also fix this, but by the time that gets done hardware will have notched it up again.

    So that, IMHO, is the bounds on hardware. R.PiM2 for “almost everything” and certainly almost anything headless. The Exynos 4212 level performance for a smooth comfortable ride doing things like videos and making a Digital Video Recorder. ( I did find a project for the PiM2 that makes it a DVR, so it can be done, but I’d rather have a bit more ooomph as I tend to push things over their design points… though I’m sure you never noticed ;-)

    It’s the Cubieboard 4 in this link:


    Cubieboard 4

    On May 4, 2014 CubieTech announced the Cubieboard 4, the board is also known as CC-A80. It is based on an Allwinner A80 SoC (quad Cortex-A15, quad Cortex-A7 big.LITTLE), thereby replacing the Mali GPU with a PowerVR GPU. The board was officially released on 10 March 2015.

    SoC: Allwinner A80
    CPU: 4x Cortex-A15 and 4x Cortex-A7 implementing ARM big.LITTLE
    GPU: PowerVR G6230 (Rogue)
    video acceleration: A new generation of display engine that supports H.265, 4K resolution codec and 3-screen simultaneous output
    display controller: unknown, supports:
    microUSB 3.0 OTG

    Unfortunately, last I looked, it was still running about $150 on Amazon, so not buying one anytime soon as I’m still not ‘gainfully employed’ at the moment.

    BTW, realize that the “Octo-core” is really “alternating quad-cores”. They don’t all 8 run at once…

    The A15 chip is a fancier, faster chip than the A7 (in the R.PiM2), so this is really like a RPiM2 that runs a little faster clock on the A7 set but can ‘throttle up’ to the A15 for more “oomph” when needed (and take more power then) on any given task in memory. The “big.LITTLE” is an interesting idea, and clearly aimed at power careful devices like tablets. I’d be happier with just a solid faster quad core, but I’m not a very big market ;-)

    Eventually, though, there will be a home / craft board with something like the fast Exynos chips on it. For now, it’s the R.PiM2 for “most things” and that CubieBoard as the someday goal speed.

    Anything in between those ends, inclusive, ought to be workable to fine.

    As per long term storage. Simples.

    For archival things, write CDs or DVDs and put them in a water proof fire proof box. Ammo cans are nice. For not quite archival rapid recovery massive data, put a copy on a 3.0 USB disk and wrap carefully in non-static cushion wrap. See ammo can above… For “Aw Shit” remote recovery, put a big encrypted blob inside an encrypted .sqsh file system and put the whole thing into a ‘secure cloud’ location, preferably in some second country. Do not name it with .sqsh, but with some name like “Core Dump from VAX”… or “bitmap data of snow cover”… Anyone who gets past the social engineering misdirection and the double encryption and knows how to figure out it’s a sqsh and… well, long before then they will have shown up on your door with a subpoena and guns and demanded you open it…

    The CDs and DVDs ought to survive anything short of a nuke (don’t forget to pack a drive in a big metal can too…) where the USB drive is a little more ‘iffy’ to rough handling and temperature flux and you likely will need to run it every so often so the spindle doesn’t freeze up. Oh, and don’t forget that the electrolytic capacitors need normal volts to stay intact, so it at a minimum needs power every so often just to keep them happy. Not a ‘deep archive’ sort of thing. SD cards and USB sticks are worse though. They lose volts slowly over time, so they need power to rewrite cells and keep the memory right. Unpowered for a year or two, they start to forget… so plug them in to power at least once a year…

    The only bit I don’t have working just yet is that cloud thing… I need to find a cloud provider that I’m OK with. Google gave me a free ?GB? 100 GB? something for a couple of years with the Chromebox… but they root around in your files and look for interesting things like email addresses… so I’ve only put generic stuff in it. It turns into ‘fee for service’ soon anyway so I’ve emptied it now.

    BTW, IIRC, the Samsung Chromebook has an Exynos processor in it and can be booted to Linux. It would be just as good as a DIY box since it supports “legacy boot” of SeaBIOS (IIRC) and comes with all the goodies already (kb, mouse, screen, etc.) But I’m going for minimal at the moment and, frankly, don’t trust Google to not have some subtle buggerage in the box as they are in the Prism program. I could see the ‘legacy boot’ being a clean box all the way, or tattling to the Mother Ship that YOU had converted away from Chrome and needed watching… and maybe even opening a nice little door to anyone with the magic sauce to watch your network traffic. I’d want it on a sniffer to watch traffic before I’d trust it to be squeaky clean.

    But as a daily driver light use browser machine, I could see that… (Kind of like I now treat the Chromebox and how I’ll treat it after I ‘legacy boot’ it to Linux ‘someday’…)

    Hope that lays it out for you well enough. If not, there will be more detail tech postings over time…

  8. Speaking of large file systems, this article on ZFS (and some nascent competitors) might be interesting to you.

    ===|==============/ Keith DeHavelle

  9. Larry Ledwick says:

    Unfortunately conventional CD and DVD are not archival; they have a nasty habit of degrading, or even worse, the metalized layer separates from the substrate. Even worse, there is no way to test for it: different batches of the same brand of disk will behave differently according to tests by places like the Library of Congress and NIST.
    The only “archival” digital disk right now is the M-disk with potential life time in storage measured in hundreds of years. I have a DVD burner in my photo system rated to burn them (LG M-disk) but have not purchased any yet to archive my photos. (work in progress item)


  10. E.M.Smith says:


    They aren’t archival, especially if left in the sun… but they beat the pants off most of the alternatives. SD cards start to lose their minds after 2 years of no power (so even if you just plug them into a powered on USB hub with an adapter that’s enough to refill their little capacitors…) and removable magnetic media tends to “go south” after a decade, sometimes less (but the hardware to read them seems to cycle faster anyway… try finding a 9 track tape drive, an 8 mm tape drive, a Zip drive, a…) and, of course, hard disks themselves have the bearings freeze up if you don’t run them… and wear out if you do…

    IMHO the best way to assure data survival is at least 3 copies in at least 2 technologies. Frequently “moved forward”. So I have most of my stuff on CD, especially old crap that if it DID go away would not be the end of the world. Like that backup of the Toshiba laptop from 1990 or so… where I’ve carried forward the data I really cared about in the “live” copies. Then I’ve also got a ‘recent backup’ of it onto “some backup media du jour”. Sometimes a 64 GB SD card. Sometimes a USB disk. Sometimes a hard disk in the corner. Sometimes a RW DVD… And then it exists as a duplicate of that ‘working copy hard disk’ on the ‘emergency USB disk’ that isn’t plugged into anything.

    FWIW, I’ve regularly tested some of my old CD copies of things. Some are about 20 years old now. Still working fine. IFF you keep them in a cool and dry place they hold up quite well. No, not “museum archival” of 100+ years. But quite well enough for my needs.

    No idea yet on DVDs durability. I wrote some a decade back that still play, but they are video so not so sure I’d see a few bits dropped… Don’t have any old binaries on DVD yet…

    If you REALLY want it to survive, print it out and take a picture of it on Kodachrome…

    Oh, wait, Kodak stopped making Kodachrome… THE best film / photo media / archival media of all time ever… damn it….

    Short of chiseling into stone (which has a low information density), black-and-white silver film and paper prints are about as good as it gets. Kodachrome was matching it (though it will be a couple of hundred more years before we know for sure) and added color. Everything else is 2nd rate or worse. (I suppose you could do some kind of etched glass or other exotic that would be better – I’ve thought of stencils under glaze on tiles as a way to pass knowledge to future generations… kind of a modern day ‘clay tablet hoard’… but we’re talking digital stuff here… and you could put digital onto film… when folks made film…)

    Oh Well….

    So my strategy is to “copy forward” about once a decade for ‘essential’ stuff while keeping the older copy (as insurance against things like a crappy batch of some new idea in chemistry… and as a ‘free’ lifetime testing method). Then I keep multiple copies (as above), with more and more recent copies for the things that are most important. (I.e. I have not copied forward the backups of the last 3 laptops that died, as my MS 95 laptop is not of much interest any more… and I moved the files I cared about onto newer laptops anyway…)

    But for a “Box For Armageddon”, where anything from flood to EMP is likely to wipe out the stuff connected to wires above ground, the expectation isn’t that you will bury it and forget it for 100 years (i.e. not needing ‘archival’), but that you want something you can ignore for half a decade or at most a decade, then unbox it and it ought to work. For that, a CD drive and CDs in a metal ammo can is about the best you can do, with a second copy on a USB drive in a separate ammo can at a separate site. You have redundancy of tech, of location, of electrical fittings (i.e. will a computer in 8 years have a USB port?), of media, of exposures (EMP doesn’t do the same thing to CDs as to USB wires…), etc. But I’d advise testing one of them every other year when you update the contents…

  11. Larry Ledwick says:

    Yes, you got it. There has been a real debate going on among library professionals and folks like NIST to come up with a truly archival digital storage medium. The Library of Congress has found exactly what you mentioned: the technology to read the media is, in many cases, more perishable than the media. They have staff whose only function is to scour surplus sales for old hardware and build a basement full of salvage gear so that they can read old media.

    Every year when I update my anti-virus software I order a physical DVD of the distribution as a backup. A few months after I renewed this year, I opened the DVD package and found that the entire Mylar reflective layer on the DVD had delaminated and was just a free-floating wad of aluminized Mylar shreds. I was able to download it online and recover, but this was a DVD that was sealed in factory packaging, mailed to me, and sat untouched on a desktop for 3-4 months in an air-conditioned apartment, so it never saw extreme temperatures or humidity changes. I had heard of this sort of thing happening, but this is the first time I have seen it in person. I also have a stack of CDs with backup images on them, and so far none of them have delaminated, but I know it is possible at any time.

    I currently store my data on spinning disks. I buy a new 2 TB disk and write an image of my current library of images to it, then retire the active 2 TB drive, and the new image disk becomes the active one. When I get to 3 generations of such backups I will rotate the old drive back in to become the new image drive, so the drives will be re-imaged about every 3-4 years, with a fresh write cycle to outflank bit rot on the platters.

    Library professionals are very concerned that there will be a black hole in our history from this era, due to degraded digital media making the data disappear, just like the huge stretch of the late 1800s whose written records are nearly illegible because of inks that faded out completely over time and high-acid papers that crumbled to dust.

    According to my research, flash memory is actually one of the most reliable forms of long-term storage, theoretically good for about 100 years without a rewrite cycle. It would be nice if someone wrote a utility that would read a disk into memory, then rewrite the sectors byte for byte (like the dd command does) to “renew the image”, and then run a verify that the rewrite was readable and remap any blocks that were suspect. That way you would not have to constantly move the data to a different physical medium. But then you still have the gradual obsolescence of the physical media itself. As with hard disks and IDE, ATA, SATA, SCSI, etc., the old disks still work fine if you can find a compatible card and cable to connect to them. The same goes for data interfaces like USB, HDMI, the old FireWire, etc. I think right now the most robust data interface for the near future (10-15 years) is probably the USB family.
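    The read, rewrite, and verify loop wished for above is simple enough to sketch. What follows is a minimal Python sketch, assuming the target is a file or disk image you can open read-write (pointing it at a raw device such as /dev/sdX would need root and an unmounted disk). The function name refresh_in_place and the 1 MiB block size are hypothetical choices for illustration, not an existing tool. It only reports blocks that fail verification; it cannot remap anything, since remapping suspect sectors is done by the drive’s own firmware.

```python
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB per pass; an arbitrary choice for this sketch


def refresh_in_place(path, block_size=BLOCK_SIZE):
    """Hypothetical helper, not an existing tool.

    Reads each block of `path`, writes the identical bytes back to the same
    offset, then re-reads the block and compares. Returns the offsets of any
    blocks whose re-read did not match what was written.
    """
    bad_offsets = []
    with open(path, "r+b") as f:
        offset = 0
        while True:
            # Read one block into memory.
            f.seek(offset)
            data = f.read(block_size)
            if not data:  # reached the end of the file or image
                break

            # Write the same bytes back to the same place ("renew the image").
            f.seek(offset)
            f.write(data)
            f.flush()             # hand Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to commit it to the device

            # Verify. Caveat: without O_DIRECT or dropping the page cache,
            # this read may be served from RAM rather than from the medium.
            f.seek(offset)
            if f.read(len(data)) != data:
                bad_offsets.append(offset)

            offset += len(data)
    return bad_offsets
```

    For example, refresh_in_place("backup.img") rewrites a disk image block by block and returns an empty list if every block read back intact. A crash partway through a write could damage the block being rewritten, so this is only sensible on data that already has another copy. For real disks, the existing badblocks tool from e2fsprogs has a non-destructive read-write mode (-n) that writes test patterns, reads them back, and restores the original data, which is likely a safer starting point than a home-grown script.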

  12. E.M.Smith says:


    Yes, interesting article. Though I found myself doing introspection…. (Like “Just why am I never in that ‘fan boy’ mode / group?” and “ZFS? Useful but not spectacular… why do I think that and they don’t?”…)

    Then this paragraph (bold mine):

    ZFS, of course, is not to blame. Nor, as far as I can tell, are its corporate supporters or its open source developers. Where ZFS seems to have gone awry is in a loose, unofficial community that has only recently come to know ZFS, often believing it to be new or “next generation” because they have only recently discovered it. From what I have seen this is almost never via Solaris or FreeBSD channels but almost exclusively smaller businesses looking to use a packaged “NAS OS” like FreeNAS or NAS4Free who are not familiar with UNIX OSes. The use of packaged NAS OSes, primarily by IT shops that possess neither deep UNIX nor storage skills and, consequently, little exposure to the broader world of filesystems outside of Windows and often little to no exposure to logical volume management and RAID, especially software RAID at all, appears to lead to a “myth” culture around ZFS with it taking on an almost unquestionable, infallible status.

    Ah, yes… been doing Sun stuff since Sun O/S, prior to Solaris… and BSD since back when there was only one kind and “4.2” was a distant vision… No wonder I balked at the notion that folks see the file system as integral to the O/S… I’ve been playing with different file systems since about the early ’80s? Something like that…

    FWIW, I’ve only just now been willing to “commit” to EXT4 for most of my Linux uses. Until now I’ve mostly used EXT2 so as to avoid the journal writes. (I rarely had crashes, so I didn’t see the fsck at boot as an issue…) Even now, I don’t like EXT4 on SD cards due to the heavy write load. I’ve tried XFS and it was nice. (We supported it on Indigo boxes IIRC back about 1992?)

    I was going to post about the UNICOS file system being “different” (it supported all sorts of things like clusters and stripes and such… along with migrating data blocks to a tape robot while the inodes stayed on disk) but ran into this crap:


    UNICOS was originally introduced in 1985 with the Cray-2 system and later ported to other Cray models. The original UNICOS was based on UNIX System V Release 2, and had numerous BSD features (e.g., networking and file system enhancements) added to it.
    CX-OS was the original name given to what is now UNICOS. This was a prototype system which ran on a Cray X-MP in 1984 before the Cray-2 port. It was used to demonstrate the feasibility of using Unix on a supercomputer system, prior to the availability of Cray-2 hardware.

    Flat Out Wrong.

    When I was at Apple, running the supercomputer site, we had planned to come up on COS and switch to UNICOS on its public availability about 6 months later, but then decided NOT to take the conversion costs once the schedules moved closer together. Leaning on Cray and agreeing to take it a bit “rough”, we were THE very first public install of UNICOS, and it was on an XMP-48.

    Cray put on a party for us, and we all got T-shirts and mugs with the first UNICOS ship date on them, which was our install date.

    It was most assuredly not a “prototype” system. We ran it in production with little change other than the usual maintenance updates until about 7 years later. All materials were labeled UNICOS and our rep gave us a P.O. to sign that said “Unicos”… as did all the manuals and supporting materials.

    It was not used to “demonstrate the feasibility”; it was used in production, running (among other things) Moldflow software used to make the plastic molds for all the Mac-II line.

    Either someone is making crap up or has an axe to grind, but in any case that Wiki entry is a fabrication.

    Back On Point:

    I’ve been using a LOT of file system types for a long time, so while ZFS is of modest interest to me, I’m not a ‘fan boy’ type and mostly find it funny that some folks are. It’s just a file system. Yes, a very fancy one with built-in RAID and such (though they did muck up the names of the levels – as Sun liked to do in their “make our stuff different and strange” way… so we can say it is better than Unix… when it isn’t…). Basically, from my POV, you just do a test on each one, examine the feature sets (i.e. “Do I need RAID built into the file system?” and “Do I need journaling?”) and pick the best match…

    Must be a Unix / Linux thing ;-)
