My, What Big Datasets You Have, Grandma NASA

So I’d done a complete NASA dataset scrape in 2015, and then “touched it up” recently with a full update (rsync nicely only downloads what has changed). Then made sure CDIAC was up to date (as they had posted a ‘Going Out Of Web Data Business’ sign on their site hosted at ORNL…)

OK, so that’s pretty much it, and it was something like:

cat 1DU_2017Jan22
309536 MB
141980 MB

Big, but not really a problem-big. Under 1/2 TB for the whole thing. Yeah, takes a week or two to download the first time you do it, on a slow home link, but then the updates are not that big.

So after I’ve got CDIAC resynced, I decide to run One Last Pass to make it all synced up to “now” as a final touch up run before I archive the 2017 starting status set…

Well… A week or so later I’m still running. One wonders “Why?…”

“Why? Don’t ask why. Down that path lies insanity and ruin. -E.M.Smith”…

So, exploring “Why?”:

It looks like it is in the “daily” readings, something called “superGHCNd”. For those wanting to look at it, here’s the FTP link:

Seems they may have restarted processing and presenting a GIANT data set. But this says it restarted in October of 2015 (matching more or less the start of the set on the ftp server) yet it only now seems to have shown up in ‘pub/data’, so maybe someone decided to make it available as part of their worry over a ‘going out of business sale’ too? Who knows… but “there it is”.

chiefio@orangepione:/SG15/ext/$ cat ghcnd_diff_status.txt

ghcnd_diff status

October 21, 2015

Updates to the superghcnd_diff, superghcnd_full and by year files have

October 15, 2015

A server failure on October 12, 2015 has prevented the creation of the
superghcnd_diff and superghcnd_full files (as well as the yearly files provided
in the by_year directory). The creation of these files will resume as soon as
possible. The daily updates of GHCN-Daily are not affected by the server

August 08, 2011

ghcnd_diff pushed into production. yearly files are stable

Now this says it was resumed in 2015, yet I don’t remember downloading this whale back then, nor in the latest update. Either I somehow missed it, or this particular dataset was just shoved onto the ftp server recently. How big is it?

chiefio@orangepione:/SG15/ext/$ du -BMB -s .

454772 MB	

So almost 1/2 TB just on its lonesome on my server, and it isn’t done downloading yet!

It looks like the “diff” files are up to January 2017

-rw-rw-r-- 1 chiefio chiefio      366384 Jan 29 18:24 superghcnd_diff_20170128_to_20170129.tar.gz
-rw-rw-r-- 1 chiefio chiefio      397807 Jan 30 18:25 superghcnd_diff_20170129_to_20170130.tar.gz
-rw-rw-r-- 1 chiefio chiefio    11673550 Jan 31 18:17 superghcnd_diff_20170130_to_20170131.tar.gz
-rw-rw-r-- 1 chiefio chiefio     2151686 Feb  1 18:16 superghcnd_diff_20170131_to_20170201.tar.gz

But the “full”, that starts in Sept 2015, is only up to November:

-rw-rw-r-- 1 chiefio chiefio 10899438827 Nov  6  2015 superghcnd_full_20151106.csv.gz
-rw-rw-r-- 1 chiefio chiefio 10919133594 Nov  8  2015 superghcnd_full_20151108.csv.gz
-rw-rw-r-- 1 chiefio chiefio 10921450656 Nov 10  2015 superghcnd_full_20151110.csv.gz
-rw-rw-r-- 1 chiefio chiefio 10923604993 Nov 11  2015 superghcnd_full_20151111.csv.gz

It looks like about 10 GB / Day-of-data…

Given 365 days, that’s about 3.65 TB / year, and I’ve got 2 more full years to go… plus about 50 days (or about 1/2 TB) of 2015… I think I’m gonna need a bigger disk…

So figure (3.65 x 2) +( 3.65 x 4/12 ) = 8.5 TB of data… Make that a VERY big disk…

AND that is for the .gz Compressed files! Lord help me if I want to decompress them to look at them.
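For anyone wanting to redo that back-of-envelope sum with their own numbers, here is a minimal sketch in Python. The function name and parameters are just mine for illustration, and it assumes decimal units (1 TB = 1000 GB) and one "full" snapshot per calendar day:

```python
def archive_size_tb(gb_per_day, full_years, extra_months):
    """Rough size in TB of keeping one 'full' snapshot per day.

    Assumes decimal units (1 TB = 1000 GB) and a 365 day year.
    """
    tb_per_year = gb_per_day * 365 / 1000
    return tb_per_year * full_years + tb_per_year * extra_months / 12


# Same inputs as the estimate above: 10 GB/day, 2 full years, 4 months of 2015.
estimate = archive_size_tb(gb_per_day=10, full_years=2, extra_months=4)
print(f"about {estimate:.1f} TB compressed")
```

Plugging in the roughly 10.9 GB per file shown in the listing, instead of a round 10, pushes the figure to about 9.3 TB.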

(Now you see why I was looking at making a large RAID array out of little disks… one BIG file system image spread over several much smaller disks).

Well, at this point, there’s no way I can store that. The good news is that it will be a week or three before it can fill up what I do have as storage given the soda-straw it is sucking on to get the data (i.e. my home network and speed limits set on the download by me) so I have lots of time to decide just what to do about it. Most likely I’ll just halt this download and specifically fetch the last copy of “full” from 2017. “superghcnd_full_20170204.csv.gz”

Most of this is going to be redundant (i.e. history the same in each iteration) but useful for comparing “rewriting of history” sometimes. So “day to day” rewrites not as interesting as the “first to last” comparison. Plus, I have the diffs so ought to be able to locally recreate the “full” based on applying the “diffs”.

So “decisions, decisions”…

But what is IN a “superGHCN” data set anyway?

chiefio@orangepione:/SG15/ext/$ cat readme-superghcnd_full.txt
The following information serves as a definition of each field in one line of data covering one station-day.
Each field described below is separated by a comma ( , ) and follows the order below:

ID = 11 character station identification code
YEAR/MONTH/DAY = 8 character date in YYYYMMDD format (e.g. 19860529 = May 29, 1986)
ELEMENT = 4 character indicator of element type
DATA VALUE = 5 character data value for ELEMENT
M-FLAG = 1 character Measurement Flag
Q-FLAG = 1 character Quality Flag
S-FLAG = 1 character Source Flag
OBS-TIME = 4-character time of observation in hour-minute format (i.e. 0700 =7:00 am)

See section III of the GHCN-Daily readme.txt file (
for an explanation of ELEMENT codes and their units as well as the M-FLAG, Q-FLAGS and S-FLAGS.

The OBS-TIME field is populated with the observation times contained in NOAA/NCEI’s HOMR station history database.
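As a rough illustration of that record layout, here is a small Python sketch that splits one line into its eight fields. The `Obs` type and `parse_line` name are my own, the sample line in the usage note is invented rather than taken from the real file, and I am assuming trailing empty fields may be missing, so short lines get padded:

```python
import csv
from datetime import date, datetime
from typing import NamedTuple


class Obs(NamedTuple):
    station_id: str   # ID, 11 characters
    day: date         # YEAR/MONTH/DAY, parsed from YYYYMMDD
    element: str      # ELEMENT, e.g. a 4 character code
    value: int        # DATA VALUE (units depend on ELEMENT)
    m_flag: str       # Measurement Flag
    q_flag: str       # Quality Flag
    s_flag: str       # Source Flag
    obs_time: str     # OBS-TIME as HHMM, may be empty


def parse_line(line):
    """Split one comma separated station-day record into an Obs."""
    fields = next(csv.reader([line.strip()]))
    fields += [""] * (8 - len(fields))  # assumption: pad short lines
    sid, ymd, element, value, m, q, s, t = fields[:8]
    day = datetime.strptime(ymd, "%Y%m%d").date()
    return Obs(sid, day, element, int(value), m, q, s, t)
```

Calling `parse_line("USC00012345,19860529,TMAX,289,,,0,0700")` on that made-up line gives an `Obs` with `day` of 1986-05-29 and `value` of 289.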

chiefio@orangepione:/SG15/ext/$ cat readme-superghcnd_diff.txt
The superghcnd_diff_yyyymmdd1_to_yyyymmdd2.tar.gz files contain the changes to the superghcnd_full data between the two dates listed
in the file name (yyyymmdd1 and yyyymmdd2, where yyyy = year; mm=month; and dd=day of the two different days that are compared).

There are three files contained in each superghcnd_diff_yyyymmdd_to_yyyymmdd.tar.gz file:

update.csv: contains changes to values or flags that were present on yyyymmdd1
(i.e., these values or flags have been altered between yyyymmdd1 and yyyymmdd2)
insert.csv: contains values that were new on yyyymmdd2 (i.e., that were not yet available
on yyyymmdd1, but were newly available on yyyymmdd2)
delete.csv contains values that were present on yyyymmdd1, but not on yyyymmdd2
(i.e., have been removed from the yyyymmdd1 version of the dataset as of yyyymmdd2)

The format of update.csv, insert.csv and delete.csv is the same comma delimited format as in the superghcnd_full and /by_year files
(as indicated below):

ID = 11 character station identification code
YEAR/MONTH/DAY = 8 character date in YYYYMMDD format (e.g. 19860529 = May 29, 1986)
ELEMENT = 4 character indicator of element type
DATA VALUE = 5 character data value for ELEMENT
M-FLAG = 1 character Measurement Flag
Q-FLAG = 1 character Quality Flag
S-FLAG = 1 character Source Flag
OBS-TIME = 4-character time of observation in hour-minute format (i.e. 0700 =7:00 am)

See section III of the GHCN-Daily readme.txt file (
for an explanation of ELEMENT codes and their units as well as the M-FLAG, Q-FLAGS and S-FLAGS.

The OBS-TIME field is populated with the observation times contained in NOAA/NCEI’s HOMR station history database.
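Given that description, rebuilding a later "full" from an earlier one plus a diff could look something like the Python sketch below. It is only a sketch under assumptions the readme does not confirm: that a record is uniquely identified by (ID, date, ELEMENT), that update.csv carries the new version of each changed row, and that everything fits in memory (which a real 10 GB compressed file will not on a small board, so a real tool would need to stream or sort-merge). The function names are mine:

```python
def key_of(line):
    """Assumed record identity: (station ID, YYYYMMDD, ELEMENT)."""
    return tuple(line.rstrip("\n").split(",")[:3])


def apply_diff(full_lines, update_lines, insert_lines, delete_lines):
    """Return the rows of the newer 'full', given the older one and a diff."""
    records = {key_of(l): l.rstrip("\n") for l in full_lines if l.strip()}
    for l in delete_lines:          # rows gone as of the later date
        if l.strip():
            records.pop(key_of(l), None)
    for l in update_lines:          # rows whose value or flags changed
        if l.strip():
            records[key_of(l)] = l.rstrip("\n")
    for l in insert_lines:          # rows that are new on the later date
        if l.strip():
            records[key_of(l)] = l.rstrip("\n")
    return [records[k] for k in sorted(records)]
```

Each argument can be any iterable of text lines, so open file handles on the extracted update.csv, insert.csv and delete.csv would work for small test cases.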


Hmmm… Not very helpful. Mostly just describing GHCN generic data format. The size could still come from more stations, or from being hourly instead of just daily observations, or… So more investigation needed. And a place to unpack a 10 GB file…
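One way to poke at what is inside without finding a place to unpack 10 GB is to stream the .gz and just count things. A minimal Python sketch, assuming the comma separated layout from the readme; the `summarize` name is mine and I have not run it against the real file:

```python
import gzip
from collections import Counter


def summarize(path):
    """Stream a superghcnd style .csv.gz and count rows, stations, elements.

    Reads the compressed file line by line, so nothing is unpacked to disk.
    """
    stations = set()
    elements = Counter()
    rows = 0
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as fh:
        for line in fh:
            parts = line.split(",", 3)
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            stations.add(parts[0])
            elements[parts[2]] += 1
            rows += 1
    return rows, len(stations), elements
```

Pointed at one of the "full" files, the station count and the per-element tallies ought to hint at whether the bulk comes from more stations or from more element types per station. Expect it to take a good while on a small board.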

Oh, and notice from the machine name in my command prompt that all this download work is being done by the Orange Pi One card. That little guy has been a real trouper just cranking away on this download day after day. It can’t keep more than 2 cores fully loaded with heat sinks, or one without, before hitting thermal limits; but this job isn’t CPU intensive, it’s I/O limited, so it’s a great fit for the little guy.


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in NCDC - GHCN Issues, Tech Bits.

32 Responses to My, What Big Datasets You Have, Grandma NASA

  1. philjourdan says:

    If you can’t dazzle them with facts, baffle them with BS. Seems they are doing the latter in preparation for auditing of their data.

  2. p.g.sharrow says:

    @EMSmith; It appears that the tools for massive data management & storage will need more “smithing”. ;-) I think that growing this thing with simple/cheap components will in the end be the better choice. Besides it is kind of fun to create a computer farm out of kids toys. Kind of like building things out of several Erector Sets when I was a kid. I spent most of the last 40 hours assembling a 12″ stack of parts and chips to better follow your lead and see if someone of my level of ignorance can make things work…pg

  3. A C Osborn says:

    EM, is this to do with the statements about Karl’s last warming paper?
    After release they realised that we were on to them regarding their splicing & dicing of sea and land data and poor stats to increase the warming trend.
    The whistleblower comments by John Bates are in the press, all over the forums, and especially so at Climate Etc.

    There is mention of correcting the “data” and “Software” which will correct the Karl paper.

  4. E.M.Smith says:

    I’ll likely summarize my RAID experience with mdadm in the next week or three, for easier following :-)

    Maybe after I decide what to do about a 7 TB shortfall in my storage and a data flow that grows by a 20 hour download time per 24 hour day… (best would likely be to design a small program to apply the diff to the last full and then only download the diffs… but I’ve got a week or two to decide.)

    A 4 TB usb disk costs about $120 (just bought one…) and I would need three of them for a big enough array (RAID loses some to parity… unless raid0 or striping). So the question I need to answer this week is: Is it worth $240 more to packrat all the daily full copies instead of writing code to apply the diffs? Most likely one FULL per month plus diffs would still be overkill.

    Well, it’s a hobby ;-)

    And: A one foot stack? You got a dozen Pis or what?… or spaced 4 inches apart…


    When designing solution systems, there are compute intensive problems, memory intensive problems, and I/O intensive problems (well, and time critical problems, but we’re not talking RTOS Real Time O/S here, yet…) and that is why I stress test hardware and characterize it under Amdahl’s Other Law. Every hardware has a highest and best use case, the trick is finding the match at the lowest cost in $ and time…

    The good thing is that once characterized, that info helps all. So the Orange Pi One is a great little $16 download and NFS server, but thermal limits at one full core or 2 with heatsinks. That information generalizes to ALL small board H3 chip based SBCs and has implications for other quad core systems too (i.e. get heatsinks…). That then says that the Really Cute Nano Pi stack of 8 (two per dogbone layer) even with their big heatsinks most likely can’t run more than a 50% to 75% max duty cycle. Thus saving me about $200 of buying and building to disappointment…

    FWIW, I’ll likely buy 1 just to characterize it and then send it to The Farm or build it into a portable WiFi Hotspot, firewall, fileserver lunchbox… a more likely appropriate use case.

    As of now, the Pi M3 lets me run 4 cores full speed no limit, so is the favored compute module. The others are better for I/O work at modest CPU. None of them have excess memory (though the Odroid looks like it might… I need to test it more) and the Cubie family has some very interesting disk IO speeds for a final USB 3.0 disk farm (though until now I didn’t need that speed… but dumping 8 TB at USB 2 is a royal pain…)

    Oh, and honorable mention to GPU code maturity and GPU chipset for the desktop graphics bit. Raspberry Pi has good GPU use (now… early os code didn’t use it…) while the “embedded” target use folks seem to blow off the GPU porting work (thus doggy X windows even with lots of big cores and a good GPU…) so if running headful, the “PC” flavor SOC boards are likely preferred, just from that. At least until someone ports to the SOC GPU on the dinky boards. Use of GPUs as vector compute engines is a growing area, so “watch this space” for compute engine crossover.

    Isn’t benchmarking and profiling fun? (Not! – but necessary to avoid buying 32 cores of 64 bit compute engines only to find out you can only use 8 at any one time… and can’t keep them fed due to IO limits and memory swapping from too little of it…)

    Welcome to my world…

  5. E.M.Smith says:


    I doubt it is related, but maybe. To me it just looks more like the “insiders” are prepping for the (maybe) ax so putting their desired datasets on the public server so they and their friends can snag a copy “for that day”.

    I’m happy to help that via making a HowTo and snagging my own archive set (willing to share it with skeptics and warmers alike if needed) simply because I believe “The data just are.”.

  6. p.g.sharrow says:

    @EMSmith:”Well, it’s a hobby ;-)

    And: A one foot stack? You got a dozen Pis or what?… or spaced 4 inches apart…”

    An Ethernet/USB hub, 5 Pi-2s, a 1 Terabyte USB disk, Pi-3, Pi-1, 12amp-6port power supply all in DogBone. I added 1/8″ spacer nuts to the board stand offs to get a better spacing still within the DogBone 1″ spacing. Used 2 of the stand offs and 4 of the rubber foot pads to hold the USB disk in place. All in all a nice compact package. Still a lot of work to do to get it all to work together, but handy to move wires and chips around when I learn how to integrate. Oh yes, and 4 flat screens and 4 sets of input devices out of Goodwill – cheap.
    But crowded desktop. Interestingly the flatscreens’ USB ports put out just enough power to operate a Pi, handy for the desktop operations to be able to power up and down from the display switch.
    3-4 terabytes of disk on a desktop! 8-) Quite a toy set. We will see how well your angels can bless your efforts on everyone’s behalf…pg

  7. E.M.Smith says:

    Ah, I see, the full stack not just the compute nodes..

    BTW, I just use ssh and one screen for most things, only use a directly connected screen on a headless node when it refuses to boot. VNC if you want a GUI. But you may have more desk space than I have.

    FWIW, my total disk space ATM is 3 x 1 TB older disks, 2 x 2 TB medium old disks, one very old 500 GB not used at the moment and a 1.5 TB full of temp data site scrape. Then a 3 TB and 4 TB both bought in the last month (and thank you very much! :-) That’s about 9 TB in the old set and 7 in the new set. Total of 16 TB.

    At present, I’ve drained most of the old set into the new 7 TB so I could try building a 4 TB RAID and look for “issues”. Of that original roughly 6 TB of non temperature stuff, much of it is redundant. Things like backing up the old 111 MB WD Disk a couple of times into a 500 GB Disk, then it getting backed up later into the 2 TB Disk a couple of times… (and a half dozen old white box pcs backed up and…) My next big time sink is just to go through that, unpacking the gzips of gzips of duplicate backups and tossing any “more than 2” copies. I can likely get down to about 2 TB of real valuable data (even there, I likely really don’t need things like my canonical collection of all 6.x Debian releases, especially now that that kernel is shown vulnerable and I’m not going to be needing it for the old PC hardware drivers when that hardware has died…)

    Oh, and I’ve accumulated about a TB of dd chip copies from the Pi OS explorations. Since I’m now settled on Devuan, I likely don’t need the intermediate install steps on Void or Slackware chips…

    In short, while tempting to just “throw money at it” and buy more disk, as that is cheaper than time, I think pruning will ultimately save more time. In reality I’ll likely just pick one data type at a time (like chip dupes) and slowly bring order to a decade+ of just throwing more disk at the archive of crap… This being entirely separate from the problem of an 8 TB temperature archive… for that I’ll likely start with moving the 3 TB disk contents back on the (prior) raid parts (pruning as I go…) and put that 3 TB disk on the Opi as temperature data. That will give me about 2 months of download time to think about things and design a better solution.

    Most likely I’ll order another of those 4 TB disks and make an LVM set (Logical Volume Manager) since LVM, while more complicated than straight disks, simplifies file system maintenance and data moves (basically real disk partitions can get bits assigned to whatever logical volume needs more space.)

    None of it with RAID redundancy, but gets me out of the copy and move business…

    Well, the only thing more dull than doing data trash collection is talking about it, so I think I’ll go try firing up the Odroid and doing something more interesting ;-)

  8. E.M.Smith says:

    Well, on full 100% x 4 cores (or really 400% plus browser and more) for a few minutes, the Odroid not only is letting me type fast and easy (using Ubuntu MATE no less, not exactly a slim efficient OS) but it isn’t getting all that hot, either:

    Hostname: odroid64
    CPU Frequency: 1536Mhz
    TEMP: 67

    then after a few minutes

    Hostname: odroid64
    CPU Frequency: 1536Mhz
    TEMP: 71

    And a bit later after loading and typing all this:

    Hostname: odroid64
    CPU Frequency: 1536Mhz
    TEMP: 73

    So “takes a punch” very very well. I’ll cross post this back where I did the other heat /speed tests too. But for a higher end compute node, the Odroid-C2 has it. (Though I still need to get Devuan onto it… at present it is Ubuntu with SystemD).

  9. CoRev says:

    Been waiting for the early reports of the O-C2, so good to hear. I’ve been considering going to the O-c2 or O-xu4, just for the power, & replace my current Win10 desk top.

  10. E.M.Smith says:

    The only “issue” on the O-C2 is the software. The arm64 build is yet young and not all things are ported to it and working well yet. There are 32 bit armhf builds for it, but harder to find. Even a “multiarch” build that claims to run both armhf and arm64 applications (that I’m going to attempt installing Real Soon Now ;-)

    The supplied Ubuntu works reasonably well, if you like Ubuntu Mate, which I don’t… so I’m going to be doing a Debian / Devuan build process for it. (which I wanted to do anyway, make my own OS from original parts…) Then using LXDE as is my preference.

    The physical build quality is very good. Two quirks: IF you are using a dogbone case, then you get to disassemble it to swap the eMMC card (i.e. take it out and load a new OS then replace). If you are using a stand alone R.Pi case, that doesn’t fit as it pokes down too far; and the SD card comes out the side where there is no hole… So you need to get a dedicated case… which will cause cooling issues if enclosed. The second point is the SD card holder is dinky and hard to see / put the chip in, unless you pick up the stack and have a good light… Not a problem once stable, but during OS development and backup / restore it’s a bit of a PITA.

    Overall, for most folks, the R.Pi M3 is likely a bit better. For those willing to be more engaged with the operating system install / customize process and wanting a “no waiting” browser experience, the Odroid-C2 will be better. As a back end headless compute server, it ought to be nearly ideal (especially if multi-arch so the headend can assign it either 32 bit armhf or 64 bit arm64 binaries to execute as desired…)

    So I can get a Pi M3 up and running as I like it in about an hour. The Odroid C2 has been about a week now and still isn’t done (i.e. not Devuan nor LXDE yet)… but has more end stage performance once I get there.

    Oh, and IMHO either one is an improvement over Windows…

  11. CoRev says:

    Chief, thanks. I already have a Pi M3 up and running, but wanted a little more speed. After your reference to the O-C2 I got interested as the Pis are not headed for an upgrade any time soon. I’ve been lurking waiting for your report and doing my own research. That’s how I found the O-xu4. So, I guess I’ll wait a little longer until you have finished your build, to see your findings after the dust had settled. Have a good day.

  12. E.M.Smith says:

    I’m a fan hater, so I’ll never have good words about anything with a fan, thus my being mum about the O-XU4 family and similar.

    Note that I put a comment back on the Odroid thread about an interesting already built Debian for the Odroid, named Odrobian, that comes with Docker already in it. Docker lets you package up an application AND the binaries it wants, i.e. libraries, and that lets you do fun things like run armhf applications on arm64 systems without installing libraries and environment.

    Given that, it looks like you can just download, install, and go with their image and add LXDE with an apt-get and pretty much have a system “as I like it” and able to run just about anything (with the work to put it in a docker if and only if needed).

    I’m presently slugging bytes around on the scraper system and doing some other pretty simple upgrades, but it ties up my management station (i.e. screen…) so likely not going to try that OS until this evening.

    FWIW, if you have the extra $40 and like playing with systems, I’d not hold back on buying the Odroid. Since you have a Pi M3, you already have clue about Linux and won’t be left high and dry with no system if you decide the Odroid isn’t for you or don’t like Linux.

    Also FWIW, I’ve not seen a whole lot of difference between the SD boots and the eMMC boots. I’ve not particularly driven it hard with lots of chip I/O, but for just “boot and browser” both seem fast enough to me. At this point (not finished with testing) I’m happy running it on a Class 10 SD card. Long term and on heavy use systems, the eMMC likely will have a benefit, but at a 50% uplift in system cost, not something I’m seeing as needed up front for just a Daily Driver desktop.

  13. CoRev says:

    I noted your fan-hater comment earlier, but that’s not one of my issues. Been in the bidness since the 60s working on the really, really old Apollo tracking systems. I have not been totally conversant with U/Linux. Just a dabbler. I have used several of your scripts. The scraper was the last one used to download the UAH data.

    When I last worked I was on the Ops/support/command & control side of DoD, but not on the development side. I got too old and too much of a management-type to add value there. ;-)

    I agree on the eMMC/SD issue for a daily driver.

  14. E.M.Smith says:


    Well, I’m typing this in Firefox on the Odrobian Hybrid system. Had to do an apt-get update, upgrade, and install lxde, but here I am!

    Happy Camper.

    Looks like I’m one “Devuan update upgrade” away from the target system. About 2 hours so far, but I wasn’t trying hard ;-)

    I’ve also got about an inch of black space below and to the right of my “screen” area so my monitor settings need some tweaking.

    Other than that, I’m happy with it.

    The rest of the evening will be spent developing a version of my “build script” for this platform and “moving in” (setting up MY account, changing passwords on THEIR accounts, mounting disk, etc.).

    Assuming the Devuan thing goes as slick as all the other times I’ve done it, I’ll be on Odroid-C2 / Devuan as the daily driver for a goodly while. (The Pi-M3 was being outfitted as the “headend” of the cluster, and I’d been slowly “polluting” it with my Daily Driver stuff, so now I can give it a good cleaning too ;-)

    At this point the structure will be:

    Odroid: Daily Driver, workstation, connection to ROW, management station, etc.

    Headend: Cluster manager, dual networks available for making a real Beowulf structure if desired (wireless to inside net, wired to Beowulf private network) Pi M3 but running armhf (32 bit) OS and codes)

    Headless1 and Headless2: Cluster worker nodes (Pi M2 so armhf 32 bit) Wired network, presently on ‘inside’ network, but can go into a separate switch with the Headend wired connection for a real Beowulf structure if desired. (Isolates network traffic inside the cluster from other traffic, like downloading a few TB of climate data…)

    Orange Pi: NFS Server and Site Scraper.

    RPi Original B: DNS, Squid Proxy server, ntp time server, etc. etc. misc. infrastructure stuff.

    By tomorrow I ought to be out of “futz with it” stuff on the hardware and OS side and be ready to “move on” back to the “make a model go” part.

  15. E.M.Smith says:

    Well, in theory, right now I’m on Devuan on the Odroid.

    There’s a couple of odd bits to this one…

    At boot time it fails to clear /tmp (looks like it tries using options that busybox doesn’t know) but that doesn’t seem to be problematic.

    Logging in to an lxde session brings up an error box saying “No session for pid {whatever}” that looks to be an old lxpolkit bug “fixed by the last systemd update” in other releases. Doesn’t do anything, just annoys. But something to fix… someday.

    There’s a couple of other “odd bits”, but really not much and nothing seems to stop it from working. I’m logged in as “me” with all my usual files mounted from disk.

    Oh, and at the update to Devuan, it puts up a Scary-Scary Panel telling you you are about to do a Very Bad Thing and ought not. (remove your kernel to replace it with the same one and that would blow away modules and scripts and such) so I did the recommended “Don’t Do It!” option… No idea why or why not. In any case, it seems to have worked.

    So as of now I have a (slightly rickety since it is the only one) Odroid-C2 running Devuan! Yaay! ( I *think*…)

    With that, I’m back to “real work” and this will be my Daily Driver as long as it seems to work OK.

    It is notably faster than the Pi M3 (don’t know if that is the added 300 MHz clock or the complete lack of swapping from 2 GB memory or what, but it is nice) and per what I’ve read you can overclock to 1.8 GHz without fan, or 2 GHz with (but I’m not going to do that…)

    Setting the screen resolution when on a DVI / HDMI adapter is still a pain, but it works. The R. Pi does it automagically. This thing is hunt and peck among the sizes until one works… all while trying to read distorted or micro sized or nearly black on black or… type in the editor. But it’s working fine now. ( I had to choose DVI and 1400 x 900 @ 60 Hz to get it nice; Your Monitor May Vary).

    Well, time for a “beverage of my choice” as this has been a long one. Time for some “sweet success” moments on the couch with beverage and remote ;-)

  16. CoRev says:


  17. Steven Fraser says:

    EM: over at the GWPF, a very interesting article about GHCN algorithm instability has been posted, showing how results for a particular location varied widely.


  18. E.M.Smith says:


    Thanks! I’ll be putting together a brief review-of-the-process article “shortly”.

    @Steven Fraser:

    I’ll take a look at it, thanks for the pointer.

    FWIW my hardware goal is essentially met and at a plateau. (That is, I need to make this lot go before I can possibly need to add anything to it, and the Odroid-C2 desktop is fast enough I can’t need any more there either – though the Pi M3 was close enough and with more stable software assortment.) Only thing left, really, is figuring out how to handle the “SuperGHCNd” 8 TB, but that will take weeks to reach a problem point.

    This means it’s time for: Back to Data and Models.

    Which means unpacking and organizing all those old copies of GHCN, the newest one, and doing some comparisons. In particular, comparing specific old stations to new versions of themselves looking for re-writing of history, and comparing the old ‘trend’ to the new ‘trend’ to see what splice artifact-like thing is causing them. To that end, that article will be important.

    Then setting up old and current versions of GIStemp (and maybe even packaging up a “run your own at home” blob if anyone wants one).

    And getting Model 2 to run just for the first experience with a model running on a Pi-Stack and what kind of CPU suckage it has. Supposedly runs on one Intel PC from a few years back, so ought to do OK on a single Pi-Board; BUT it might need a bit of parallelizing work if it is single core too much, or needs more than one board to run in less than glacial time.

    Which will then put me back at Model E. It clearly is made for running across a large group of processor cores, but unwinding the dependencies is going to be slow. Finding where to get the input data even worse. (Hopefully if not found quickly, it at least will be buried somewhere in the site scrape I’ve done and a ‘go fish’ will find it.) At that point I’ll likely try cutting it back to a narrow main core and some essential support stubs. Run each stub with a trivial call / test stub of my own, run the core with them, and then set about adding all the optional bits. (It was originally devoid of some ‘enhancements’ and looks designed to let you not run them if desired – so things like oceans and ocean biology and ??? so ought to be not-too-hard to cut back to a smaller core for initial runs / testing).

    Finally, after all of that, I’ll know if the Pi Stack is enough to run an interesting Model Run if left for a few days, or if it needs a lot more nodes, or faster nodes. Yeah, that long… some projects are not for those with short planning horizons.

    FWIW, I’m very happy I was able to bypass the “roll your own Debian” on the Odroid-C2. That likely saved me anywhere from a couple of days to a couple of weeks. The Odrobian download (Debian 8 with systemd) seemed nice just as it came. It isn’t too hard to avoid those things known to cause systemd to hang, so for most folks that’s likely a fine place to just stop and smell the silicon roses. Moving from there to Devuan was ‘by the book’ with only a couple of “Wha?” moments (that kernel don’t-do-it moment and then the error at screen launch that you ignore). But it is a “one of a kind” system. Most folks will find it lonely running on something nobody else is running where any issues are not likely to be in a forum posting somewhere and asking questions will just get a “You are doing WHAT?” answer.

    Well, those data sets are not going to unpack themselves, so time to make and smell the coffee and get things moving again.


    Well, just found out gparted isn’t working on the Odroid / Devuan kit. So looks like my disk maintenance (i.e. reformatting those ex-Raid partition disks to use as a place to unpack GHCN…) is going to either need me to use the “mkfs -t ext4” manual method or drop back to using the R.Pi when I want to do that stuff.

    Why mention this? Because it is an example of the kind of thing that shows up in ANY release candidate, but especially in any young port or any different / unique port. Given that Debian for the Odroid is a young and narrow port, and layering Devuan on top of it looks fairly unique at this time, there will be “things that don’t work”, which no one else has found and no one else is going to fix (as they may very well NOT be broken anywhere else…) and you are on your own. It is this kind of thing that causes me to caution others, especially those not running 4 OS types on 8 SD cards in 6 boards… but just running one as a ‘home gamer’, to stick more with the big following boards like the PiM3 for ease of use and low annoyance. For me, I can just swap back to the Debian or even the Ubuntu chips in the Odroid and see if gparted works there, on those occasions when I need it, or even just move the monitor / keyboard / mouse down-stack onto the PiM3 and use it, instead. But if I only had one board and one SD card with one OS, I’d be a little bit stuck…

  19. CoRev says:

    Well, I just pulled the trigger on an O-xu4. $100 w/case and SD media. For now I will stick with the Ubuntu OS. For my use some things are just not as critical as yours. It should be a damned sight better security than the Win boxes (5 at last count & 3 Pis) I have.

  20. E.M.Smith says:

    I’m likely to be rotating through the Ubuntu, Odrobian / Debian, and Devuan OS versions for a few weeks, just to “compare and contrast” and find where things work and where they stop working.

    My impression so far is that the Odrobian is likely the most complete and stable for hacker type folks, the Ubuntu for the more “home desktop” type (who don’t mind the pervasive dull green of MATE… or like playing with themes). Devuan at this point only for those of us made livid at the sight of systemd in the “top” display ;-)

    Do note that the Odrobian comes in hybrid and non-hybrid forms. I chose the hybrid, which really looks like it is 32 bit armhf with the arm64 libraries added. Ubuntu from Odroid is arm64, so a bit faster on some tasks, but with fewer bugs ironed out of it and many applications still not ported right. Depending on application, pick one. (Chromium, IIRC, working on 32 bit but not 64 at the moment).

    Odrobian seems to have done a better job sorting out the 32 vs 64 bit issues.

    In any case, I think you will enjoy the board, especially that high end a board! I’d be happy to just use Debian and the C2 if push came to shove. (The systemd ‘issues’ are really fairly rare…) Most of that choice being about armhf being more mature and not liking MATE Green all that much ;-) Hardly a high tech issue….

  21. Another Ian says:


    “Even more on the David Rose bombshell article: How NOAA Software Spins the AGW Game”

  22. E.M.Smith says:


    Looks like you win the ‘good taste in hardware purchase’ award… Devuan has a build for the XU but not the C2:

    So, should you wish it, you can get an official build and not do my “roll your own” one off path…

  23. CoRev says:

    @Chief, ?Good Taste? or just lucky? Or maybe that’s what the +$50-60 difference buys, as well as improved performance. I could have gotten ~2 C2s for the price of the XU, but performance and the learning experience were more important to me than just price. Had the PI been working on a new improved version I probably would have waited and bought it. We OLD retired folks have less time to wait.

  24. E.M.Smith says:

    There was a comment in spam that asked why I thought the data was worth saving, given that it is not original data but processed non-data, then had a link to a realclimate? article that was likely why it went to spam. I attempted to get it out of spam (via a tap on the “restore” or “not spam” hotword in the control panel).

    Unfortunately, in a misguided attempt at cute fancy, WordPress hides that list of three controls (not-spam, history, delete-permanently) until you tap that article box… which can have those choices pop up under your finger sometimes as you don’t know where it is / will be. And “delete-permanently” covers more real estate than the others.

    Bottom line is that comment was deleted against my will as I wanted to touch “not-spam”.

    Please resubmit the comment.

    As to why: some of the data is relatively unprocessed. Called “raw”, it is in fact QA processed, but not homogenized nor infilled, nor UHI adjusted. Relatively clean.

    Then there is benefit in compare and contrast of the present modified data with prior versions of modified data. Think of it as preserving 5% relatively clean data and 95% evidence…

  25. E.M.Smith says:

    I seem to have responded to the prior scrape thread instead of here where the topic came up:

    Short form: Bought another 4 TB disk. Going to make an LVM volume group (Logical Volume Manager) with about 8 TB in it and use that for the giant dataset scrape.

    Thank you to the donors who made this possible.
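
    For the curious, pooling two 4 TB disks into one roughly 8 TB LVM volume could be sketched as below. This is a sketch under assumptions, not a record of what was actually run: the device names and the names scrape_vg and scrape_lv are hypothetical, and by default the function only prints the commands.

    ```shell
    # Sketch only: prints the LVM commands by default instead of running them.
    # Device names, scrape_vg and scrape_lv are hypothetical placeholders.
    make_scrape_volume() {
        run="${RUN:-echo}"       # RUN=sudo would really execute (and wipe the disks)
        vg="scrape_vg"
        lv="scrape_lv"

        $run pvcreate "$@"                          # mark each partition as an LVM physical volume
        $run vgcreate "$vg" "$@"                    # pool them into one volume group
        $run lvcreate -l 100%FREE -n "$lv" "$vg"    # one logical volume spanning all the space
        $run mkfs -t ext4 "/dev/$vg/$lv"            # put an ext4 filesystem on it
    }

    make_scrape_volume /dev/sdX1 /dev/sdY1
    ```

    A plain linear volume like this has no redundancy: lose either disk and the whole filesystem goes with it, which is tolerable for a scrape that can be downloaded again.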

  26. Don Black says:

    Chief, I now have my hands on the O-xu4, and it is as fast as my 3-4 YO daily driver Windoz system. It is clearly faster than my wife’s older desktop and our ~8YO laptop. So far I am happy with the Ubuntu OS. Still just starting to play.

  27. E.M.Smith says:

    Give us a report or two as things come along.

    Ubuntu is a decent OS. Mostly a stabilized Debian with some polish on it and better bug eradication. I’m still running it on the Orange Pi (web scraper) but with the blue theme lxde desktop (via the Armbian build which is built on top of Ubuntu); and will either drop back to it or vanilla Debian on the C2 until I can do a cleaner Devuan build (fewer issues…)

    My complaints about Ubuntu are more theoretical and political than practical. It tends to be built too fat, so slow on old or small hardware (at least in the PC world). Canonical has been a bit more proprietary on some things than it ought to be, in an open source build. Then systemd just offends my 30+ years of SystemV init experience and instincts. I don’t really want to learn a 3rd or 4th way of doing the same things (which does not apply to folks new to Linux) and find it wrong-think compared to The Unix Way (increasingly only of interest to folks who are choosing medicare options…). Running MATE gives a nice desktop, but I change systems often enough that I get tired of the reskinning needed to replace the pervasive green theme with something (anything…) else.

    For pure function, especially for folks from a Windows PC starting point, it is a decent choice. Then 20 years from now you can become a surly curmudgeon connoisseur of Linux releases and init systems and disparage change too ;-)

  28. Don Black says:

    Chief, I believe I am past the surly curmudgeon stage. For some reason my old sign-in of CoRev did not work, so you see my new ID. I will work with this beast for a while and let you know.

  29. CcoRev says:

    First impressions of the Odroid-xu4. Much faster than the Pi3. Ubuntu Mate is almost as user functional as Windows for some file functions, and then not so friendly. As a periodic user of Linux, I find remembering the needed terminal commands harder when they are needed instead of high level functions. For instance, instead of just pointing to the device and then adding a new folder in file manager, U-mate lets me point then forces a terminal session to do the add.

    Browser is the same, adding hardware is easier than Pi. I’ve seen 200-300 MHz speed on the chip clock, but usually runs in the 140 range.

    Yes, I am using it for this note.

  30. E.M.Smith says:


    It may have worked, just wasn’t white listed yet? It’s an ID/IP/Mystery_Meat mix that fingers a login, then I approve it, and you are white listed until such time as the {something} changes. Even just a new IP from a reboot of your boundary router seems to be able to do it. Eventually things like your login / coffeeshop-wifi and your login / work and your login / home and… all end up on the whitelist and it seems like it never fails… then something changes…

    Nice report on the XU-4. FWIW part of the ‘issue’ with remembering will be that as it is a SystemD Ubuntu now, many of the commands will be something other than what you remember… Part of my issue with SystemD. Would not be so bad had it restricted itself to one function (in the usual Unix Way) but it constantly takes over more turf, so ever more things that you remember end up wrong-again. As I don’t need that in my life, I have generally found ways to avoid it most of the time. (Though at the cost of periodic devo work to get a non-SystemD release running… )

    But generally, yes. That’s what it is like in Linux Land. Lots of nice eye candy and ease of use and nifty free software and then WHAM! you need some arcane command line to get past some stickage point… Well, at least you CAN make it do what you want. Closed systems, like MS Windoz, are often just not possible to make the way you want, period. (And no, I don’t consider reg-edit a good way to go about it…)
