Well That Was Fun, sort of…

OK, a small update on computer status.

The R.PiM2 is now doing a wonderful job of site scraping, having accumulated about 150 GB so far. It is still running against the first two sites, and then I have two OTHER sites I want to scrape for temperature data as well. I think I’m gonna need another disk…

So sometime in the next week or three I’ll stop off at Best Buy or Walmart or Costco or… and get another TB disk for about $60, format it EXT4 or similar, and call it a dedicated temperature data archive.

Along the way several small adventures happened (already posted). But a couple of things are not posted yet. One is ‘sort of amusing’… In the middle of all the data shuffle and trying to wedge an EXT3 file system into a file on NTFS (so as to avoid the lousy way permissions and owner are handled with NTFS – you map them by hand…via creating a mapping table from NTFS to original Linux / Unix owners…) and running headlong into the ntfs-3g driver behaving badly with small block sizes and sparse files: I had all USB ports full with mouse, keyboard, uplink to USB hub, USB memory stick, and I had the powered USB hub driving 4 different USB disks.

A total of somewhere over 2.6 TB on 4 spindles. Yes, stuff was scattered around in different places and ought to have been pulled together into smaller spaces. But free space is where you find it as you cope with “issues” at 2 AM… Well, the USB disks tend to ‘idle down’ and sleep if they are not being accessed. So everything was going fine. Then I started launching more and more things. I got up to about 99% CPU usage (so all 4 cores using power) and all 4 spindles rotating and all heads seeking… And the little rainbow colored ‘low volts’ indicator started to blink on every minute or so. It was bad enough that it caused the R.Pi to detect a ‘plugging event’ on the USB drives and scan for partitions.

Since I had some unmounted FAT32 partitions on the disk, it would pop up the ‘partition found’ menu and ask me what to do with them. That was my clue that things were ‘not well’. Occasionally, but not consistently, I’d had wget downloads ‘hang’ with various disk write errors. File system not mounted, or not write enabled, or some such. Usually when I had been out of the room. The power low blink and file mount request finally showed up and that clue was the Ah Ha moment to realize that when all 4 disks were running, sometimes a ‘spin up’ or synchronized head seek and spin-up would exceed the power available from the powered hub. It would then draw a bit extra from the uplink, and would sag the power to the R.Pi.

What was surprising was that the R.Pi kept on running and didn’t crash, but disk mounts would become 1/2 unmounted. Not writable, but still showing mounted in the df listing. ( I think this was a side effect of the ‘plugging event detection’ software ).
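A quick way to catch that "shows in df but won't take writes" state is to actually try a write on each mount point. What follows is only a minimal sketch in Python, not what I ran at the time. The mount point names in the usage note are made up, and it assumes a Linux-style /proc/mounts (where spaces in paths show up escaped, which this sketch does not handle).

```python
import os
import tempfile

def read_mounts(path="/proc/mounts"):
    """Return (mount_point, options) pairs from a /proc/mounts-style file."""
    mounts = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 4:
                mounts.append((fields[1], fields[3].split(",")))
    return mounts

def is_writable(directory):
    """Try to create (and auto-delete) a small temp file in the directory."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False

def check(mount_points, mounts_file="/proc/mounts"):
    """Classify each mount point: not mounted, read-only, unwritable, or ok."""
    options = dict(read_mounts(mounts_file))
    report = {}
    for mp in mount_points:
        if mp not in options:
            report[mp] = "not mounted"
        elif "ro" in options[mp]:
            report[mp] = "mounted read-only"
        elif not is_writable(mp):
            report[mp] = "mounted but not writable"
        else:
            report[mp] = "ok"
    return report
```

Something like `check(["/mnt/archive1", "/mnt/archive2"])` (hypothetical paths) run from cron every few minutes would have flagged the problem long before a wget quietly hung while I was out of the room.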

So I stopped a couple of processes and, since I had gotten the disk issue cleaned up ( i.e. working on a non-sparse loop mounted file with big_writes ) I consolidated some of the activity. I no longer needed the tar.gz archive for the restore into that partition, so that disk could go offline. I now did have swap on one of the Toshiba drives (that one that took 3 days to resize the NTFS partition) so the WD could go offline. Then a short ‘disk check’ via fsck and reboot…
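For anyone wondering what "non-sparse" means in practice: the container file gets every byte written up front, so the driver never has to fill in holes later. Here is a rough Python sketch of that idea, assuming an ordinary file system without compression. The function names are mine, and the formatting and loop mounting of the file are left to the usual tools and not shown.

```python
import os

def create_dense_file(path, size_bytes, chunk=1024 * 1024):
    """Write real zero bytes so blocks are allocated now, not on first use."""
    zeros = bytes(chunk)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out to the disk

def allocated_bytes(path):
    """Space actually allocated on disk (st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512
```

Comparing `allocated_bytes()` against `os.path.getsize()` is a cheap way to tell whether a given file is sparse: a sparse file reports far less allocated space than its nominal size.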

Now everything is running fine and stable. Both downloads are smoothly downloading (and the restart will have done a quick check that things match on both ends) and I’ve got 150 GB or so of more disk space for them spread between the two disks.

During most of this time, at least one of the downloads has been running to some disk or other, so the “futzing around” hasn’t stopped the process. But it has been interesting.

The major “lesson learned” here is that a “powered USB hub” can still suck power over the USB link to the R.Pi if you load it up with enough power hungry things (and hard disks suck power fairly strongly – and in surges at start up). This also implies it can supply a little too if it has extra, stabilizing a marginal power supply on the R.Pi itself.

So don’t think of a powered USB hub as a power isolation / protection device. It sometimes isn’t.
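On later Raspberry Pi firmware than I was running, the `vcgencmd get_throttled` command reports the same low voltage condition as a bit field, which saves watching for the blinking indicator. A small sketch of decoding it, assuming that command is present on your firmware (the helper names here are mine):

```python
import subprocess

# Bit positions in the value reported by `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
}

def decode_throttled(text):
    """Decode a line such as 'throttled=0x50005' into flag descriptions."""
    value = int(text.strip().split("=", 1)[1], 16)
    return [desc for bit, desc in sorted(FLAGS.items()) if value & (1 << bit)]

def read_throttled():
    """Run vcgencmd on the Pi itself and decode its output."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    )
    return decode_throttled(result.stdout)
```

The "has occurred since boot" bits are the useful ones for this kind of problem, since a sag during a disk spin-up is over long before you look.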

OK, enough on power supplies.

I’m now, more or less, back to normal. I’ve got the R.PiM2 busy on a task, and given my link speed it will be about 2 weeks before it could need more disk space. I hope by then these 2 downloads are all done.
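The arithmetic behind that "2 weeks" is simple enough to sketch. This assumes decimal gigabytes and a steady sustained rate, neither of which is exactly true of a real link, so treat the result as a rough guide only.

```python
SECONDS_PER_DAY = 86400

def implied_rate_mbit(gigabytes, days):
    """Average link rate (Mbit/s) needed to move this much data in this many days."""
    bits = gigabytes * 1e9 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6

def days_to_fill(gigabytes, rate_mbit):
    """Days needed to fill the given space at a steady rate in Mbit/s."""
    bits = gigabytes * 1e9 * 8
    return bits / (rate_mbit * 1e6) / SECONDS_PER_DAY

# About 150 GB of free space lasting about 2 weeks:
rate = implied_rate_mbit(150, 14)     # roughly 1 Mbit/s sustained
# The same rate against a full 1 TB disk:
tb_days = days_to_fill(1000, rate)    # roughly 93 days
```

So at my link speed a dedicated TB disk is about three months of continuous downloading.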

I have my main desktop machine back where I can use it for postings (no longer doing disk re-sizes that take forever…) and things are getting back to normal.

At this point, I’d ‘size and price out’ a data scraper system as being about $150. One Raspberry Pi Model 2 kit for $60, a large USB disk for about $60, and about $30 of powered USB hub and misc cables. Then do the basic OS install, download the free software. With that, and the wget commands from the prior posting, you can effectively keep a mirrored copy of about 1 TB of climate and weather data from the sites of your choice. Get a 2 TB disk for about $30 more if you need it and you are still under $200. A Raspberry Pi B+ ought to also be fine for this use, so you could shave another $10 or so off the cost, and leave out the case and such you could likely get it down another $10, but frankly, it’s already cheap enough to be in the ‘noise’.
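The actual wget commands are in the prior posting and are not repeated here. As a rough sketch of the general shape only, this is how a mirroring command could be assembled in Python. The URL and directory are placeholders, and the particular mix of options is my assumption for illustration, though each one is a standard GNU wget option.

```python
import subprocess

def mirror_command(url, dest_dir, wait_seconds=1):
    """Build a wget command line that mirrors one site into dest_dir."""
    return [
        "wget",
        "--mirror",                        # recurse and use timestamps to skip unchanged files
        "--no-parent",                     # stay below the starting directory
        "--wait=%d" % wait_seconds,        # pause between requests to be polite
        "--directory-prefix=" + dest_dir,  # where the mirrored tree lands
        url,
    ]

# Hypothetical usage; the site and path are placeholders, not a real archive:
# subprocess.run(mirror_command("ftp://ftp.example.org/pub/data/", "/mnt/archive"))
```

Because mirroring compares timestamps, restarting the same command after an interruption re-checks what is already on disk rather than fetching it all again.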

Once the downloads are done I’ll post some sizing information based on actual data sizes, and put up a more organized “How To” on a DIY data scraper with links back to the R.Pi hardware set up, the wget command use, and formatting the file system more rationally. But for now, you have the rough “how it was done” to work from for anyone wishing to “play along at home”. And with that, the Temperature Data Archive Station is up and running.

FWIW, the CDIAC site has the GHCN V1 data in one of their dataset archives, so even though NOAA has ‘disappeared’ it (and V2) they can still be found. I have both and will be preserving them in a permanent archive for anyone who can’t find them online (to be made available at some future date if no online copies remain). I’ve also got a copy or two of the reputed “raw daily data” though I’ve not characterized them as to just what is what (and they are large…)

So “sometime” after this process has run to completion I’m going to be making a catalog of what temperature data is in this pile. (Yes, I know, it would be far more efficient to have made that search / catalog first, then only done the download on the parts that were temperatures; but where’s the fun in that? ;-) Realistically, it’s much, much faster to search on a real set of files on your own machine with full Linux / Unix tools than to do it on an FTP site, and sometimes you end up downloading a bag-o-bits anyway just to see what is in it (the online size and content information is, er, sparse…) so I decided to just “let a machine do the grabbing” and I’ll sort it out later.
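A first pass at that catalog could be as simple as walking the local mirror and totalling files by type, then searching names for likely keywords. This is just a sketch of the idea, not the catalog itself. The function names and any keywords you feed it are my assumptions, and real sorting out will need a look inside the files.

```python
import os
from collections import defaultdict

def catalog(root):
    """Return {extension: [file_count, total_bytes]} for everything under root."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable file; skip it
            totals[ext][0] += 1
            totals[ext][1] += size
    return dict(totals)

def find_by_keyword(root, keywords):
    """List paths whose file name contains any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if any(k in name.lower() for k in wanted):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Run against a local disk, this takes seconds to minutes, where poking around the same tree over FTP would take far longer.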

And, with that, I’m off to a cup of morning coffee and a think about “What next?”


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in NCDC - GHCN Issues, Tech Bits.

9 Responses to Well That Was Fun, sort of…

  1. Richard Ilfeld says:

    Having taken stuff apart since childhood, I empathize with your struggles. But for the higher purpose of extracting a competitive truth from the temperature records, I can’t help but think that a really decent off lease Dell setup can be had for $150 bucks; mature Linux available.

    So I conclude that the “hassle” you are suffering through is fun and rewarding on its own terms.

  2. E.M.Smith says:

    @Richard Ilfeld:

    Well, yes. From a comment I made at the bottom of the “loop file system” posting:
    “Oh well… Good thing I enjoy this kind of stuff ;-)”

    BTW, I’ve several times mentioned that I bought a $70 or so 64 bit box (ASUS / Antek) from Weird Stuff and that they had some equally cheap boxes with Linux already installed ( I chose this one for the Windows XP as backup in case I could not recover the Evo Win XP ) so I’m well aware of the “reuse prices”. But the goals here were not just to get a decent system out of scrounging. Frankly, most folks can pick up an old PC for “free” as their friends upgrade to the next god-awful fat Microsoft release and their old system is “too slow for anything”…

    So Why The Pi?

    1) Power consumption is near nil. Something like 5 W. You can leave it on forever doing real work and not care. Compare the PC at 150 W to 200W and sometimes more.

    2) The R.Pi is silent. I can leave it running on my desktop for weeks on end and not have it whirring at me at 2 AM when I’m trying to have some quiet problem solving time.

    3) The R.Pi. takes up the space of a paperback book on said desktop (but could just as easily be put on the windowsill if I wanted the space…) Far less intrusive than a desktop PC.

    4) The R.Pi. has no moving parts. PCs have fans. Fans have bearings that wear out and vents that clog up with “dust bunnies”. Maintenance of the Pi is near zero.

    5) I wanted a “bought new” solution example so folks who wanted to make a temperature archive station could just hit Amazon and be done. Not everyone has the skill, interest, or desire to go dumpster diving or walk the aisles of the Recycle R Us store. Besides, for many folks (including me when I’m on a contract) the time to do that costs more than the $60 on the credit card to Amazon. So this is a solution anyone can do.

    6) As the R.Pi is often used in teaching circumstances, and as I have a teaching credential in data processing at the Community College level, I thought this could also serve as a model project for anyone doing such things. Those work better on the Pi than when they start with “get your parents to go with you to the bad side of town and dig through the recycle pile looking for, well, you know, er, maybe you don’t…” and when done, one can use the R.Pi for a load of other things too if desired.

    7) I already owned the sucker. Bought it for other things, but it’s just sitting there asking to be used for everything.

    8) It’s just so darned cute! ;-)

    9) In a ‘raid’, it isn’t going to be something that most police will think they need to take. (Especially if it is hidden in a shoebox in the upstairs closet not making any noise and using WiFi for the data connection…). Since the post-Climategate raid on Tallbloke, that sort of thing matters. Yes, much more of my “stuff” is not where it can be found, or where if it is “found” it isn’t understood. ( I’m not spilling ALL the beans here… I’m giving examples of things folks can do, not a road map to what all I do ;-) For example, I have No Upstairs…) So you can do things like put one of these on a battery in a box in a shed out back and still have it running. And, guess what, when a warrant says ‘take computers’, most cops don’t think that means “cut open plastic gas can in shed out back”.

    10) IFF it does die, or get raided, it is trivial to replace. $60 to Amazon, next day assemble and start software config, day after that, download encrypted binary backup blob from “The Cloud Server In Romania” and restart it…

    11) It doesn’t involve Microsoft in any way, nor does it involve UEFI bootloaders.

    12) It doesn’t involve Google in any way, nor does it involve Chrome and their spyware.

    13) The Raspberry Pi is not part of the PRISM program (sorry NSA guy…)

    14) ALL computers have a “hassle factor”. This one is no worse than most, better than some. For example, the “disk full shuffle bits” problems would be identical whatever Linux I plugged them into. Similarly, I’ve had power supply issues on desktop machines more than most other modes of failure (often the fan dying… sometimes dust bunnies and greasy dirt clogging like at one client where their main build BSD box had been running for a year non stop in a closet…) So no more trouble than any other. BTW “mature Linux” is an interesting marketing idea… but for ALL new hardware, the port is young… If you want “mature Linux” you get old hardware. I’ve had worse problems with the HP Laptop and Linux driver issues. So much so that I gave up and didn’t get Linux installed on it before it started to die…. from (drumroll please…) fan failure issues… (It takes a bizarre video driver IIRC and the disk partitioning sucks so you must dump, format, restore and THAT’s a pain and…)

    15) At heart, I’m most motivated by being efficient and enjoy the minimalist ethos. So I like to find the minimal solution that works the best for the least cost. The R.Pi is a great fit for that kind of ethos. (Oddly, for this particular project, the Model 2 is likely a little bit of overkill. Mostly I’d get 2 cores busy, rarely 3. One of them on the Xorg video and the other on some big data move sorting out how to use the disks I had. So run ‘headless’ and with a dedicated TB disk, it won’t even keep one core busy… and a R.Pi B+ would be plenty.) I expect to use several more of them “going forward” in that minimalist way. The end goal being an entire Data Processing “shop” built on minimal parts with minimalist design. (Eventually all “data” being “out there somewhere” and only delivered and decrypted locally as needed so all systems are generic and all data is invisible when not on the screen… so take it all and you get nothing, and I’m back up and running next day. A different kind of minimal…)

    16) If I find “the issues” it means someone else doesn’t need to. Frankly, that’s largely been my job for most of my career. Fix issues other folks ran into, or find ways to prevent others from running into them. So it’s natural for me to go looking “for issues” and to enjoy finding them. As the R.Pi is “new to me”, I like finding the limits of what you can do with it so that others need not. Like doing a series of shots at a bullet resistant vest to rate it as a I or a II or… it’s a good idea for folks who do this stuff for a living to push it to the limit before you try it at home… It means I know what it can do well, and when someone says “Do this and use a R.Pi” I know when to say “Sure!” and when to say “Um, not so fast. That will blow up in this way for that reason.” The Model 2 has a much wider set of acceptable applications than the Model 1. I like that… In a Haiku kind of way…
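    To put point 1 in numbers, here is the "leave it on forever" arithmetic as a small sketch. It uses only the wattages quoted above (about 5 W for the Pi, 150 W to 200 W for a PC) and assumes a constant draw all year, which is a simplification.

```python
HOURS_PER_YEAR = 24 * 365

def kwh_per_year(watts):
    """Energy used in a year by a device drawing a constant number of watts."""
    return watts * HOURS_PER_YEAR / 1000

pi_kwh = kwh_per_year(5)         # about 43.8 kWh
pc_low_kwh = kwh_per_year(150)   # about 1314 kWh
pc_high_kwh = kwh_per_year(200)  # about 1752 kWh
```

    Multiply by whatever you pay per kWh and the PC costs 30 to 40 times as much to leave running.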

    So I wouldn’t call it “struggles” so much as “constructive play”. Besides, I figured anyone could figure out that wget works on a regular old PC under Ubuntu and “go there”… so why bother? I’m not “just anyone” ;-)

    Hopefully that clarifies my POV

  3. p.g.sharrow says:

    “The end goal being an entire Data Processing “shop” built on minimal parts with minimalist design. (Eventually all “data” being “out there somewhere” and only delivered and decrypted locally as needed so all systems are generic and all data is invisible when not on the screen… so take it all and you get nothing, and I’m back up and running next day. A different kind of minimal…) ”

    An excellent summation of the concept behind this project. :-) Creating tools that others might utilize has its own reward. Maybe some might even contribute to the effort. Priming the pump is necessary and the smith is forging the tools! …… I’m impressed;-) ..pg

  4. E.M.Smith says:

    @P.G.:

    I remember my Dad saying that a Smith always travels with an anvil and a hammer, with which you can make the forge and all the other tools… but in a pinch, you can cast a rough first hammer and anvil from metals like bronze to start the process… or, I’d add, even use wood, rocks and sticks if need be. Then using bronze adz, axe, hammer make the forge to heat the iron to make the next round. (But a LOT easier if you just bring a properly cast and hardened anvil and hammer with you). I think that was the first introduction I had to the idea of ‘bootstrap’ and ‘first you make the tools to make the tools’… My first smithing was to make a nail. (It’s traditional… first you make a nail as it is pretty simple and small and fairly hard to screw up). Then I made a screwdriver (basically a really big nail and you flatten one end just so… and heat treat). In “the middle of nowhere” I’ve used a couple of rocks to shape and sand car parts that need a bit of help to get me home… (a rotor can be nicely cleaned and polished on a chunk of sidewalk or curb if need arises…)

    Never did make it to the point of making tongs and pliers. Got sucked off into electronics instead ;-) but Dad told me the process (making the hinge is interesting and the jaws grip grooves / hardening is interesting…. making a file is tricky. Has to be soft to cut the teeth, then hardened like crazy… but not to the core so it doesn’t break from brittle… Fast hot heat and quench, then pull and let the heat equalize… but not so much as to anneal the teeth… timing was a big thing…)

    IIRC it was a heavy oil quench and a nitrate hardening ( i.e. horse piss if available makes a nitrated hardened surface. Lacking a horse, drink some beer and wait ;-)

    Maybe I ought to make a forge and see if I can still make anything ;-)

    But yes, first you make the tools… That was the unboxing / assembly / first SW install.

    Then you make the nails and screwdrivers and pliers… install the rest of the SW and wget and…

    Then you make the horseshoes, shovels and hinges folks want … that Temp Archiver Package, or the encrypting file store in a file system in an encrypted disk on a remote “personal cloud” server with a VPN tunnel to “you” from “wherever”… and assign the kid to making the 10,000 nails folks also want as he learns to hit square and uniformly on the workpiece ;-)

    The more things change the more they stay the same…

    So, as of now, I’ve got a ‘good enough’ workstation on the R.PiM2 (though a Cubieboard truck with the faster single cores would be more comfortable), I’ve got GIStemp to compile on it. I’ve got a datafetch / archiver working fine. I’ve got a good idea how to do the file hiding / encryption (maybe that ought to be up next) and I’m most of the way into the ‘back room services’ of web server, DNS server, file server, bittorrent server, DHCP server, etc. etc. with most of those running, just not a final full write up and integration. Pending boot server / PXE boot whose target storage space has just been sucked up by temperature data ;-) Then there is that hard nut of a Tails on Pi or RatTails that’s barely started… but is started, if moving slowly as the infrastructure gets built and catches up… The locked down security process layer being, typically, the last thing folks think about and do…

    I’d guess I’m about 1/2 way through the set up and config process for the basics.

    Any suggestions for desired stuff / features, just post a comment.

  5. Larry Ledwick says:

    Maybe I ought to make a forge and see if I can still make anything ;-)

    Go to junk yard pick up big industrial truck brake drum, mix up a batch of plaster of paris 50/50 mix with sand and line it (poor boy fire brick lining). Let it dry then bake it to get it absolutely dry. Use hub hole to provide air from the bellows (which you have to make too).

    In Metal shop we were required to forge a cold steel chisel and temper it so it would snip off 1/8 inch welding rod about 20 times without damaging the edge. My final project was a knife made from a file. Shaped with a grinder (stock removal method rather than forging) then tempered (bake in oven at 475 until it just starts to turn straw color, quench, then heat the tang with the blade wrapped in a wet cloth until the blue line of oxide ran up to the hilt, so it won’t break at the hilt).

    I could sharpen it razor sharp (shave the hair on your arm with only spit as a lubricant), cut a piece of typing paper clean with a single fast swipe (watch your fingers). The shop teacher then (much to my dismay) snipped off a couple pieces of 1/8 inch welding rod with it, by using it as a cold steel chisel with a brass hammer on a lead block. No damage to the edge much to my relief. I still have it. Old files make nifty knives and hold a really sharp edge much better than modern stainless alloys. If I needed a post disaster knife I would start by looking for an old file.

  6. LG says:

    @ ChiefIO:
    Please, tell us that you have snapped pictures of that glorious sprawl in its native wildness…. :D

  7. E.M.Smith says:

    @Larry:

    Ah, yes… the chisel project… Another of those basic things to make… Never did make a knife, but got to practice my hardening and quenching on an old one. South American natives would sink the blade of a machete into a melon half of the same profile (from the inside side) and that would leave the body of the blade supple but harden the blade edge, and with just enough residual heat leak back into the edge so it didn’t chip easily. I suspect some nitriding from the melon as well.

    Nice idea for a simple quick forge. I’d figured on a “pile of bricks” forge as the fireplace lining will be “available” after a major quake 8-0

    @L.G.:

    Which “glorious sprawl” did you have in mind? If the Pi and octopod/2 of disks, well, I can do that…

  8. LG says:

    Yes!
    That “Inglorious Sprawl”

  9. pg sharrow says:

    Hummmm.. Grandpa gave me an anvil and hammer as well as a small machine lathe. Claimed with those I should be able to recreate most anything I might need on the farm. Never had a forge but do have a Victor wrench and a buzz box ;-). Oh! yes and a junk pile that Gyro Gearloose would envy.
    A few years back I needed a good brush knife. The offerings at the equipment stores were very disappointing, so I was digging through the junk pile looking for a suitable piece of steel, finding nothing.
    Then as I was imagining the needed knife it dawned on me! An old used chain saw blade! 25″ long, Heavy 3/16″ Tough high alloy spring steel. Wow! cuts a 2″ limb with 1 swing, won’t bend or break and holds an edge even after encountering rocks. Didn’t need to forge or temper. Just ground to shape and pin in handle. Very high quality alloy steel is a joy to work with. pg
