Sometimes when writing code you make assumptions. Quite valid assumptions. That turn out to be quite wrong.
Sometimes it is not your fault.
Sometimes the data sucks.
I’ve ported my dT/dt code to run on both the v1 and v3 versions of GHCN.
It does a ‘first differences’ anomaly processing on EACH thermometer record before doing ANY combining, so is about as pure and clean an ‘anomaly’ process as you can get. The only real ‘twist’ to it is that I do the ‘anomaly’ creation process starting with the most recent data point and going backwards in time. The most recent data ought to be the best, so this puts any extreme excursion for very old and questionable instruments or processes ‘at the start of time’ for that particular thermometer. In this way, a thermometer at a place that was first being read in 1720 and perhaps even in some entirely different scale, like Reaumur, does not cause all of future readings to be offset by whatever oddity it might have had in the first reading.
First Differences simply takes the most recent reading you have and subtracts it from the next (older) one. So if you have 20.1 today and 19.1 is the next reading back, the difference would be -1 C. No Problem.
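For the curious, here is a minimal FORTRAN sketch of that newest-first differencing. The array names and sample values are made up for illustration; this is not my actual dT/dt code:

  program fd_sketch
    implicit none
    real, parameter :: missing = -9999.0
    real :: temps(3) = (/ 18.0, 19.1, 20.1 /)   ! oldest ... newest
    real :: dt(3)
    integer :: i

    dt(3) = 0.0                       ! the most recent reading is the baseline
    do i = size(temps) - 1, 1, -1     ! walk backwards in time
       if (temps(i) > -9000.0 .and. temps(i+1) > -9000.0) then
          dt(i) = temps(i) - temps(i+1)   ! older reading minus newer one
       else
          dt(i) = missing                 ! carry the missing data flag along
       end if
    end do
    print *, dt                       ! expect roughly -1.1 -1.0 0.0
  end program fd_sketch

Any oddity in the very oldest reading ends up only in the very first (oldest) difference, rather than offsetting everything after it.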
I create the report in several steps. This is often an easier way to program and it is frequently very useful to have intermediate data files for things like debugging or for feeding the intermediate form into another ‘great idea’ that comes along (without needing to reprocess all those first steps). In particular, I combined the Inventory File with the Mean Temperatures data to make a combined file where each record carries its own station identification. (This is how it really ought to be, since the data in the inventory file changes over time in the real world, but only the most recent point in time is captured in the INV file. A land use, for example, may be AIRSTATION today, but I assure you it was not so in 1800…) Then I create a version of that file where all the temperatures have been put through the dT “first differences” processing. At that point the data can be used to make all sorts of interesting reports (and more easily, since the meta data are attached to each record).
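A bare-bones sketch of that “combine” step looks about like this. The file names, array sizes, and field widths here are placeholders for illustration, not the exact ones my program uses:

  program combine_sketch
    implicit none
    character(len=11)  :: invid(10000), stnid
    character(len=90)  :: invmeta(10000)
    character(len=120) :: datarec
    integer :: ninv, i, ios

    ! Slurp the inventory file: station ID, then the rest of the line
    ! (name, latitude, longitude, land use and so on) kept as one string.
    open (10, file='v3.inv', status='old')
    ninv = 0
    do
       read (10, '(A11,A90)', iostat=ios) invid(ninv+1), invmeta(ninv+1)
       if (ios /= 0) exit
       ninv = ninv + 1
    end do
    close (10)

    ! Append the matching metadata to every mean temperature record,
    ! so each record carries its own station information from here on.
    open (11, file='v3.mean', status='old')
    open (12, file='v3.combined', status='replace')
    do
       read (11, '(A120)', iostat=ios) datarec
       if (ios /= 0) exit
       stnid = datarec(1:11)
       do i = 1, ninv
          if (invid(i) == stnid) then
             write (12, '(A,1X,A)') trim(datarec), trim(invmeta(i))
             exit
          end if
       end do
    end do
    close (11)
    close (12)
  end program combine_sketch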
Why That Matters
Here I was making dT/dt reports by country and by region, comparing v1 with v3, and looking for patterns. I went to do one for “North America” (using a “country code” of “4” – yes, just “4”. I search on the string, and the first digit of the country code is the continent area). My program “blows up” with a run time “data type error” on reading input records. It is trying to read character data into an integer variable and that is forbidden in FORTRAN. ( “C” will let you do it, though ;-) In this case the “not letting you do it” is a feature for FORTRAN.)
Now I’m worried. Have I “blown the port”? Is my “FORMAT” statement “off by one”? A common error is to have numbers and letters near each other and to get your ‘framing’ off in matching the “READ” format to the actual data. So you might have “1995LUFKIN 20.1” and be trying to read it as “995L” and “UFKIN” by being ‘off by one’ and looking just one character too far to the right. In some cases that kind of bug will run FINE on 99%+ of the input data, yet fail when one extreme valid value comes along. So, for example, “20.1” being turned into “0.1 ” might never be noticed. Make a temperature of, say, “22 C” into one of “122 C” and a human being might notice, but a computer program only notices if you told it to check; that is, to do “QC” or “QA” for out of range data.
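A made-up illustration of that kind of framing bug (the record layout here is invented for the example; it is not the real GHCN layout):

  program framing_sketch
    implicit none
    character(len=20) :: rec = '1995LUFKIN      20.1'
    character(len=10) :: name
    integer :: iyear
    real :: temp

    ! Correct framing: 4 digits of year, 10 characters of name, the value.
    read (rec, '(I4,A10,F6.1)') iyear, name, temp
    print *, iyear, name, temp      ! 1995 LUFKIN 20.1

    ! Start one character too far to the right and the 'year' becomes
    ! '995L'.  A letter in an integer field makes FORTRAN stop with a
    ! run time error -- exactly the kind of 'blow up' described above.
    ! read (rec(2:), '(I4,A10,F6.1)') iyear, name, temp
  end program framing_sketch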
In my program, I had done no ‘range sanity checking’. It is a common choice programmers face: “Bounds Check” the input data (to catch obviously broken cases of “insane” data) or not? Is the party or program handing you data one that can be “trusted”? Has it already done the “sanity checking”, so the data are guaranteed to be “in bounds”? That question was one of the very first things I learned to ask in my FORTRAN IV class many decades back. (To this day I marvel at how much of ‘what really matters’ about programming I learned in that one class. The problem sets were cleverly designed to force you to run into things like out of bounds data and ‘the typical problems’ with the typical bugs.) The question became: “Was it something I did?”.
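The kind of range “sanity check” I had skipped looks roughly like this. The bounds are ones I picked for illustration, not anything official from NCDC:

  program sanity_sketch
    implicit none
    integer :: t(4) = (/ 2700, 13810, -9999, -5932 /)   ! 1/100 C, -9999 = missing
    integer :: i

    do i = 1, size(t)
       ! Anything hotter than 60 C or colder than -95 C gets flagged.
       if (t(i) /= -9999 .and. (t(i) < -9500 .or. t(i) > 6000)) then
          print *, 'INSANE value, flag or skip it: ', t(i)
       end if
    end do
  end program sanity_sketch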
Was my program “broken” in some subtle way?
So off I went debugging.
It wasn’t about me…
I did discover in one of my intermediate files an “anomaly” that was all asterisks. FORTRAN does that for you when you tell it to print a number into a field that is too small for it. ( “C” will just let you do it. Sometimes it’s a feature to say “just do it”, and then “C” is the better language. For engineering work, the behaviour of FORTRAN, where it “barfs” on things that are probably an error, helps to discover “boo boos” better. There are times I Like FORTRAN better. This is one of them.)
So WHY did that field have asterisks? Were MY numbers off? Did I have an ‘off by one’ on the size of the numbers and overflowed the size of the field? All the other numbers looked about right.
Swimming further upstream, I found that the record in question from the v3 mean temperature file was in error.
Inside GHCN v3
Cutting to the chase… There were 3 records for North America that had “crazy hot” temperatures in them. They fit in the 5 character field of the v3.mean type file (the values are in 1/100 C without the decimal point). The field might say “-5932” for a -59.32 C reading in Antarctica, for example. One would expect values below about “ 5000” for most of the world. (This is where range checking can be fun… just what IS the highest temperature ever, and how much ‘head room’ do you leave above it for that new record to show up? Knowing that it may let SOME errors come through undetected…)
In doing my “create the anomaly values” step, a Very Large Positive can become a Very Large Negative after the subtraction. Then there may be no room for the minus sign in the 5 digit space. And you get asterisks. And your report “blows up”…
That is exactly what happened.
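Here is a toy example of how the asterisks show up, with arithmetic that mimics the CHILDRESS record below: the answer needs six characters, but the field only holds five:

  program overflow_sketch
    implicit none
    integer :: june, july, danom

    june = 2700            ! a sane June value, 27.00 C in 1/100 C
    july = 13810           ! the 'crazy hot' July value, 138.10 C
    danom = june - july    ! -11110: the minus sign makes it six characters
    write (*, '(I5)') danom   ! FORTRAN prints ***** rather than a wrong number
  end program overflow_sketch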
Looking at the v3.mean data, there are 3 records for North America with “insane” values. Simply not possible.
Yet they made it through whatever passes of Quality Control at NCDC on the “unadjusted” v3 data set. They each have a “1” as the first digit of that data field. Yes, each of them says that it was more than boiling hot. In one case, about 144 C.
You would think they might have noticed.
Here are the records, as extracted from the ghchm.tavg.v126.96.36.19950511.qcu.dat file. Yes, that is the “unadjusted” file. But one might have thought that “insane” values would not be included. I’ve yet to check the .qca.dat “adjusted” one to see if they are removed from it. It would be a heck of a “Hobson’s Choice” to be stuck with either accepting ALL of their “adjustments” or having “insane values”; but that looks like it may be the case. (Welcome to ‘raw’ data…) So it looks like I’ll be needing to add a step of “compare qca to qcu” to see how much changes.
This record is from CHILDRESS. Notice that the seventh temperature field (near the middle of the record so scroll just a touch to the right) is 13810. That’s 138.10 C. The data then go to “missing data flags” of -9999 from that point forward. I suspect it was a ‘keying error’ and the value was supposed to be 13.81 but got shifted ‘off by one’, but it could just as easily be that the sensor simply went nuts. BTW, this is part of the QA process that was lost when we went to automated thermometers instead of having people read them. A person would say “120 F in winter? No way” and go get a new thermometer. Automated systems typically don’t know winter from summer or that it FEELS like it’s about 60 F today so I ought to suspect an error in that 80 F reading… If a sensor goes “a little bit bad”, the data will be blindly accepted. Heck, even a whole lot of “crazy bad” looks to be let through.
425723520021996TAVG-9999 890 G 890 G 1700 G 2530 G 2700 G13810 OG-9999 -9999 -9999 -9999 -9999
This is the DALLAS/FAA record:
425722590021996TAVG 750 0 1250 0 1280 0 1870 0 2700 0 2860 015440 O0-9999 -9999 -9999 -9999 -9999
Again we notice that it’s a 1996 record (the first 3 digits are the Country Code – 425 for the USA, then 8 digits of WMO and instrument identifier, then 4 digits of YEAR). Again it is the 7th temperature field (or July) that is in error. 154.40 C in Dallas. Who knew? And again we go to “missing data flags” the rest of the year.
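For reference, a rough sketch of how those fields get pulled apart: an 11 character station ID (country code in the first 3 digits), 4 digits of year, the element name, then twelve monthly values in 1/100 C, each followed by what I read as three single character flags. The shortened file name and the flag handling are my assumptions here, so treat it as illustrative only:

  program parse_sketch
    implicit none
    character(len=11) :: stnid
    character(len=4)  :: element
    character(len=1)  :: dmflag(12), qcflag(12), dsflag(12)
    integer :: year, val(12), m, ios

    open (10, file='ghcnm.tavg.qcu.dat', status='old')   ! name shortened here
    do
       read (10, '(A11,I4,A4,12(I5,3A1))', iostat=ios) stnid, year, element, &
            (val(m), dmflag(m), qcflag(m), dsflag(m), m = 1, 12)
       if (ios /= 0) exit
       ! 'North America' selection: continent digit 4 at the front of the ID;
       ! then flag any July value hotter than 60 C as suspect.
       if (stnid(1:1) == '4' .and. val(7) > 6000) then
          print *, stnid, year, ' July value (1/100 C): ', val(7)
       end if
    end do
    close (10)
  end program parse_sketch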
So just how trustworthy are the May and June values? Did the sensor cleanly and suddenly die just in July? Or has it been on a long slow drift to incorrect high readings for a year (or since the last calibration)? How many bogus high values are accepted as “close enough”? Are these sensors prone to “fail high” readings? Or is there a random distribution of “fail high” and “fail low”? I suspect that is a question for folks like Anthony Watts, who knows more about temperature stations than anyone else on the planet, near as I can tell.
At any rate, it might be interesting to compare these stations to “nearby” stations for the several months or years prior to these “data farts” to see if the offset stays constant into a catastrophic failure, or if there is a long slow drift that is accepted into the record, until the “blow up” happens. Clearly if 154 C makes it in, 48 C would too… and even 28 C when 27 C was the actual temperature.
This one is from LUFKIN:
425747500011996TAVG 950 G 1310 G 1310 G 1920 G 2640 G 2670 G14420 OG-9999 -9999 -9999 -9999 -9999
Again 1996 and July, followed by missing data flags. 144.2 C.
In a comment on another posting, DocMartyn found that the temperature “ramp up” matches the onset of electronic thermometers and the use of short RS-232 connector cables. The typical assumption has been that the short cables pulled the stations closer to buildings and power sources, but might there also be a ‘cumulative failure mode’ impact over time?
In earlier work, I’d found that the bulk of the “warming” came from the most recent “Mod Flag” and speculated that something about the processing of the newer data was suspect. There was also the early failure mode of some of the instruments where they would “suck their own exhaust” and pull in hot air from their humidity testing heaters. To that we can now add some suspect data from a non-graceful failure mode.
Basically, one simply must ask: “Just how good, or bad, is the quality checking on these electronic gizmos?”
The same records, extracted from my “combined with Inventory information” file, so you can get more information about them (though you will need to scroll a LOT to the right… it’s a long record ;-)
425722590021996TAVG 750 0 1250 0 1280 0 1870 0 2700 0 2860 015440 O0-9999 -9999 -9999 -9999 -9999 32.8500 -96.8500 134.0 DALLAS/FAA AP 148U 4037FLxxno-9A 1WARM CROPS C 317890c317890
425723520021996TAVG-9999 890 G 890 G 1700 G 2530 G 2700 G13810 OG-9999 -9999 -9999 -9999 -9999 34.4300 -100.2800 594.0 CHILDRESS/FCWOS AP 565R -9FLDEno-9A-9WARM CROPS B 343396c343396
425747500011996TAVG 950 G 1310 G 1310 G 1920 G 2640 G 2670 G14420 OG-9999 -9999 -9999 -9999 -9999 31.2300 -94.7500 85.0 LUFKIN/FAA AIRPORT 70S 30HIxxno-9A10WARM DECIDUOUS B
So that’s where I got to in last night’s “Coding Frenzy”.
I was planning to post up some v1 vs v3 comparison reports (which have been run), then ran into this “bug” and spent until 3 am chasing phantoms only to find that “It isn’t about me” and it was crappy input data.
Yet that discovery points to a very interesting potential issue. IFF 154 C can just flow through until whatever “Magic Sauce” is applied at NCDC removes it in “QA” processing, then we are 100% dependent on their “QA” process to catch such errors, and to catch the more subtle errors that do not fall into a “sanity check” bucket.
How many thermometers might read 1 C high for a year? Or 0.4 C high for two years? And never be ‘outed’ by the “QA” code?
We just don’t know.
But we do have a very suspicious “onset” of the ramp in warming right at the time the electronic systems are rolled out (found by two folks using entirely different methods) and long after CO2 had been increasing for decades.
IMHO this is more than enough of an “issue” to put some Liquid In Glass thermometers in selected locations to “Guard the Guardians”… ( from Quis custodiet ipsos custodes? )
We are, in essence, fully dependent on some rough sieve computer programs checking some occasionally insane automated data entries to determine if there is Global Warming, or not. I find that inadequate.