Zombie Thermometers – Return of The Un-Dead
In looking for which thermometers died in 2010, I discovered that there are Zombie Thermometers. They appear to be alive in some years, but sometimes are unresponsive and give no data. They can be dropped from the GHCN v2.mean data file (silently) as though they have died. Gone and buried. Not even given a ‘missing data flag’ to show they are still alive.
When time passes, these, the Un-Dead, can return to the surface of the planet and mingle their data with the living thermometers…
These are just random samples from the copies of GHCN v2.mean that I’ve saved over the months. There are many, many more (while I’ve not yet measured how many, it is pretty easy to surface this kind of record once you know to start looking).
Here is an example of a 3 month Zombie Thermometer. It shows in the “Dead List” right now, as 2010 is missing, but it may come back to life 3 months later as new data arrives. Notice that in December of 2009, the July and August 2009 data showed up. Then in February 2010 we have September of 2009 show up. (So as of right now, it’s a 4 month Zombie…)
The data format here is that the first character shows what the record looked like on the earlier date (-) and what it looked like on the later date (+). Then comes the 12 digit StationID and 4 digit year, followed by 12 months of data, with -9999 being the ‘missing data flag’.
Here we have the October copy showing no data for July–Dec. Yet in the December record, we have July and August returning to life. In February we have Sept alive again, but no 2010 record and a “Zombie” that died in 2009. Will it return to life in May or June? We will have to wait and see. For now, the v2.mean file shows it has died in 2010. One would expect a ‘missing data flag’ for those records that are vetted to be in the data set but have not reported. But the reality is different. The ‘vetting’ seems to be ersatz or after the fact, at best.
– 1016040300022009 100 99 118 138 199 239-9999-9999-9999-9999-9999-9999
+ 1016040300022009 100 99 118 138 199 239 285 275-9999-9999-9999-9999
Snow-Book:~/Desktop/GHCN chiefio$ grep 1016040300022009 diff_Dec_12Feb
– 1016040300022009 100 99 118 138 199 239 285 275-9999-9999-9999-9999
+ 1016040300022009 100 99 118 138 199 239 285 275 225-9999-9999-9999
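To make that record format concrete, here is a minimal sketch in Python of pulling one of these lines apart. It assumes the usual v2.mean fixed-width layout (an 11 digit station ID plus one ‘duplicate’ digit, a 4 digit year, then twelve 5 character monthly values); the sample record is the one just above:

def parse_v2_mean(line):
    """Split a GHCN v2.mean record into (station ID, year, 12 monthly values)."""
    line = line.lstrip('+- ')           # drop any leading diff marker
    station = line[0:12]                # 11 digit ID plus duplicate digit
    year = int(line[12:16])
    months = [int(line[16 + 5*i : 21 + 5*i]) for i in range(12)]
    return station, year, months

station, year, months = parse_v2_mean(
    "1016040300022009  100   99  118  138  199  239  285  275  225-9999-9999-9999")
missing = [m + 1 for m, v in enumerate(months) if v == -9999]
print(station, year, "missing months:", missing)    # -> months 10, 11, 12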
The “good news” out of this is that it implies that the file matching brittleness that prevented certain types of ‘delete records’ benchmarks from being run on GIStemp may apply only to the USA data (where it matches USHCN and GHCN data). As time permits, I’ll try deleting something in the Southern Hemisphere to verify that surmise.
This next one looks like a 1 month Zombie Thermometer. In February, we have December data, but not January. Though last December we had November data, so it isn’t always a Zombie, or maybe it’s a 1/2 month Zombie?
From October to December, things are fine:
– 1016042500012009 110 114 147 145 230 279 314 298 242-9999-9999-9999
+ 1016042500012009 110 114 147 145 230 279 314 298 242 217 175-9999
Then December to February 12:
Snow-Book:~/Desktop/GHCN chiefio$ grep 1016042500012009 diff_Dec_12Feb
– 1016042500012009 110 114 147 145 230 279 314 298 242 217 175-9999
+ 1016042500012009 110 114 147 145 230 279 314 298 242 217 175 135
Those records are from Africa (the first stuff in the file), but just so folks know this isn’t an Africa or “3rd world” thing, here are some samples from the Pacific Region and Europe:
Dec to Feb 12:
Here is one where Sept re-appears, and January, but not Nov or Dec. Yet.
So this one looks like a ‘repair’ of September, with a continued “Dropout” of November and December? Who knows… but at least 5 months after the fact, data can appear or change.
– 5359121200012009 274 277 272 281 285 285 281 279-9999 277-9999-9999
+ 5359121200012009 274 277 272 281 285 285 281 279 278 277-9999-9999
+ 5359121200012010 270-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
In this one, three months stay ‘dropped’ while Dec and Jan show:
– 5099864400032009 276 269 277 285 289 287 282 297-9999-9999-9999-9999
+ 5099864400032009 276 269 277 285 289 287 282 297-9999-9999-9999 272
+ 5099864400032010 266-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
Here we get Dec and Jan on one station in the country (what you would expect as normal):
– 6382264100002009 -105 -93 -43 -14 85 135 163 149 119 14 -15-9999
+ 6382264100002009 -105 -93 -43 -14 85 135 163 149 119 14 -15 -100
+ 6382264100002010 -157-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
But November is still left out on another (QA problem? Who knows… maybe it will show up next month…)
– 6382280200062009 -64 -74 -32 19 109 134 166 151 117 23-9999-9999
+ 6382280200062009 -64 -74 -32 19 109 134 166 151 117 23-9999 -85
+ 6382280200062010 -158-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
There does not seem to be any particular limit on how long a thermometer can be a Zombie before the undead can return to wandering the earth. At a minimum, it looks like there can easily be 3 to 5 months before data stabilize significantly.
This effect is going to be most misleading in January data, since it looks like a ‘missing data’ record is not produced for an ‘expected thermometer but no data’ case; rather, the record is simply dropped. The thermometer is dead as far as v2.mean is concerned. Even records that are reported as defective in the v2.mean.failed file do not have a missing data flag in v2.mean. They are simply treated as dropped and dead, not even missing. One presumes that “whenever” some data for a new year shows up, the record for that year is created, including ‘missing data flags’ for the prior unreported months… I can’t see any other way for a station presently ‘dead’ in January to be handled by this process when/if February data show up.
Oddly enough, one of the excuses given for Bolivia being dropped was that they failed to report by the mid-month cutoff date. Clearly that is not the case. We must look somewhere other than ‘speed of data delivery’ for why some countries and some data are dropped from GHCN. There is evidence here that even reporting a few months late is Just Fine for inclusion in the set.
I need to do some numeric audits on the data file to characterize “how much” of the data is like this. For example, finding the count of 2009 records that have a ‘Dec value only’ with missing data flags for Jan–Nov ought to give an estimate of the rate of thermometers appearing after having been “dropped” all year. It would also be interesting to produce a histogram for 2009 showing the percent of records missing all months cumulatively up to a given month. It ought to start high and decay toward zero in December (since, it would seem, if you are missing all months’ data including December, your record will simply not appear in 2009). This can be done as a retrospective study using the more recent data copies. A sketch of the idea follows.
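Something like this Python sketch would do that retrospective count, assuming the same fixed-width layout as the parser above (the file name ‘v2.mean’ is just a placeholder for whichever saved copy you point it at):

MISSING = -9999

def monthly_values(line):
    # twelve 5 character fields starting right after the 4 digit year
    return [int(line[16 + 5*i : 21 + 5*i]) for i in range(12)]

total = dec_only = 0
cum_all_missing = [0] * 12          # index m: records missing every month Jan..m+1

with open('v2.mean') as f:          # placeholder name for your saved copy
    for line in f:
        if line[12:16] != '2009':
            continue
        total += 1
        vals = monthly_values(line)
        if vals[11] != MISSING and all(v == MISSING for v in vals[:11]):
            dec_only += 1           # a 'Dec value only' record
        m = 0
        while m < 12 and vals[m] == MISSING:
            cum_all_missing[m] += 1
            m += 1

print('2009 records:', total, ' December-only:', dec_only)
for m in range(12):
    pct = 100.0 * cum_all_missing[m] / total if total else 0.0
    print('all missing through month %2d: %6d (%.1f%%)' % (m + 1, cum_all_missing[m], pct))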
More interesting would be a ‘month by month’ histogram showing missing months for each monthly report from NCDC. That, however, will require a prospective data set where a copy of the data is collected each month, so the 12 months can be compared for the rate of ‘change of the past’. Since NCDC published copies are volatile, this cannot be done retroactively (unless someone has an archive of the various copies). There could also be interesting things found by looking at the daily rate of change within the month(s). While the NCDC ftp server (and everywhere I could find on the web site) does not give a stability date for the file, I would expect the changes to have a pattern. Probably a double peak matching the 2 data circulation days from the Phil Jones email (4th and 18th of the month).
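Once two monthly snapshots are in hand, the comparison itself is simple. A sketch, with made-up file names standing in for two copies saved a month apart:

from collections import Counter

def load(path):
    """Map station+duplicate+year key to that record's 12 monthly values."""
    recs = {}
    with open(path) as f:
        for line in f:
            key = line[0:16]        # 12 char station ID plus 4 digit year
            recs[key] = [int(line[16 + 5*i : 21 + 5*i]) for i in range(12)]
    return recs

old = load('v2.mean.jan')           # made-up names for the two snapshots
new = load('v2.mean.feb')

changed = Counter()                 # month -> station-years whose value changed
for key in old.keys() & new.keys():
    for month, (a, b) in enumerate(zip(old[key], new[key]), start=1):
        if a != b:
            changed[month] += 1

print('records appeared:', len(new.keys() - old.keys()))
print('records vanished:', len(old.keys() - new.keys()))
for month in sorted(changed):
    print('month %2d changed in %d records' % (month, changed[month]))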
The Americas Too?
Oh, and the same things happen in South and North America:
In this one, we have a ‘dead thermometer’ for the last 4 months of 2009, then in February, the past is alive again. Except Oct and Nov. For now. So the duration of being a Zombie can wander around, at least from 3 months to none:
Dec to Feb:
– 3178890300002009 77 61 58 16 2 -1 -2 -18-9999-9999-9999-9999
+ 3178890300002009 77 61 58 16 2 -1 -2 -18 15-9999-9999 55
+ 3178890300002010 48-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
For this one, the 2009 record stays unchanged with a missing December, but January shows up:
4017895400032009 260 257 258 266 275 279 280 281 281 282 276-9999
+ 4017895400032010 269-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
For this one we get July and September back from the dead, but it’s a 4 month Zombie, for now still missing in 2010:
– 4027858300042009 239 243 254 275 286 286-9999 282-9999-9999-9999-9999
+ 4027858300042009 239 243 254 275 286 286 285 282 287-9999-9999-9999
I’ve also not characterized how far back in time data values might show up or change. Does the Feb 2010 update change any 2008 or even 1990 data? Don’t know yet. While you would expect the past to stay the same, that does not seem to be the case in “climate science”…
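Checking that would be straightforward with the same kind of diff files used above. A sketch, assuming a file like diff_Dec_12Feb where each changed record sits on a line prefixed with ‘-’ or ‘+’:

years_touched = set()

with open('diff_Dec_12Feb') as f:
    for line in f:
        if not line.startswith(('-', '+')):
            continue
        rec = line[1:].lstrip()
        if rec[:12].isdigit():      # skip any diff header lines
            years_touched.add(int(rec[12:16]))

print('years with changed records:', sorted(years_touched))
# anything before 2009 in that list means the deeper past is changing too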
In accounting, you have fixed “close the books” dates. That, too, seems to not be the case here. The books are always open, always being re-written. Always shifting. Thermometers dead today can be reanimated tomorrow, or even in several months.
There is no audit trail and there is no ‘cut off’ or ‘stability date’. You just get to stick your dipper in the stream and take out what happens to be floating past today. One hopes that it is good enough. The “retroactive QA process” described in the Phil Jones email, and the fact that data a few months old show up or change in these samples, imply that the last 4 to 6 months of data are not QA checked. This would imply that any GISS map newer than about 5 to 6 months old is based on ‘best guess non-QA checked data that happened to show up lately’ and will change over time. That, further, implies that anomaly maps more recent than ‘a few’ months back ought to carry a disclaimer. But I don’t expect we’ll see one.
Oh, and the fact that Zombie Thermometers with a history of electronic reporting (even to the point of having 2009 data updated in the 2010 v2.mean file) show as “Dead Today” (and perhaps live tomorrow?) does show that some of the dropping of thermometers is NOT due to lack of electronic reporting. It is due to a very flaky process, at best.
So “Bolivia dropped for late reporting” and “dropped for not reporting electronically” are insufficient excuses for explaining thermometer drops. We do have the potential for ‘dropped without a clue’, dropped after decision, and dropped because local BOM quit reporting (though for the ones in Wunderground, that’s a hard excuse to make stick). At this point, the process looks to be so ersatz, uncoordinated, and lacking in controls that anything could be done to the data. At a minimum we have a Zombie Walk of Thermometers to deal with.