So, how hard is it to compare them?
I have been waiting for the GHCN Version THREE set (that was supposed to be released in about February) and, well, I’ve gotten tired of waiting. You can only ‘sharpen your knife’ so long before you want to start chopping on something…
So I’ve started taking a few practice swings at GHCN Version ONE. (The present set is GHCN V2).
What I’ve learned so far is, while not surprising, very disappointing. How can folks be so sloppy?
What kind of sloppy?
Well, for starters, the “key fields” are not stable. Anyone with any decent data processing experience would have a distinct key field that is kept unique to the exact instrument/location set and would not change the keys from version to version. These keys would point to the unique meta data (such as LAT / LONG and NAME) and when the data are updated (say for name) a decision would be made to keep the name constant (if a gratuitous change – i.e. an ‘error’ like turning N.W.T. into NWT) or issue a new record and key if a significant change (like if the LAT / LONG / ALT show a real station relocation event).
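A minimal Python sketch of that stable-key policy, purely as an illustration: it is not how GHCN handles anything. The StationMeta record, the apply_update helper, and both thresholds (0.01 degree, 10 m) are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationMeta:
    name: str
    lat: float
    lon: float
    alt_m: float


def is_relocation(old: StationMeta, new: StationMeta,
                  deg_tol: float = 0.01, alt_tol_m: float = 10.0) -> bool:
    """Treat a change beyond the (assumed) tolerances as a real station move."""
    return (abs(old.lat - new.lat) > deg_tol
            or abs(old.lon - new.lon) > deg_tol
            or abs(old.alt_m - new.alt_m) > alt_tol_m)


def apply_update(registry: dict, key: int, new: StationMeta) -> int:
    """Return the key the update ends up under.

    A cosmetic change (e.g. N.W.T. -> NWT) keeps the old key and the old
    stored name; a relocation gets a brand new key and record.
    """
    old = registry[key]
    if is_relocation(old, new):
        new_key = max(registry) + 1
        registry[new_key] = new
        return new_key
    return key


registry = {1: StationMeta("FORT SMITH,N.W.T.", 60.02, -111.95, 203)}
same = apply_update(registry, 1, StationMeta("FORT SMITH,NWT", 60.02, -111.95, 203))
moved = apply_update(registry, 1, StationMeta("FORT SMITH,NWT", 60.50, -111.95, 203))
print(same, moved)  # 1 2
```

The point of the design is that the key never gets reused or renumbered: old data keep pointing at the old record, and a move shows up as a new key rather than a silent edit.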
Look, I can understand having the need to create new country codes as countries come into being, and even the need to split an old country into new ones. But they changed them all. I can see no sane reason to do that. There is no reason whatsoever for the USA, France, UK, South Africa, Brazil, etc. to have had their country code numbers change. But they did.
The major station ID number (5 digits) does stay stable, but the ‘minor number’ field changes from 2 digits in V1 to 3 digits in V2 (and no, it’s not just a matter of taking 01 and 02 and making them 001 and 002; the actual order of assignment is different as well). So most of the time the number is a good indication, but sometimes it isn’t. xxxxx001 and xxxxx002 may have been xxxxx02 and xxxxx03 in the past (I’ve seen examples of that).
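To make the ID layout concrete, here is a minimal Python sketch (an illustration only, not the tooling used for the hand edit). It assumes the layout seen in IDs such as 4027193000 (V1) and 40371930000 (V2): a 3-digit country code, the 5-digit major station number, then a 2-digit (V1) or 3-digit (V2) minor number. The names StationId, parse_station_id and same_major are made up for this example.

```python
from typing import NamedTuple


class StationId(NamedTuple):
    country: str  # 3 digits, e.g. 402 (V1 Canada) or 403 (V2 Canada)
    major: str    # 5 digits, stable between versions
    minor: str    # 2 digits in V1, 3 digits in V2


def parse_station_id(raw: str) -> StationId:
    """Split a 10-digit (V1) or 11-digit (V2) station ID into its parts."""
    raw = raw.strip()
    if not raw.isdigit() or len(raw) not in (10, 11):
        raise ValueError(f"unexpected station ID: {raw!r}")
    return StationId(raw[:3], raw[3:8], raw[8:])


def same_major(v1_id: str, v2_id: str) -> bool:
    """Compare only the 5-digit major number; country and minor are ignored."""
    return parse_station_id(v1_id).major == parse_station_id(v2_id).major


v1 = parse_station_id("4027193000")
v2 = parse_station_id("40371930000")
print(v1)  # StationId(country='402', major='71930', minor='00')
print(v2)  # StationId(country='403', major='71930', minor='000')
print(same_major("4027193000", "40371930000"))  # True
```

Matching on the major number alone only narrows the candidates. Since the minor numbers were reassigned, it can't say which V2 record under that major number is the old V1 station.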
Given that many stations have name changes, there are a fair number of ambiguous cases where it’s just not clear which station is what.
Even the LAT and LONG often drift by a few 1/100 or even 1/10 of a degree (some by more than 1/2 degree!). Is that ‘error band’ or is that ‘station move’? Who knows…
Sample Data for Inventory File
So here is a sample of my combined inventory file. I’m merging the V1 and V2 inventory files to make a map of “old index number” to “new index number”.
+40371928000 ROCKY MTN HOU 52.43 -114.92 988 975R -9HIxxno-9A-9COOL CONIFER B
+40371928001 PRAIRIE CREEK RS,AL 52.25 -115.30 1173 1253R -9HIxxno-9x-9COOL CONIFER A
?40371928002 NORDEGG,AL 52.47 -116.08 1402 1480R -9MVxxno-9x-9COOL CONIFER A
?40371928003 NORDEGG RS,AL 52.50 -116.05 1320 1518R -9MVxxno-9x-9COOL CONIFER A
+40371928004 SPRINGDALE,AL 52.80 -114.30 914 954R -9HIxxno-9x-9COOL CONIFER A
?4027193000 WHITECOURT,ALTA. 54.15 -115.78 783 1981 1990 1.7 0
?40371930000 WHITECOURT,AL 54.15 -115.78 782 753R -9HIxxno-9A-9COOL CONIFER B
?40371930003 WHITECOURT,AL 54.13 -115.67 741 732R -9HIxxno-9x-9COOL CONIFER C
+40371930001 HELDAR,AL 54.02 -115.00 701 690R -9HIxxno-9x-9COOL CONIFER A
+40371930002 KAYBOB 3,AL 54.12 -116.63 1003 897R -9HIxxno-9A-9COOL CONIFER C
+40371930004 CAMPSIE,AL 54.13 -114.68 671 668R -9FLxxno-9x-9COOL CONIFER A
+40371931001 ANDREW,AL 54.02 -112.23 610 606R -9HIxxno-9x-9BOGS, BOG WOODS A
+40371931002 ST LINA,AL 54.30 -111.45 632 627R -9HIxxno-9x-9BOGS, BOG WOODS A
+40371931003 MEANOOK,AL 54.62 -113.35 684 632R -9HIxxno-9x-9COOL CONIFER A
+40371931004 ATHABASCA LANDING,AL 54.72 -113.28 503 569R -9HIxxno-9x-9COOL CONIFER C
+40371931005 CALLING LAKE RS,AL 55.25 -113.18 598 622R -9HIxxLA-9x-9COOL CONIFER B
=4027193200 FORT MCMURRAY,ALTA. 56.65 -111.22 369 1931 1990 0.6 0
=40371932000 FORT MCMURRAY 56.65 -111.22 369 360R -9HIxxno-9A-9SOUTH. TAIGA B
+40371932001 TAR ISLAND,AL 56.98 -111.45 240 287R -9HIxxno-9x-9SOUTH. TAIGA B
=4027193300 FORT CHIPEWYAN,ALTA. 58.77 -111.12 232 1883 1988 34.0 0
=40371933000 FORT CHIPEWYA 58.77 -111.12 232 245R -9FLxxLA-9A-9SOUTH. TAIGA A
=4027193400 FORT SMITH,N.W.T. 60.02 -111.95 203 1914 1990 2.1 0
=40371934000 FORT SMITH 60.00 -112.00 203 192R -9FLxxno-9A-9SOUTH. TAIGA A
=40371934002 FORT SMITH,N. 60.02 -111.95 205 196R -9FLxxno-9A-9SOUTH. TAIGA B
?4027193500 HAY RIVER,N.W.T. 60.83 -115.78 166 1893 1990 4.5 0
?40371935000 HAY RIVER,N.W 60.83 -115.78 166 173R -9FLMAno-9x-9SOUTH. TAIGA C
This is for a chunk of Canada. I’ve added a leading symbol (edited in by hand) where “-” means the old V1 station is dropped in V2. “+” means the V2 station is a new one. A “?” means that there is some doubt and I need to look at the actual data in more depth to sort things out, while an “=” means I think this is the same station.
Notice that Canada has changed from country code 402 in V1 to country code 403 in V2. In the last pair of lines, you can see that HAY RIVER ends with either N.W.T. or N.W (so an automated name match will have issues, and some are even worse), while FORT SMITH,N.W.T. is at one LAT / LONG in V1 and one of the new entries is somewhat different (so automation via LAT / LONG will have issues too).
Then for WHITECOURT, we have 2 potential replacements, but with different minor numbers and in slightly different places. Station move or?… So automated number matches will have issues.
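For illustration, here is roughly what an automated first pass could look like in Python. This is a sketch under stated assumptions, not the method behind the hand-edited flags above: names are compared only up to the first comma and allowed to be truncated, and LAT / LONG must agree within 1/10 degree. The helper names and the "exactly one candidate" rule are made up for this example.

```python
import re


def normalize_name(name: str) -> str:
    """Keep the part before the first comma; upper-case; letters and digits only."""
    head = name.split(",")[0].upper()
    return re.sub(r"[^A-Z0-9]", "", head)


def names_agree(a: str, b: str) -> bool:
    """True when one normalized name is a prefix of the other (handles truncation)."""
    na, nb = normalize_name(a), normalize_name(b)
    return bool(na) and bool(nb) and (na.startswith(nb) or nb.startswith(na))


def looks_same(v1, v2, tol: float = 0.1) -> bool:
    """v1 and v2 are (name, lat, lon) tuples that already share a major number."""
    eps = 1e-9  # guard against float noise right at the tolerance
    return (names_agree(v1[0], v2[0])
            and abs(v1[1] - v2[1]) <= tol + eps
            and abs(v1[2] - v2[2]) <= tol + eps)


def first_pass_flag(v1, v2_candidates, tol: float = 0.1) -> str:
    """'=' only when exactly one candidate passes; anything else needs a human."""
    hits = [c for c in v2_candidates if looks_same(v1, c, tol)]
    return "=" if len(hits) == 1 else "?"


chipewyan = first_pass_flag(("FORT CHIPEWYAN,ALTA.", 58.77, -111.12),
                            [("FORT CHIPEWYA", 58.77, -111.12)])
fort_smith = first_pass_flag(("FORT SMITH,N.W.T.", 60.02, -111.95),
                             [("FORT SMITH", 60.00, -112.00),
                              ("FORT SMITH,N.", 60.02, -111.95)])
print(chipewyan, fort_smith)  # = ?
```

Note that a rule like this does not reproduce the hand flags: it marks FORT SMITH as “?” because both V2 candidates pass, and it would happily pass HAY RIVER. That is rather the point. The machine can clear the easy cases, but the ambiguous ones still need eyes on the actual data.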
So I’m slogging through the (approximately 13,000) records by hand doing an intelligent match. But it’s slow. Thus the lack of postings the last couple of days.
My Stats So Far
This is a rough count of where I’ve gotten so far:
13320  Total Records
 7786  Done
 5534  Not yet Done
58%   4592  Equal Sites
19%   1532  Added Sites
15%   1183  Dropped Sites
 6%    474  Questionable Matches
34%   Changed Percent
40%   Changed Plus Questionable
65%   Equal Sites Plus Questionable
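The arithmetic behind those percentages can be checked with a few lines of Python. This is a sketch that assumes every percentage is taken against the 7786 records done, and that the figures are truncated rather than rounded (4592 / 7786 is 58.98%, which is shown as 58%). The pct helper is made up for this example.

```python
def pct(part: int, whole: int) -> int:
    """Whole-number percentage, truncated (not rounded)."""
    return part * 100 // whole


done = 7786
equal, added, dropped, questionable = 4592, 1532, 1183, 474

summary = {
    "Equal Sites": pct(equal, done),
    "Added Sites": pct(added, done),
    "Dropped Sites": pct(dropped, done),
    "Questionable Matches": pct(questionable, done),
    "Changed Percent": pct(added + dropped, done),
    "Changed Plus Questionable": pct(added + dropped + questionable, done),
    "Equal Sites Plus Questionable": pct(equal + questionable, done),
}

for label, value in summary.items():
    print(f"{value:3d}%  {label}")

# The four categories add up to 7781, five short of the 7786 done,
# which fits with this being a rough count.
unaccounted = done - (equal + added + dropped + questionable)
```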
The most immediate thought that comes to mind is how anyone can think they can do global calorimetry to 2 decimal places with 1/3 of the instruments changed every decade.
The second thought is based on observing the data as I’m doing the edits. Canada and Russia have massive change. And that’s where GISS “finds red”. Places with little change of thermometers have little red (Africa, some of south Asia). There are many countries where the Percent Change is near zero, then places like Canada where it’s well over 50% (exact stats when I’m done with flagging North America… So far I’ve done Europe, Asia, Africa, and about 1/2 of North America).
You can learn a lot about National Character doing this kind of thing. The Arab countries and the Chinese have very little station change. LAT / LONG are remarkably consistent. There is the odd change of an airport name from a geography name to a politician’s name, but not much else.
Then there is Yugoslavia, where all sorts of things change. And Canada, which is about as stable as a schizophrenic on meth, with changes happening darned near everywhere in huge percentages. Even for the “stable” places, the LAT and LONG change by small amounts. Did they change instruments? Go from a Stevenson Screen out in the snow to an MMTS on an electric leash to a nice warm building? Most likely. Many stations show a small nudge to the LAT or LONG in the middle of the record (as seen in the above examples). But I’m certain I’d not put much trust in the Canadian record without a whole lot of detailed Station Audit done.
Then there is Russia. Along with the dropouts, there is an odd set of LAT / LONG changes, mostly in the 1/10 or 2/100 degree range. Cold war artifact of not wanting to give targeting coordinates for airports? Well known places don’t change (why fib when the fib would be obvious?) while more obscure ones do. Or maybe it’s just sloppy in the remote places. It was bad enough that I decided to accept stations as equal with up to 1/10 degree of LAT / LONG variance, as long as the name and number matched.
And so it goes. You can learn a lot about politics and character from the inventory file. You can even see wars come and go (France drops out for most of the two world wars). But while it’s great for seeing human-induced instrument and political changes, it’s pretty poor for seeing long duration climate and calorimetry processes.
One interesting change from v1 to v2 is that in v1, years with missing data have records with missing data flags for the whole year in the monthly mean temperatures file. For v2 the year is simply (and silently) dropped from the record in the v2.mean file. Personally, I’d rather have the record show the missing data as missing, that way you know it wasn’t accidentally dropped or just ignored.
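Because v2 drops the year silently, the gaps have to be reconstructed from what is present. A minimal Python sketch of that idea follows. It takes a plain list of the years found for one station rather than parsing v2.mean itself, and both the missing_years helper and the example years are made up for illustration.

```python
def missing_years(years_present):
    """Return the years absent between a station's first and last reported year."""
    present = set(years_present)
    if not present:
        return []
    return [y for y in range(min(present), max(present) + 1) if y not in present]


# Hypothetical years for one station, not taken from any real record.
gaps = missing_years([1950, 1951, 1953, 1956])
print(gaps)  # [1952, 1954, 1955]
```

This only finds holes inside the record. A station that simply stops reporting looks the same as one whose last years were dropped, which is one more reason to prefer explicit missing data flags.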
Just doing this exercise makes it pretty clear to me that what GHCN / GISS are measuring is thermometer change artifacts, not “global warming”.
Also, you can look at the above sample and see that the “meta data” are much different between the two data sets. The v1 version includes a ‘years of coverage’ start and end years along with a ‘percent missing data’ flag. Both very useful things that are now dropped. Clearly the early version cared about coverage over time, while the present one has strongly de-emphasized it.
A random inspection of a few sites has shown some changes to the temperatures with a bit of recent warming. This is only anecdotal at this point. Given the crappy state of the Inventory files, it will take me a while to get a decent Old / New comparison set built and do some decent A/B comparison reports. One thing is certain, though: When Version Three comes out, if the data have changed, I’ll be spotting it and reporting on it. And if the inventory file changes as much as has happened between V1 and V2, I’ll be pointing that out as well.
It’s really pretty simple: Station Change Matters.
Take a look at how one does calorimetry. Go hit up a college chem teacher (they would love the attention of someone actually being interested in how to do calorimetry right ;-) and ask them. You MUST know the mass, specific heat, phase change, heats of fusion and vaporization, etc. Then, and only then, can you make statements about the change of HEAT via using temperature. Then ask them about the impact of changing out the thermometers periodically, moving them around in the experimental apparatus, and changing what technology is used mid-stream (swapping mercury in glass for electronic, for example). Be prepared for a long lecture… THEN ask what would happen if, over a 100 time period experiment, you changed 1/3 of the thermometers every 10 periods.
Then ask if that would be better, or worse, calorimetry technique than was done by Pons and Fleischmann when they thought they had found Cold Fusion… (IMHO, the cold fusion folks’ work looks stellar in comparison to what the climate guys are doing.) Frankly, I think that the “Global Warming” folks are going to go down in history as far more bogus than Cold Fusion ever was.
And in the end, I think this explains why looking at single well tended instruments that have had no instrument change shows no ‘climate change’. It is only in the aggregate and with highly questionable ‘homogenization’ and ‘adjustments’ that the false signal of ‘global warming’ can be created. As an artifact of instrument and process change. Exactly those things that are considered horrid technique in the chem lab calorimetry experiments. And for good reason.
I’ve got a few other tools I’m developing to do a forensic comparison of the data sets, but those will need to wait a while to be revealed. Until after version 3 is out and they’ve ‘done the deed’. Mostly those tools will be similar to the tools used to find the ‘fingerprints’ left behind in forensic audits of financial records. There are patterns to natural data that do not show up the same way in ‘adjusted’ data. Minor disturbances in the expected and probable patterns. (Folks like to pick ‘7’ as a ‘random’ number, for example. So if you have lots of ‘7’ and few ‘0’ or ‘9’, there is a clue to look for more fudging.) In the end, any attempt to hide ‘data diddling’ (yes, that is the jargon in the field, believe it or not) usually just creates a different set of clues.
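As a generic illustration of that kind of digit test (this is not one of the tools being held back), here is a minimal Python sketch that counts last digits and computes a chi-square statistic against a uniform distribution. Assumptions: the values are integers (for example temperatures in tenths of a degree), and a uniform spread of last digits is the right baseline, which may not hold if the data went through unit conversion or rounding. The function names are made up for this example.

```python
from collections import Counter


def last_digit_counts(values):
    """Count how often each last digit 0..9 appears in a list of integers."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]


def chi_square_uniform(counts):
    """Chi-square statistic of observed digit counts against a uniform spread."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no values to test")
    expected = total / 10
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical values, just to show the mechanics.
counts = last_digit_counts([10, 27, 37, -47])
print(counts)  # [1, 0, 0, 0, 0, 0, 0, 3, 0, 0]

# With 9 degrees of freedom, a statistic above roughly 16.92 is unusual
# at the 5% level. It is a hint to look closer, not proof of anything.
stat = chi_square_uniform([5] * 10)
print(stat)  # 0.0
```

A pile of ‘7’ with few ‘0’ or ‘9’ would show up as a large statistic. What it means still takes judgment, since instrument resolution and rounding can skew last digits all on their own.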
But before I can do that part, I need to get the inventory map worked out, and that, as you can see from the above, is A Piece Of Work… But at least we’re getting some interesting statistics about just how much GHCN is a ‘random box of thermometers’ from decade to decade and just how unstable some places can be. And if the V3 inventory is screwed up by similar changes, you can be assured I’ll be pointing it out.
(NOTE to CANADA: Get your hands off your instruments and stop playing with them. It screws up the readings! 8-)