The Inconsistency Screwage Of GHCN

For the last couple of days I’ve been fussing with the problems caused by the inconsistency of GHCN from version to version. In v3, Russia has 2 country codes: one for the European half, one for the Asian half. In v4 there is only one abbreviation, “RS”, for all of it.

That showed up in the Russian anomaly comparison graphs of the prior GHCN v3.3 vs v4 Asia set: I’m trying to compare the two versions of “one country” when there are really three country definitions (two in v3, one in v4). So, OK, I put in a footnote (sort of; really an inline comment) saying that this was an issue, and ignored the inconsistency.

Trying to find a way to “fix that” I thought: “Well, heck, just use WMO number. Each instrument has a unique WMO number. Associate the country with the WMO for each in a distinct table. Make WMO the key field.”
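A minimal sketch of that idea, in Python with an in-memory SQLite database. The table name `station_country`, the column names, and the helper `country_for` are made up for illustration; the two rows are v3 entries taken from the inventory listing further down. It assumes the WMO number really is unique and stable, which is exactly what the spot check below calls into question.

```python
import sqlite3

# Hypothetical lookup table: one row per WMO number, WMO as the key field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_country ("
    " wmo TEXT PRIMARY KEY,"      # 5-digit WMO number, kept as text
    " country TEXT NOT NULL)"     # whatever country code we settle on
)

# Two v3 examples from the inventory listing: (WMO, v3 country number).
rows = [("68054", "104"), ("72494", "425")]
conn.executemany("INSERT INTO station_country VALUES (?, ?)", rows)


def country_for(wmo):
    """Return the country stored for a WMO number, or None if unknown."""
    row = conn.execute(
        "SELECT country FROM station_country WHERE wmo = ?", (wmo,)
    ).fetchone()
    return row[0] if row else None


print(country_for("68054"))
print(country_for("99999"))
```

The PRIMARY KEY constraint means a second row for the same WMO number is rejected, so the scheme only holds up if each WMO number maps to exactly one station across versions.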

Which resulted in me doing a spot check on the WMO consistency. The first block is the v3 inventory file, where the first three digits are the “country number”, followed by 5 digits of WMO# and 3 digits of flags for instruments near that WMO site or a changed instrument at that site. For v4 there are 2 letters of country abbreviation, then three letters of various status information, then the WMO ought to be the next block. So, think you can match on those WMO numbers?:

chiefio@PiM3Devuan2:~/SQL/v3$ grep FRANCIS inventory.in 
10468054000 -21.2200   27.5000 1000.0 FRANCISTOWN                     991S   22FLxxno-9x-9SUCCULENT THORNSA
40778460003  19.3000  -70.3000  110.0 SAN FRANCISCO DE MACORIS D      210U   65HIxxno-9x-9WARM CROPS      B
41476423001  24.4000 -104.3200 1960.0 FRANCISCO I. MADERO, DURANGO   2014R   -9MVDEno-9x-9WARM GRASS/SHRUBC
42500147093  39.7675 -101.8097 1024.7 SAINT FRANCIS                  1030R   -9FLxxno-9x-9COOL GRASS/SHRUBC
42572494000  37.6200 -122.3800    5.0 SAN FRANCISCO                   102U 6253FLxxCO15A 1COASTAL EDGES   C
42574506002  37.7700 -122.4300   22.0 SAN FRANCISCO/MISSION DOLORES    70U 6253HIxxCO 1x-9COASTAL EDGES   C
50998437000  13.3700  122.5200   45.0 SAN FRANCISCO                   125R   -9HIxxCO 1x-9WATER           A

chiefio@PiM3Devuan2:~/SQL/v3$ grep FRANCIS ../v4/inventory.in 
BC008948490 -21.2170   27.5000 1001.0 FRANCISTOWN                   
CA004012720  50.1167 -103.9167  603.0 FRANCIS                       
DR092205945  19.2800  -70.2500  110.0 SAN_FRANCISCO_DE_MACORIS      
MXM00076843  16.7700  -93.3410 1051.9 FRANCISCO_SARABIA             
MXXLT082709  24.9100 -104.4600 1700.0 FRANCISCO_PRIMO_VERD          
RPXLT752551  13.3700  122.5200   45.0 SAN_FRANCISCO                 
SF000175820 -34.2000   24.8330    7.0 CAPE_ST_FRANCIS               
USC00047767  37.7281 -122.5053    2.4 SAN_FRANCISCO_OCEANSD         
USC00147093  39.7675 -101.8067 1024.7 SAINT_FRANCIS                 
USC00168136  30.7775  -91.3769   35.1 ST_FRANCISVILLE               
USC00363018  41.1183  -75.7278  459.9 FRANCIS_E_WALTER_DAM          
USW00023234  37.6197 -122.3647    2.4 SAN_FRANCISCO_INTL_AP         
USW00023272  37.7706 -122.4269   45.7 SAN_FRANCISCO_DWTN            
VEM00080416  10.4850  -66.8440  856.0 GENERALISIMO_FRANCISCO_DE_MIR 
chiefio@PiM3Devuan2:~/SQL/v3$ 

For those who don’t know, “San Francisco International Airport” is located in South San Francisco (a different city). I’m pretty sure that the So. San Francisco station of the first set (72494) is the same as San Francisco INTL AP (023234) of the second one, though perhaps the thermometer moved to a slightly different LAT / LONG at the airport (or they fixed their slight location error). It looks like it has the “A” Airstation flag set for the So. SFO station.

Clearly “Francistown” is the same one. 68054 vs 948490. LAT/LONG match within rounding band.
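That “within rounding band” check can be sketched in a few lines of Python. The function names and the 0.01 degree tolerance are my own assumptions, not anything GHCN defines; the input lines are trimmed copies of the inventory entries shown above.

```python
def parse_position(line):
    """Pull (station_id, lat, lon) from the first three whitespace-separated fields."""
    fields = line.split()
    return fields[0], float(fields[1]), float(fields[2])


def same_place(a, b, tol_deg=0.01):
    """True if two parsed stations agree in LAT and LONG within tol_deg degrees."""
    _, lat_a, lon_a = a
    _, lat_b, lon_b = b
    return abs(lat_a - lat_b) <= tol_deg and abs(lon_a - lon_b) <= tol_deg


v3_francistown = parse_position("10468054000 -21.2200   27.5000 1000.0 FRANCISTOWN")
v4_francistown = parse_position("BC008948490 -21.2170   27.5000 1001.0 FRANCISTOWN")
v3_sfo = parse_position("42572494000  37.6200 -122.3800    5.0 SAN FRANCISCO")
v4_sfo = parse_position("USW00023234  37.6197 -122.3647    2.4 SAN_FRANCISCO_INTL_AP")

print(same_place(v3_francistown, v4_francistown))   # Francistown, default tolerance
print(same_place(v3_sfo, v4_sfo))                   # SFO, default tolerance
print(same_place(v3_sfo, v4_sfo, tol_deg=0.02))     # SFO, looser tolerance
```

Francistown agrees at the default tolerance, while the San Francisco airport pair only agrees once the tolerance is loosened to 0.02 degrees, so any position-based match needs a tolerance picked by hand and will still be a judgment call.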

Then Saint_Francis almost gets a match, but it looks like one of them is offset. “00147” plus the 093 modification flags vs 147093.
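To make the offset concrete, here is a rough Python sketch that slices the IDs using the field widths described above (3 + 5 + 3 characters for v3, 2 + 3 + the rest for v4). Those widths are my reading of the files, not an official layout, and the function names are invented for illustration.

```python
def split_v3_id(station_id):
    """Split a v3 ID into (country number, WMO number, modifier flags)."""
    return station_id[:3], station_id[3:8], station_id[8:11]


def split_v4_id(station_id):
    """Split a v4 ID into (country letters, status characters, remaining block)."""
    return station_id[:2], station_id[2:5], station_id[5:]


def tail_overlap(v3_id, v4_id):
    """True if the v3 WMO + modifier digits end with the v4 remaining block."""
    _, wmo, mod = split_v3_id(v3_id)
    _, _, rest = split_v4_id(v4_id)
    return (wmo + mod).endswith(rest)


print(split_v3_id("42500147093"))   # Saint Francis, v3
print(split_v4_id("USC00147093"))   # Saint Francis, v4
print(tail_overlap("42500147093", "USC00147093"))   # Saint Francis pair
print(tail_overlap("10468054000", "BC008948490"))   # Francistown pair
```

Saint Francis lines up only when the v3 WMO and modifier digits are glued together and compared against the tail of the v4 ID. Francistown, which is plainly the same station, shows no overlap at all.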

But just how on God’s earth to match things up? The names change. The LAT / LONG change small bits. The Country changes (both format and what country places are in as countries come and go). And it looks like the WMO# is a “variable constant” over time.

There are only 2 countries afflicted by this split personality problem, 1/2 Asia 1/2 Europe: Russia & Kazakhstan. I suppose I could just “crowbar” them both into 100% Asia for v3 and ignore the rest of the volatility of definitions. But really, I should not need to do such things.

There’s a great deal of “Dick With Factor” in these datasets at all levels. Even to the assignment of WMO Number to a given station in a given place. It’s almost like they are trying to hide things…

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in NCDC - GHCN Issues.

3 Responses to The Inconsistency Screwage Of GHCN

  1. gallopingcamel says:

    Over 8 years ago I met Tom Peterson in Asheville to get his take on the sharp reduction in stations listed in the GHCN. At that time all I had was the v2 files but Tom gave me a copy of the v3 files.

    It was my intention to compare v2 with v3 but failed miserably owing to lack of experience manipulating databases. It did not bother me much at the time since Tony Heller soon did it better than I could ever hope to do.

    Now you have v4, it will be interesting to find out what has changed. Back in 2010 I asked Tom Peterson about the “Adjustments” that appeared to cool the past but got nothing out of him that made any sense.

    Dorothy behind the curtain (Part 1)

  2. pouncer says:

    My retail corporation set up seven digit, “meaningful” facility code numbers that had similar “dick with” features. It wasn’t malicious but it was extremely careless. After a half century of workaround measures we grafted on a whole new set of key facility numbers (of EIGHT digits, so senior management could easily understand why the new numbers were better than the old ones) that were “meaningless”. The meaningful numbers were retained as attributes of the new key codes.

    Except to make legacy programs work the meaningful digits of the seven digit string required pre-processes to extract relevant digits from the string, check against a list of exception tables, then pass stuff back and down to the legacy system, do a chore, and pass the result up and forward to the new SQL based enterprise system …

    *sigh*

  3. Pingback: GHCN v3.3 vs v4 – Top Level Entry Point | Musings from the Chiefio
