A very boring posting, so don’t get your expectations up.
I’m working on a comparison of GHCN v1, v2, and v3 versions of the temperature data. Yes, I know, NCDC has removed the v1 and v2 copies from their web site (leaving one to wonder what they have to hide… or where they’ve hid it). But as I’ve got my own copies saved from “back when” that’s not an impediment at all. ;-) If anyone else needs a copy of the v1 or v2 data, let me know and for a minor donation I’ll make a CD and mail it.
A larger impediment is their tendency to change things for no good reason. (Almost like they are trying to deliberately make it hard for anyone to do comparisons or check up on them… or just sloppy. Pick one.) Things like changing all the country codes from v1 to v2. So I’ve got a fair amount of data matching and translation tables to make. Not decided yet if I’m going to suck it into Open Office (likely) or stick with the FORTRAN where I’ve got a fair number of tools built, but it’s a PITA to do improvisational reporting stuff. I’ll likely try loading a couple of tables into O.O. and see how robust the database features might be, at least for the ‘translation’ to a common comparable data set.
This particular “file” is the “Country Code” file, made from the country code file from each of v1, v2, and v3.
Turns out there were a lot of countries came and went, so figuring out what temperature goes where is a bit complicated. There were also a few “screwing arounds” with what to do about islands and “dependencies”. So we see things like Yugoslavia break up into a bunch of countries, along with the USSR. But we also see a bunch of Islands that did not change status being moved about from “dependencies” to separate country codes. Likely to suit someone’s sense of what is P.C.
My first actual data compare will be to pick some reasonably stable country codes out of this mush and just do a ‘quick compare’ to see what might have changed; or if more comparisons are ‘worth it’. ( I’d started to do that and found the country code / WMO composite key all messed up between v1 and the rest, thus this file).
Why post this here? Well, first off, so y’all can see what I’m doing and not wonder what happened to me. Second, so folks who are interested can see ‘how things changed’. Finally, it gives me a kind of archive if my laptop suddenly got lost.
With that, the file:
The first three fields are the Country Code from v1, v2, and v3 respectively. The description is mostly from v2, but v3 seems to match. When v1 is very different, I’ve prepended the v1 name or a short form of it, then a slash, then the v2 name. In some cases I’ve added a note in parenthesis (such as (split Yugo) for it being split off of Yugoslavia). When you see a number range with a dash, that is from the v1 file and I think it’s the WMO number range in that ‘country’. (Now I need to pick stations from the geographies that changed, then see if the WMO numbers map correctly. So, for example, pick a Hong Kong station and see if that WMO is in country code Hong Kong in v1, and China in v2 and v3.)
Country codes of “000” do not exist in that particular version Country Code file. For those that looked like a split of a prior country, or where it was in a broad “dependent Islands” bucket, I’ve gone ahead and put in the CC that I think belongs. All of those need a cross check too.
And yes, for the terminally curious, I typed the v1 and v3 country codes by hand and did the cut / paste of names by hand. It was quicker than trying to do it automated and come up with a decent QA method. (Might have been faster with Unix / Linux tools, but I’m doing this on the PC laptop for now. The Linux box is shut down at the moment and hard to get set up quickly. Besides, this box has Open Office on it and I’m interested in trying to do a data load to its database system.)
Part of what I find interesting in looking at the file is just how much the country codes gratuitously change from v1 to v2. Another part is just how unstable the notion of “country” is in our modern world.
One curious loose end is Liechtenstein. It is 613 in v1, then simply evaporates. Last time I looked it still existed. Might be interesting to see if they have a very stable thermometer or one with dropping temperatures ;-)
106 101 101 ALGERIA 144 102 102 ANGOLA 138 103 103 BENIN 152 104 104 BOTSWANA 141 105 105 BURKINA FASO 130 106 106 BURUNDI 136 107 107 CAMEROON 101 108 108 CAPE VERDE 133 109 109 CENTRAL AFRICAN REPUBLIC 134 110 110 CHAD 145 111 111 COMOROS 131 112 112 CONGO 142 113 113 IVORY COAST / COTE D'IVOIRE 122 114 114 DJIBOUTI 119 115 115 EGYPT 121 116 116 ERITREA (Ethiopia?) 121 117 117 ETHIOPIA 132 118 118 GABON 140 119 119 GHANA 114 120 120 GUINEA 113 121 121 GUINEA-BISSAU 135 000 000 EQUATORIAL GUINEA 124 122 122 KENYA 143 123 123 LIBERIA 118 124 124 LIBYA 146 125 125 MADAGASCAR 149 126 126 MALAWI 109 127 127 MALI 110 128 128 MAURITANIA 117 129 129 MAURITIUS 103 130 130 MOROCCO 147 131 131 MOZAMBIQUE 151 132 132 NAMIBIA 108 133 133 NIGER 137 134 134 NIGERIA 116 136 136 SAO TOME AND PRINCIPE 111 137 137 SENEGAL 127 138 138 SEYCHELLES 115 139 139 SIERRA LEONE 123 140 140 SOMALIA 153 141 141 SOUTH AFRICA 000 000 143 TERRITORY OF THE FRENCH SOUTHERN AND ANTARCTIC LANDS 000 000 147 BRITISH OVERSEAS TERRITORIES 120 148 148 SUDAN 125 149 149 TANZANIA 112 150 150 THE GAMBIA 139 151 151 TOGO 107 152 152 TUNISIA 126 153 153 UGANDA 128 154 154 ZAIRE 148 155 155 ZAMBIA 150 156 156 ZIMBABWE 199 157 157 AMSTERDAM ISLAND (FRANCE) 199 158 158 ASCENSION ISLAND (U.K.) 199 159 159 CANARY ISLANDS (SPAIN) 104 160 160 CEUTA (SPAIN) 199 161 161 CHAGOS ARCHIPELAGO (U.K.) 155 162 162 LESOTHO 199 163 163 MAYOTTE (FRANCE) 105 164 164 MELILLA (SPAIN) 199 165 165 REUNION ISLAND (FRANCE) 129 166 166 RWANDA 154 167 167 SWAZILAND 199 168 168 TROMELIN ISLAND (FRANCE) 102 169 169 WESTERN SAHARA (MOROCCO) 199 000 000 WMO REGION 1 DEPENDENT ISLANDS / OCEAN VESSEL STATIONS 60000-68999 206 201 201 AFGHANISTAN 207 202 202 BAHRAIN 214 203 203 BANGLADESH 229 204 204 KAMPUCHEA / CAMBODIA 230 205 205 CHINA 50000 - 59999 220 205 205 HONG KONG (U.K.) 45000-45010 45030-45040 (China?) 222 206 206 NORTH KOREA / DEMOCRATIC PEOPLE'S REPUBLIC OF KOREA 215 207 207 INDIA 205 208 208 IRAN 204 209 209 IRAQ 224 210 210 JAPAN 201 211 211 USSR / KAZAKHSTAN 203 212 212 KUWAIT 201 213 213 USSR / KYRGYZSTAN 228 214 214 LAOS 218 215 215 MONGOLIA 225 216 216 BURMA / MYANMAR 219 217 217 NEPAL 210 218 218 OMAN 213 219 219 PAKISTAN 208 220 220 QATAR 223 221 221 SOUTH KOREA / REPUBLIC OF KOREA 201 222 222 RUSSIAN FEDERATION (ASIAN SECTOR) 202 223 223 SAUDI ARABIA 216 224 224 SRI LANKA 201 227 227 USSR / TAJIKISTAN 226 228 228 THAILAND 201 229 229 USSR / TURKMENISTAN 209 230 230 UNITED ARAB EMIRATES 201 231 231 USSR / UZBEKISTAN 227 232 232 VIETNAM 211 233 233 YEMEN 212 233 233 PEOPLE'S DEMOCRATIC REPUBLIC OF YEMEN (Merged Yemen?) 221 234 234 MACAU (PORTUGAL) 217 235 235 MALDIVES 230 236 236 TAIWAN ( 230 China ? ) 50000 - 59999 299 000 000 WMO REGION 2 DEPENDENT ISLANDS / OCEAN VESSEL STATIONS 20000-59999 313 301 301 ARGENTINA 309 302 302 BOLIVIA 306 303 303 BRAZIL 310 304 304 CHILE 301 305 305 COLOMBIA 307 306 306 ECUADOR 303 307 307 GUYANA 311 308 308 PARAGUAY 308 309 309 PERU 304 312 312 SURINAME 312 313 313 URUGUAY 302 314 314 VENEZUELA 305 315 315 FRENCH GUIANA (FRANCE) 000 316 316 FALKLAND ISLANDS (U.K.) 000 317 317 SOUTH GEORGIA (U.K.) 399 000 000 WMO REGION 3 DEPENDENT ISLANDS / OCEAN VESSEL STATIONS 80000-88903 422 401 401 BARBADOS 411 402 402 BELIZE 402 403 403 CANADA 416 405 404 COSTA RICA 407 406 406 CUBA 410 407 407 DOMINICAN REPUBLIC 413 408 408 EL SALVADOR 423 409 409 GRENADA 412 410 410 GUATEMALA 409 411 411 HAITI 414 412 412 HONDURAS 408 413 413 JAMAICA 405 414 414 MEXICO 415 415 415 NICARAGUA 417 416 416 PANAMA 499 417 417 SAINT KITTS AND NEVIS 499? 406 423 423 BAHAMAS / THE BAHAMAS 424 424 424 TRINIDAD AND TOBAGO 404 425 425 UNITED STATES OF AMERICA 401 425 425 ALASKA / USA 418 426 426 ANTIGUA AND BARBUDA 499 427 427 BERMUDA (U.K.) 499? 499 428 428 BRITISH VIRGIN ISLANDS (U.K.) 499? 499 429 429 CAYMAN ISLANDS (U.K.) 499? 419 430 430 DOMINICA 607 431 431 GREENLAND (DENMARK) 499 432 432 GUADELOUPE (FRANCE) 499? 499 433 433 MARTINIQUE (FRANCE) 499? 499 434 434 NETHERLANDS ANTILLES (NETHERLANDS) 499? 499 435 435 PUERTO RICO (U.S.A.) 499? 420 436 436 SAINT LUCIA 421 437 437 SAINT VINCENT AND THE GRENADINES 403 438 438 SAINT PIERRE & MIQUELON ISLAND (FRANCE) 499 439 439 TURKS AND CAICOS ISLANDS 499? 499 440 440 VIRGIN ISLANDS (U.S.A.) 499? 499 000 000 WMO REGION 4 DEPENDENT ISLANDS / OCEAN VESSEL STATIONS 70000-79999 513 501 501 AUSTRALIA 508 502 502 FIJI 515 503 503 INDONESIA 506 504 504 KIRIBATI 501 505 505 MALAYSIA 504 506 506 NAURU 511 507 507 NEW ZEALAND 512 508 508 PAPUA NEW GUINEA 517 509 509 PHILIPPINES 502 511 511 SINGAPORE 503 512 512 SOLOMON ISLANDS 510 517 517 TONGA 507 518 518 TUVALU 505 520 520 VANUATU 599 521 521 AMERICAN SAMOA (U.S.A.) 599? 516 522 522 BRUNEI 599 523 523 CHRISTMAS ISLAND (AUSTRALIA) 599? 599 524 524 COCOS ISLANDS (AUSTRALIA) 599 525 525 COOK ISLANDS (NEW ZEALAND) 599 526 526 CORAL SEA ISLANDS (AUSTRALIA) 599 527 527 FEDERATED STATES OF MICRONESIA 599 528 528 FRENCH POLYNESIA (FRANCE) 599 529 529 GUAM (U.S.A.) 599 530 530 JOHNSTON ATOLL (U.S.A.) 599 531 531 MARSHALL ISLANDS 599 532 532 NEW CALEDONIA (FRANCE) 599 533 533 NIUE (NEW ZEALAND) 599 534 534 NORFOLK ISLAND (AUSTRALIA) 599 535 535 NORTHERN MARIANA ISLANDS (U.S.A.) 599 536 536 BELAU 599 537 537 PITCAIRN ISLAND (U.K.) 599 538 538 TOKELAU 599 539 539 WAKE ISLAND (U.S.A.) 599 540 540 WALLIS AND FUTUNA (FRANCE) 509 541 541 WESTERN SAMOA / SAMOA 599 000 000 WMO REGION 5 DEPENDENT ISLANDS / OCEAN VESSEL STATIONS 90000-98999 514 000 000 ADELIE LAND 95500-95510 (Antarctica?) 624 601 601 ALBANIA 632 602 602 USSR / ARMENIA 619 603 603 AUSTRIA 632 604 604 USSR / AZERBAIJAN 632 605 605 USSR / BELARUS 610 606 606 BELGIUM 623 607 607 BOSNIA AND HERZEGOVINA (split Yugo) 625 608 608 BULGARIA 623 609 609 CROATIA (split Yugo) 631 610 610 CYPRUS 620 611 611 CZECH REPUBLIC (split 641) 608 612 612 DENMARK 632 613 613 USSR / ESTONIA 603 614 614 FINLAND 614 615 615 FRANCE 632 616 616 USSR / GEORGIA 618 617 617 GERMANY 629 618 618 GREECE 622 619 619 HUNGARY 606 620 620 ICELAND 605 621 621 IRELAND 635 622 622 ISRAEL 627 623 623 ITALY 636 624 624 JORDAN 632 625 625 USSR / KAZAKHSTAN 632 626 626 USSR / LATVIA 634 627 627 LEBANON 632 628 628 USSR / LITHUANIA 611 629 629 LUXEMBOURG 628 630 630 MALTA 632 631 631 USSR / MOLDOVA 623 632 632 MONTENEGRO (split Yugo) 609 633 633 NETHERLANDS 601 634 634 NORWAY 621 635 635 POLAND 617 636 636 PORTUGAL 625 637 637 ROMANIA 632 638 638 USSR / RUSSIAN FEDERATION (EUROPEAN SECTOR) 623 639 639 SERBIA (split Yugo) 620 641 641 SLOVAKIA (split 611 ) 623 000 642 SLOVENIA (split Yugo) 615 643 643 SPAIN 602 645 645 SWEDEN 612 646 646 SWITZERLAND 633 647 647 SYRIA 623 648 648 MACEDONIA (split Yugo) 630 649 649 TURKEY 632 650 650 USSR / UKRAINE 604 651 651 UNITED KINGDOM 699 652 652 FAROE ISLANDS (DENMARK) 699? 616 653 653 GIBRALTAR (U.K.) 699 654 654 MADEIRA ISLANDS (PORTUGAL) 699? 613 000 000 LIECHTENSTEIN 06600-06999 (So where is it now?) 699 000 000 WMO REGION 6 DEPENDENT ISLANDS / OCEAN VESSEL STATIONS 01001-40349 799 700 700 ANTARCTICA 88904-89999 000 000 701 ARGENTINE BASE IN ANTARCTICA 699 800 800 SHIP STATIONS 01001-40349 WMO REGION 6 DEPENDENT ISLANDS / OCEAN VESSEL STATIONS
So there you go. Hopefully it is useful to someone or at least of minor interest.
E.M.,
“If anyone else needs a copy …. I’ll make a CD and mail it.
Even quicker and easier – you could set up a dropbox for free (also back up storage for you). Check email.
623 631 631 MOLDOVA (split Yugo)
It is not split Yugo but split USSR
I doubt that y’ll get much amused regarding the usability of the OO database interface. It is really very far away from MS-access. When I’m doing such an analysis, I create a postgresql database and import the data using psql’s copy command (which is able to read arbitrary CSV formats). Then I use the really exciting R – SQL interface to read the data in and pass them to lattice graphics functions using their formula interface. It is very simple and very fast. Example:
require(RPostgreSQL)
require(lattice)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, user = "smith", password = "deltaT",
dbname = "ghcn", host = "localhost")
df <- dbGetQuery(con, "select station,date, temp, version from temps
where country = 309 and station like '221A%')
xyplot(temp ~ date | station+version,data=df)
@E.M. The real “trick” is to make you think there is such a thing called “temperature”. We should study, instead, whence it comes, how it is produced. For example: If we eat “trash food” we generate more “heat” and originate the UHI effect. This is for sure. Now, as you have pointed out recently, there is a relation between changes in GMF and temperatures.
These are the real issues not those temperature records of dubious origin, which have been adjusted so many times that no one ever will find the original data.
@Verity:
Yeah, but if nobody wants one I’ve done that account set up for nothing… but yes, a good idea. Right now I’ve got a dozen accounts on different things where I’ve forgotten how to log in to them (passwords change in one place, not in a low usage one, after a few cycles the “three tries and you get locked out” bites…) so I’m trying to avoid accumulating even more accounts where they end up “bricked”…
But if I get more than one request, hey, it’s worth it…. ( Or I suppose I could just set it up, load the data, and then forget about it as it won’t change in the future. I’m not NASA or NCDC after all ;-)
@Radovan Korda:
Thanks! All the QA possible helps. (And frankly, the USA education on ‘odd little countries in Europe and Asia’ is about as lame as one can get… I knew them all before the USSR breakup but some of the post-breakup have not been high on my ‘must keep straight’ list. Then you get similar names mixed in. Slovakia Slovenia Monrovia Moravia Moldova. Sigh. At least I had it as a break up of a Slavic Socialist State even if not the right one ;-)
@Hugo M:
Yeah, but it’s the hammer I have right now so the data looks kinda like a nail ;-)
Part of this is just to find out if the OO database is worth using at all for things. It will take me all of about an hour to do a ‘trial fit’. If it can’t handle fixed format data ( i.e. doesn’t have positional field descriptors on input data, only CSV) then I’m likely to run it through FORTRAN to turn fixed format into CSV; as THAT takes me all of about 10 minutes, READ (format) WRITE (format with commas). Once I’m doing that FORTRAN input statement, well, I’ll likely just stay in FORTRAN land for the basic comparisons. Just to get the answer fastest.
I don’t have “MS-access” and I’m not going to buy it. (Tried using it once a few years back. Didn’t like it. It couldn’t do fixed format then either, so unless they added that feature, I’m back in the land of CSV. Don’t know why the last few decades folks have thought fixed format input didn’t matter, but it was trendy for a while…)
Frankly, if OO doesn’t do it, I’m most likely to use one of the SQL types on LINUX ( IFF I can find one of them that can handle fixed format data input… Yeah, I could dump it through the Linux sed or similar Stream EDitors and add commas, but if I’m doing that, I’m back at why not just use the FORTRAN and be done? Other than wanting to play with the database and prepare for some unknown future requests, it’s not needed at that point.
Yeah, the typical “How to approach this problem most efficiently / easily” you get with any data handling problem. Trying to balance the immediate desire for a “any there there?” answer with the longer term “build some infrastructure for an easier round 2, if any”….
@Adolfo:
I’m following both threads.
1) What is the history of data diddling reflected in the data? (How much have things changed from version to version, for example.)
2) Are the temperature data really of any use? Do they connect to anything people do? (Also noting that enthalpy is ignored in temperature data and that alone is a fatal flaw. That almost all the thermometers are at airports that grew over time, went from grass fields to square km of tarmac in the sun; and that alone makes the data useless for climatology.)
And more.
So yes, asking things like “Did the sun do it via the aa modulation and UV variation and does that reflect in the magnetic fields?” is part of what we do here. But it is also important to keep an eye on how the data ‘drift’ over time; as that is direct evidence of data diddling… even if some folks assert it is valid diddling…
Well, using OO was not all that hard, but it has the usual “round about” way that OO tends to have.
I found this page with ‘pointers’:
http://sheepdogguides.com/fdb/fdb1imp1.htm
Which had the helpful bit that you must pass through a spreadsheet on your way to a data load operation (for what I wanted to do), but had the misleading “push” toward CSV as the way and the light.
Turns out that you can similarly load data from a fixed format file and OO gives a nice little window where you make divider lines and slide them to positions to demark fields. Only ‘odd bits’ were needing to make a gratuitous ‘field’ for the spaces between columns (that end up as full columns in the spreadsheet that then you must delete later or OO makes a similar database field…) and being a bit careful about field types. ( It turned the numeric country code into an odd text field if the blank column was left as part of that field).
Then it’s just a matter of ‘select all’ and ‘copy’ then go to the database panel, select ‘table’ and in the table box right click and paste (then fill out a lot of questions about field types, names, etc.)
The bit about dropping records if the text was too long for the field ‘bit me’ so I get to either redo the operation with a longer text field ( I’d guessed 50 char but…) or find the ones that didn’t make it and hand enter. (Ought to be easy with the ID record counter)
At any rate, looks like loading a fixed format data file is not too hard. That just leaves “join” and “report” as odd bits to learn.
Open Office has a very “modal” way of working, with many things hidden on ‘right click’ menus or menus that only show up in certain contexts. (Apple was all about ‘non modal’ user interfaces where things are intuitive and discoverable… Doing the ‘modal’ stuff is non-intuitive and often non-discoverable; so a bit painful unless you can find a written guide sheet somewhere or have learned the manual by heart…) At any rate, knowing what things to click to get the right modes makes it not very hard to load data (now).
It’s also unclear (as of now…) how to make specific fields ‘keys’ or match on them. That will be important for things like taking temperature records with one country code and matching them to other version records with a different code, but we’ll see. Any decent database must have a way to do a ‘match on field’.
For now it’s “So far, so good”.
Well, it will get interesting how said hammer will react when you confront him with 200 metric tonnes of nails…. I’ve just seen that v1, v2 and v3 are online:
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v1
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3
@Hugo M:
I was there a week+ ago.
It’s sneaky. It looks like v1 and v2 are on line, but look inside those directories and it’s only the precipitation and pressure data. The inv (inventory) file and the v1.mean v1.min and v1.max files are all missing. The temperature data sets have been ‘pruned’ out of those online archives. Same thing for v2.mean v2.max v2.min.
I’m sure it’s all innocent… /sarcoff>;
BTW, loading the v1.mean temperature data into Open Office looked like it was going to be very straight forward. Unfortunately, it’s painfully slow and has now failed 3 times with an obscure error message. I think at 308,000 records it’s running into some kind of size limit or data dependent problem.
If I don’t get it working inside an hour, I’m just going to head off to FORTRAN land. It works. It’s not got ‘size’ problem. It is very deterministic. Not ‘trendy’ but effective. (Ought to clear the desk and get the Linux box set up anyway…)
Loading the data into OO spreadsheet had no problems. The “paste” into the database table proceeds normally, but when you try to open the table it says it can’t load the data. Smells like a size limit “wall” being run into, to me.
@E.M: Interesting. The v1 readme says: “This subdirectory contains 19 files”. However, there are only 16 files (actually only 15 when not counting the “README.ghcn~” backup file), with “temp.data” and “precip.data” missing. In v2, only max temperature and precipitation is left, whith minimum temperature, average temperature and pressure data missing. Whereas in v3, temperature min/max/avg data are available, but this time pressure and precipitation data are missing. What could be the possible sense? Plausible deniability?
Regarding OO, while the fixed column width import is very neat feature that I did not know about, I was expecting that it would become increasingly slow and also very difficult to match inconsistently named stations. However, if you can load the data into a spread sheed, you can directly re-export them into a csv format using “save as”, and then import that file into a decent SQL database, having constraints in place that ensure that the deliberate chaos that has done to the naming scheme will be consistently dealt with.
@Hugo M:
It is an “odd thing” that the old data have been evaporated…
I must admit a private conceit here.
Shortly after I came up with the dT/dt method (right hand category list on the side there) I got an inquiry about availability of the source code / method (which I’d already published during the process). I pointed to the older code postings and that was that.
The dT/dt method starts with a “First Differences” approach, but does just a couple of bits differently. For one thing, it is run ‘backwards’. On the presumption that present measurement is done better than in 1800, it starts with the present temperature and calculates anomalies backward from that. Any ‘wacky’ temperature from way back when doesn’t cause all future anomalies to be wacky too… But what is most likely the strongest part of it is that when it comes to missing data, it simply does nothing. No “infilling” or “homogenizing” or whatever. It simply says the truth: “We do not know”. So if a +1.1 anomaly exists and it runs into 2 years of missing data, it holds that +1.1 anomaly for that whole time until the next valid data point. THEN it calculates the new anomaly. If, for example, that new temperature was -1 anomaly, it would find a +0.1 anomaly then existed. So you are never “speculating” in the code, only saying what you know.
Running this code on the old form data showed up a lot of what looked like “splice artifacts” in the way other methods glued together station fragments. So if you had 3 stations contributing to the temperature series for, oh, Marble Bar Australia, you would see the splice artifact nature of the way they were merged in the other code (which merges first, then applies an anomaly calculation). Like I found at Marble Bar:
https://chiefio.wordpress.com/2010/04/03/mysterious-marble-bar/
My method “fixed” that (and tended to highlight it).
The new GHCN (v3) was mysteriously delayed for a while.
When it came out, the “splice” and merge step had been moved “upstream” into the GHCN creation phase. One could no longer demonstrate the ‘splice artifact’ nature of things nor apply the dT/dt method to demonstrate the difference if one did not do a ‘splice first’. The “Spice Was In”…
I like to flatter myself that perhaps it was my little dT/dt method that caused the need to “hide the splice”… It’s not possible to know, really. At least not unless the folks doing it ‘come clean’. But all the timing fits. As does all the effects.
I just figured I’d used the old data to “show up the differences” if ever needed.
Then I saw the old data evaporate with nary a word. No “Last Call”. Just POOF! The magician and cloud of smoke…
Strange that…
However, I’d squirreled away a copy or two ;-)
So at some point I’m going to just take the more recent bits of data spliced onto the end of V2 to make V3; unsplice them and call them a new series, and re-run dT/dt on the “old data” and have a minor discontinuity where V3 starts. Any “jump” at that point will stand out pretty clearly. The splice will be ‘unhidden’.
But it IS a very strange thing that they are so determined to assure that the older versions of the data are hidden. Guess the ‘rewrite the past’ had gotten so extreme that there was nothing left to do but burn the old books…
So that’s my private conceit. That perhaps in some small way my efforts goaded them into this rather public act of desperation. Or maybe they are just incredibly dumb and figured it was just old data nobody needed and dumped it. Malice or stupidity. Hell of a choice…
@E.M.: Never assume your opponents to be stupid… However, what else might be the reason that the NOAA publishes ghcn v3 with three different metadata files for tmin, tmax and tavg? Do they use a different set of station for each of these derived quantities?
$ diff -u ghcnm.tmin.v3.1.0.20120421.qcu.inv.sorted ghcnm.tmax.v3.1.0.20120421.qcu.inv.sorted | grep "^-" | wc
7 61 736
$ diff -u ghcnm.tavg.v3.1.0.20120424.qcu.inv.sorted ghcnm.tmax.v3.1.0.20120421.qcu.inv.sorted | grep "^-" | wc
1617 15864 176226
Not even the meta data produced at the same day are congruent (7 different entries). Comparing metadata for Tmax versus Tavg reveals 1617 different entries, which is more than 10% of the metadata of all stations.
Costa Rica: code 405 (not 404) is used in GHCN v3 as well as in GHCN v2
v3:
40578760000 9.9700 -84.8300 3.0 PUNTARENAS 136S 26FLxxCO 1x-9WATER C
40578762000 10.0000 -84.2200 939.0 JUAN SANTAMAR 1060S 33MVxxno-9A 2TROP. SEASONAL C
40578762001 10.0000 -84.1000 1141.0 SAN JOSE/CENTRAL OFFICE 1327U 391MVxxno-9x-9TROP. SEASONAL C
40578767000 10.0000 -83.0500 3.0 PUERTO LIMON 60S 30FLxxCO 1x-9TROP. SEASONAL C
Slovenia: was used in GHCN v2 with same country code 642 as in GHCN v3.
64214014000 LJUBLJANA/BRN 46.218 14.473 384 427U 169MVxxno-9A15WARM CROPS B
64214015000 LJUBLJANA/BEZ 46.066 14.512 298 319U 169MVxxno-9x-9WARM CROPS C
64214026000 MARIBOR/SLIVN 46.480 15.682 265 293U 95MVxxno-9A 6WARM FOR./FIELD B
64214105001 PORTOROZ 45.52 13.57 95 27R -9HIxxCO 1x-9MED. GRAZING B
Liechtenstein: does not have any station in GHCN v2 or v3. WMO combines Liechtenstein with Switzerland. The single WMO station is Vaduz, 06990, at 47 07 39N, 09 31 03E, which would become 64606990000 if Vaduz were somehow to sneak into GHCN
No my hero, this is not boring at all, very useful!
Do you have a link to GHCN version 1 data?
I hope so …
K.R. Frank
Pingback: GHCN V1 vs V2 (V3 soon) Teaser | Musings from the Chiefio
Pingback: GHCN v3 Portugal and Spain | Musings from the Chiefio