GHCN Version Anomaly Database Documentation

I’ve recently made some changes to the GHCN Analysis database for Version Anomaly comparisons.

This involves some changes to the data formats in some of the database tables, along with some changes in the data load scripts. I have not changed the original FORTRAN program that puts tabs into the data for initial loading. Eventually that Fortran will replace the “\” in some of the data (marker for “source”) with some other character. For now, that change was made via the editor (notes in the prior posting).

So this is just going to be a set of program / schema listings, and a very tiny bit of commentary / documentation.

The Database Creation Steps

Starting from a pristine install of MariaDB (MySQL), with a ‘temps’ database defined (but no tables loaded), it is good practice to do a “drop table” first just to be sure there are no “leftovers” from prior runs. The same applies on a “fresh load”. I have a script that does that for me.

These scripts are pulled into MariaDB via the “source” command that says “go read this external file and do what it says”. There is also the ability to store such programs inside the database itself, but I’m not doing that (yet). Why? First off, this works and I didn’t feel it was important enough to spend time “improving” it, so I have not bothered to learn how to do “stored programs” inside the database. Second, as I’m in “development mode” I may need to just “dump the whole database and start over”. That’s easier if your programs are not stored inside it.
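
For anyone curious what the stored-program alternative would look like, here is a minimal sketch of the table dropper (the dropall script below) done that way. This is untested on my setup, and “drop_all_tables” is just a name I made up:

DELIMITER //
CREATE PROCEDURE drop_all_tables()
BEGIN
    -- same list as the dropall script below
    DROP TABLE IF EXISTS continent;
    DROP TABLE IF EXISTS country;
    -- ... and so on for the other nine tables
END //
DELIMITER ;

CALL drop_all_tables();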

This means I have “hard coded path names” in these scripts. Those would need changing for any other system. As there are only a few scripts, it isn’t that big a workload at present. Eventually I’ll get around to doing stored programs and path names as passed parameters, but that’s usually during a “polish and package” step at the end of “development”.

Drop The Tables

In my home directory is a directory named “SQL” where I have all my database ‘stuff’, divided into table related things (in a sub-directory named ‘tables’) and programs that do something (in a sub-directory named ‘bin’). The Unix / Linux command “cat” is short for “concatenate and print”, but given just one file name it simply prints out that file. So “cat dropall” is printing out what is in the file “dropall” in my SQL/tables directory.

chiefio@PiM3Devuan2:~/SQL/tables$ cat dropall 
DROP TABLE continent;
DROP TABLE country;
DROP TABLE invent3;
DROP TABLE invent4;
DROP TABLE temps3;
DROP TABLE temps4;
DROP TABLE mstats3;
DROP TABLE mstats4;
DROP TABLE anom3;
DROP TABLE anom4;
DROP TABLE yrcastats;
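
On a truly pristine install each of those will throw a “table doesn’t exist” error, which is harmless but noisy. MariaDB’s IF EXISTS clause makes the script quiet and safe to run either way, if you care (a trivial tweak, not what my script currently does):

DROP TABLE IF EXISTS continent;
DROP TABLE IF EXISTS country;
-- ... and likewise for the other nine tables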

Load The Table Descriptions

Then you get to load all the table schema:

chiefio@PiM3Devuan2:~/SQL/tables$ cat tables 
source /SG500/xfs/chiefio/SQL/tables/continent
source /SG500/xfs/chiefio/SQL/tables/country
source /SG500/xfs/chiefio/SQL/tables/invent3
source /SG500/xfs/chiefio/SQL/tables/invent4
source /SG500/xfs/chiefio/SQL/tables/temps3
source /SG500/xfs/chiefio/SQL/tables/temps4
source /SG500/xfs/chiefio/SQL/tables/mstats3
source /SG500/xfs/chiefio/SQL/tables/mstats4
source /SG500/xfs/chiefio/SQL/tables/anom3
source /SG500/xfs/chiefio/SQL/tables/anom4
source /SG500/xfs/chiefio/SQL/tables/yrcastats

Of course, in any version on your machine you would change “/SG500/xfs/chiefio/SQL/tables” to whatever path name you have used to store the schema files. You could also just type all these commands by hand into MariaDB, but I’m lazy about typing, so I do things to minimize it and automate where I can ;-)

This pulls in a “schema” that describes each table, and builds the empty table. (We load them further down.) This just defines what will be stored in each table, in what format, and whether it is indexed or not.

A couple of words about each table:

This is in no way optimized nor normalized. It is a “just growed” system where each design choice was made on top of the prior choices, best or not. This is normal. Once a system works and does what you desire, IF it is worth it, you go back and re-design it with all your lessons learned included from the start.

I’m NOT normalizing the tables as I have specific goals I’m trying to achieve. One of them is easy reporting via Python programs with embedded SQL (so long, complicated SQL doing lots of joins is an issue). I also fully expect to just load the data and ‘flow’ it all the way through, NOT doing a lot of “updates” to any given step. Normalized data is best when the average use isn’t known (so you can’t optimize for it) and when there will be updates done (so redundant data means multiple update points and potential errors / discrepancies). As neither of those apply, I’m doing more “for this purpose” design and “reload from scratch” if it ever changes.

Loading The Tables

I have a script that does the load and makes the indexes for all these tables in one command. That’s useful on major changes, or when moving to a new platform. Here it is:

chiefio@PiM3Devuan2:~/SQL/bin$ cat LoadTables
source /SG500/xfs/chiefio/SQL/bin/LOAD/Lcontinent
source /SG500/xfs/chiefio/SQL/bin/LOAD/Lcountry
source /SG500/xfs/chiefio/SQL/tables/mkindexcn
source /SG500/xfs/chiefio/SQL/bin/LOAD/Linv3
source /SG500/xfs/chiefio/SQL/bin/LOAD/Linv4
source /SG500/xfs/chiefio/SQL/bin/LOAD/Ltemps3
source /SG500/xfs/chiefio/SQL/tables/mkindex3
source /SG500/xfs/chiefio/SQL/bin/LOAD/Ltemps4
source /SG500/xfs/chiefio/SQL/tables/mkindex4
source /SG500/xfs/chiefio/SQL/bin/LOAD/Lmstats3
source /SG500/xfs/chiefio/SQL/bin/LOAD/Lmstats4
source /SG500/xfs/chiefio/SQL/bin/LOAD/Lanom3
source /SG500/xfs/chiefio/SQL/tables/mkanomindex3
source /SG500/xfs/chiefio/SQL/bin/LOAD/Lanom4
source /SG500/xfs/chiefio/SQL/tables/mkanomindex4

Note it does not load the yrcastats table, as that takes 4 hours on the Pi and I want to know all this worked right before I launch it. Also note that the making of indexes is via things in files that start with ‘mk’, while the actual loading of data has names that start with ‘L’. I treat the making of an index as a table issue, so the mk bits are stored with the table schema in SQL/tables.

Continent

This is just the list of what “region” number has what continent name.

chiefio@PiM3Devuan2:~/SQL/tables$ cat continent
CREATE TABLE continent (
    version CHAR(5) NOT NULL,
    ascension CHAR (10),
    region CHAR(1) NOT NULL,
    region_name CHAR(25)
    ) ;

Version and ascension are so that I can document if it changes from version to version. Likely not needed for this table, but we’ll see it in many others too. (I might change this in the future if it doesn’t work as well in practice as desired; I’ve already got some tables by version instead of just using the version field).

Version is v3.3 or v4 or eventually v2 and v1 of GHCN.

Ascension is just the date of this particular copy, for the eventual comparison of different copies of, say, v2 saved at different times.

I’m not yet doing much with either of those two fields and they are likely irrelevant to this table.

Region is a single numeric digit. That’s what GHCN prior to v4 uses. I associate it with a name (the next field).

Region_name is just the name of the continent or region. Mostly this is continents, but GHCN makes a tepid attempt to have some ocean data in it, so “Ships” is one of the “regions”. As the vast bulk of all the data are from continents, I use the two words interchangeably. But do note that “Ocean Ships” and “All those Pacific Islands” are not exactly continents and a purist ought to only say “regions”.

Here’s the load script. Note it is just a brute force “stuff in a value”. All my “load tables” scripts / programs are in a directory named “LOAD” inside the ‘bin’ directory.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Lcontinent 
INSERT INTO continent VALUES ('Gv3','7Sept2015','1','Africa');
show warnings;
INSERT INTO continent VALUES ('Gv3','7Sept2015','2','Asia');
show warnings;
INSERT INTO continent VALUES ('Gv3','7Sept2015','3','South America');
show warnings;
INSERT INTO continent VALUES ('Gv3','7Sept2015','4','North America');
show warnings;
INSERT INTO continent VALUES ('Gv3','7Sept2015','5','Australia Pacific Islands');
show warnings;
INSERT INTO continent VALUES ('Gv3','7Sept2015','6','Europe');
show warnings;
INSERT INTO continent VALUES ('Gv3','7Sept2015','7','Antarctica');
show warnings;
INSERT INTO continent VALUES ('Gv3','7Sept2015','8','Ship Stations Ocean');
show warnings;

My original data load was from a v3 version so I manually coded the version and date of that copy. As these have not changed over the years, it is likely irrelevant and ought to be removed. The only important bit is the association of the region number (1-8) with the continent name or “ships”.
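
A quick sanity check after the load, just to see the mapping landed as expected (it should give the eight rows above, 1 through 8, each with its name):

SELECT region, region_name FROM continent ORDER BY region;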

OTOH, polishing this particular stone doesn’t buy much, so it is as it is.

Countries

In order to match the country number from v3.3 to the country abbreviation used in v4, I had to make a tab separated file of country numbers, abbreviations, and names. Yes, about 240 lines of it. This table holds that data in the database, loaded from that file:

chiefio@PiM3Devuan2:~/SQL/tables$ cat country
CREATE TABLE country (
    cnum CHAR(3) NOT NULL,
    abrev   CHAR(2),
    region CHAR(1) NOT NULL,
    cname  CHAR(60),
    version CHAR(5),
    ascension CHAR(10),
    PRIMARY KEY(cnum)
    ) ;

And yes, *Nix folks make endless small jokes about things like “this is cat country” or naming files like “kibble” so you can have “cat kibble”. Things bored programmers do while writing accounting software ;-) Remember to be wary of files with names like “balls” and “brains”, as not thinking and saying to your assistant “I think brain is too small” or “We need to remove brains” can result in your stepping in the joke goo… Watch too for “poo”, as “cat poo” is, um, an “issue”.

Note at the bottom of the list of fields is “PRIMARY KEY(cnum)”. That says that the unique field in this table is a 3 digit value (cnum CHAR(3) up the list). That number is used by v3.3 and earlier as part of the composite station identity number. Version 4 uses a 2 letter abbreviation. You can key off either of them, but cnum will be static now that it is historical. It could have been an integer instead of characters, as it is a 3 digit number, but since it isn’t used in any math, I’m just making it characters.

Region (or continent) is a single digit (stored as a character). It is also identical to the first digit of cnum so is redundant, but useful for doing a continent lookup to get the name.

Then cname is just the name of the country. You might think 60 characters is way too many for a country name, but not to GHCN…

Again we have the version and ascension fields. Countries do change over time, and this allows for it, but I’ve really not implemented that function yet; and may never. In fact, I had to resort to a hand edit of the v4 country file to get this to work at all, so it may make sense to just dump version and ascension some day.

Here we load the country data. Notice it is ONLY loading the ‘v4/country-tabs’ file. I have a v3.3/country-tabs file and had intended to “do the matching” in the database, but that “had issues”, so I just plugged the v3 (and prior) cnums into the v4 file, did the matching by hand, then loaded it into this table.

Yes, yet another bit of historical stone that ought to be polished more…

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Lcountry
LOAD DATA LOCAL INFILE '/SG500/xfs/chiefio/SQL/v4/country-tabs' 
INTO TABLE country (region,cnum,abrev,cname)
SET version="Gv4",ascension="17MAR2019";

show warnings;

You can see that I just hard code the version and ascension values to the v4 ones. Like I said, that likely all needs to just “go away”.

I add an index (in addition to the Primary Key, which is indexed by default) so that various searches and table JOIN operations will go much faster:

chiefio@PiM3Devuan2:~/SQL/tables$ cat mkcnindex 
CREATE INDEX abrevx    ON country (abrev);
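
You can check that a lookup is actually using it with EXPLAIN; the abrevx name showing up in the key column of the output is what you want to see:

EXPLAIN SELECT cnum, cname FROM country WHERE abrev = 'US';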

Just in case anyone wants to run a copy of this themselves, here are the 240 lines of the country-tabs file. Others can just scroll past it to the rest of the tables and load programs. OTOH, it’s an interesting list of all the countries…

chiefio@PiM3Devuan2:~/SQL/v4$ cat country-tabs 
4	426	AC	Antigua and Barbuda 
2	230	AE	United Arab Emirates 
2	201	AF	Afghanistan
1	101	AG	Algeria 
6	604	AJ	Azerbaijan 
6	601	AL	Albania
6	602	AM	Armenia 
1	102	AO	Angola 
5	521	AQ	American Samoa [United States] 
3	301	AR	Argentina 
5	501	AS	Australia 
6	603	AU	Austria 
7	700	AY	Antarctica 
2	202	BA	Bahrain 
4	401	BB	Barbados 
1	104	BC	Botswana 
4	427	BD	Bermuda [United Kingdom] 
6	606	BE	Belgium 
4	423	BF	Bahamas, The 
2	203	BG	Bangladesh
4	402	BH	Belize
6	607	BK	Bosnia and Herzegovina 
3	302	BL	Bolivia 
2	216	BM	Burma
1	103	BN	Benin 
6	605	BO	Belarus 
5	512	BP	Solomon Islands 
3	303	BR	Brazil
6	608	BU	Bulgaria
5	522	BX	Brunei
1	106	BY	Burundi
4	403	CA	Canada 
2	204	CB	Cambodia
1	110	CD	Chad 
2	224	CE	Sri Lanka 
1	112	CF	Congo (Brazzaville) 
1	154	CG	Congo (Kinshasa)
2	205	CH	China  
3	304	CI	Chile 
4	429	CJ	Cayman Islands [United Kingdom] 
5	524	CK	Cocos (Keeling) Islands [Australia] 
1	107	CM	Cameroon 
1	111	CN	Comoros
3	305	CO	Colombia 
5	535	CQ	Northern Mariana Islands [United States] 
4	405	CS	Costa Rica 
1	109	CT	Central African Republic 
3	406	CU	Cuba 
1	108	CV	Cape Verde
5	525	CW	Cook Islands [New Zealand]
6	610	CY	Cyprus 
6	612	DA	Denmark 
1	114	DJ	Dijibouti
4	430	DO	Dominica
4	407	DR	Dominican Republic 
3	306	EC	Ecuador 
1	115	EG	Egypt 
6	621	EI	Ireland 
1	199	EK	Equatorial Guinea
6	613	EN	Estonia 
1	116	ER	Eritrea 
4	408	ES	El Salvador 
1	117	ET	Ethiopia 
1	198	EU	Europa Island [France]
6	611	EZ	Czech Republic 
3	315	FG	French Guiana [France] 
6	614	FI	Finland 
5	502	FJ	Fiji
3	316	FK	Falkland Islands (Islas Malvinas) [United Kingdom] 
5	527	FM	Federated States of Micronesia 
5	528	FP	French Polynesia 
6	615	FR	France 
1	143	FS	French Southern and Antarctic Lands [France]
1	150	GA	Gambia, The 
1	118	GB	Gabon 
6	616	GG	Georgia
1	119	GH	Ghana
6	653	GI	Gibraltar [United Kingdom]
4	409	GJ	Grenada
6	699	GK	Guernsey
4	431	GL	Greenland [Denmark] 
6	617	GM	Germany 
4	432	GP	Guadeloupe [France] 
5	529	GQ	Guam [United States] 
6	618	GR	Greece 
4	410	GT	Guatemala 
1	120	GV	Guinea 
3	307	GY	Guyana 
4	411	HA	Haiti
2	299	HK	Hong Kong
4	412	HO	Honduras 
6	609	HR	Croatia
6	619	HU	Hungary 
6	620	IC	Iceland 
5	503	ID	Indonesia 
6	698	IM	Isle of Man
2	207	IN	India 
1	161	IO	British Indian Ocean Territory [United Kingdom] 
2	208	IR	Iran 
6	622	IS	Israel 
6	623	IT	Italy 
1	113	IV	Cote D'Ivoire 
2	209	IZ	Iraq 
2	210	JA	Japan 
6	693	JE	Jersey
4	413	JM	Jamaica 
6	697	JN	Jan Mayen [Norway] 
6	624	JO	Jordan
5	530	JQ	Johnston Atoll [United States] 
1	197	JU	Juan De Nova Island [France]
1	122	KE	Kenya 
2	213	KG	Kyrgyzstan 
2	206	KN	Korea, North 
5	504	KR	Kiribati 
2	221	KS	Korea, South 
5	523	KT	Christmas Island [Australia] 
2	212	KU	Kuwait 
6	625	K!	Kazakhstan
2	211	KZ	Kazakhstan 
2	214	LA	Laos 
6	627	LE	Lebanon
6	626	LG	Latvia 
6	628	LH	Lithuania
1	123	LI	Liberia 
6	641	LO	Slovakia 
5	599	LQ	Palmyra Atoll [United States] 
6	696	LS	Liechtenstein
1	162	LT	Lesotho 
6	629	LU	Luxembourg 
1	124	LY	Libya 
1	125	MA	Madagascar 
4	433	MB	Martinique [France]
2	234	MC	Macau S.A.R
6	631	MD	Moldova 
1	163	MF	Mayotte [France]
2	215	MG	Mongolia 
6	695	MH	Montserrat
1	126	MI	Malawi
6	632	MJ	Montenegro 
6	648	MK	Macedonia 
1	127	ML	Mali 
1	130	MO	Morocco 
1	129	MP	Mauritius 
5	598	MQ	Midway Islands [United States} 
1	128	MR	Mauritania 
6	630	MT	Malta 
2	218	MU	Oman 
2	235	MV	Maldives 
4	414	MX	Mexico 
5	505	MY	Malaysia 
1	131	MZ	Mozambique 
5	532	NC	New Caledonia [France] 
5	533	NE	Niue [New Zealand]
5	534	NF	Norfolk Island [Australia]
1	133	NG	Niger 
5	520	NH	Vanuatu 
1	134	NI	Nigeria
6	633	NL	Netherlands 
6	634	NO	Norway 
2	217	NP	Nepal 
5	506	NR	Nauru
3	312	NS	Suriname
4	434	NT	Netherlands Antilles [Netherlands]
4	415	NU	Nicaragua 
5	507	NZ	New Zealand 
3	308	PA	Paraguay 
5	537	PC	Pitcairn Islands [United Kingdom] 
3	309	PE	Peru 
2	219	PK	Pakistan 
6	635	PL	Poland 
4	416	PM	Panama 
6	636	PO	Portugal 
5	508	PP	Papua New Guinea 
5	536	PS	Palau
1	121	PU	Guinea-Bissau
2	220	QA	Qatar
1	165	RE	Reunion [France]
6	639	RI	Serbia
5	531	RM	Marshall Islands 
6	637	RO	Romania 
5	509	RP	Philippines 
4	435	RQ	Puerto Rico [United States] 
2	222	RS	Russia 
6	638	R!	Russia
1	166	RW	Rwanda
2	223	SA	Saudi Arabia 
4	438	SB	Saint Pierre and Miquelon [France]
4	417	SC	Saint Kitts and Nevis
1	138	SE	Seychelles 
1	141	SF	South Africa 
1	137	SG	Senegal 
1	196	SH	Saint Helena [United Kingdom] 
6	642	SI	Slovenia 
1	139	SL	Sierra Leone 
5	511	SN	Singapore
1	140	SO	Somalia
6	643	SP	Spain 
4	436	ST	Saint Lucia 
1	148	SU	Sudan 
6	694	SV	Svalbard [Norway] 
6	645	SW	Sweden 
3	317	SX	South Georgia & the South Sandwich Islands [United Kingdom]
6	647	SY	Syria 
6	646	SZ	Switzerland 
4	424	TD	Trinidad and Tobago 
1	168	TE	Tromelin Island [France]
2	228	TH	Thailand 
2	227	TI	Tajikistan 
5	538	TL	Tokelau [New Zealand] 
5	517	TN	Tonga 
1	151	TO	Togo 
1	136	TP	Sao Tome and Principe
1	152	TS	Tunisia 
5	597	TT	Timor-Leste
6	649	TU	Turkey 
5	518	TV	Tuvalu 
2	236	TW	Taiwan
2	229	TX	Turkmenistan 
1	149	TZ	Tanzania 
1	153	UG	Uganda
6	651	UK	United Kingdom 
6	650	UP	Ukraine 
4	425	US	United States 
1	105	UV	Burkina Faso 
3	313	UY	Uruguay 
2	231	UZ	Uzbekistan 
4	437	VC	Saint Vincent and the Grenadines
3	314	VE	Venezuela 
2	232	VM	Vietnam 
4	440	VQ	Virgin Islands [United States] 
1	132	WA	Namibia 
5	540	WF	Wallis and Futuna [France]
1	169	WI	Western Sahara 
5	539	WQ	Wake Island [United States]
5	541	WS	Samoa
1	167	WZ	Swaziland 
9	900	XX	Unknown
2	233	YM	Yemen
1	155	ZA	Zambia 
1	156	ZI	Zimbabwe 
chiefio@PiM3Devuan2:~/SQL/v4$ 

The Instrument Inventory

Each version has a different set of instruments. How on God’s Earth they think they can do global Calorimetry with a constantly shifting foundation of instruments is beyond me. One of the things MOST drummed into students in my Chemistry classes was that IF you fool around with the thermometers your calorimetry WILL BE WRONG PERIOD. “Don’t even touch the thermometers!”. Yet the Climate Folks are constantly touching, moving, replacing, and molesting their thermometers. Maybe they never did calorimetry in chem class. Maybe they never did chemistry?

So these tables track what instruments are in which version. Notice that at this point I’ve divided the versions into two tables, giving up on the idea of one table and using a version field in processing them. “Version” is now just a documentation field… Partly (mostly) this is due to the actual data offered in each version about the instruments being so entirely different.

chiefio@PiM3Devuan2:~/SQL/tables$ cat invent3
CREATE TABLE invent3 (
    stnID      CHAR(11) NOT NULL,
    version    CHAR(5) NOT NULL,
    ascension  CHAR(10),
    type       CHAR(4),
    region     CHAR(1) NOT NULL,
    cnum       CHAR(3) NOT NULL,
    wmo        CHAR(5) NOT NULL,
    wno_near   CHAR(3) NOT NULL,
    latitude   CHAR(8),
    longitude  CHAR(9),
    stn_elev   CHAR(6),
    name       CHAR(30) NOT NULL,
    grid_elev  CHAR(4),
    pop_class  CHAR(1),
    pop_size   CHAR(5),
    topo_type  CHAR(2),
    veg        CHAR(2),
    prox_water CHAR(2),
    dist_water CHAR(2),
    airport    CHAR(1),
    dist_A_to_U CHAR(2),
    veg_grid    CHAR(16),
    pop_class_nitelites CHAR(1),
    coslong    CHAR(9),
    PRIMARY KEY (stnID)
    ) ;

Loading it is nearly trivial. Since I laid out the table fields in the same order as the file, you just say load this into that:

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Linv3
LOAD DATA LOCAL INFILE '/SG500/xfs/chiefio/SQL/v3/3invent.out' INTO TABLE invent3;

show warnings;

The “magic sauce” here is in the creation of the inventory file to be loaded. That FORTRAN tab inserting program has not changed from the first posting.

Since those fields are documented in the GHCN description of the inventory, I’m not going to repeat them here. Just notice how different, and how reduced, the data available in v4 is:

chiefio@PiM3Devuan2:~/SQL/tables$ cat invent4
CREATE TABLE invent4 (
    stnID      CHAR(11) NOT NULL,
    abrev      CHAR(2)  NOT NULL,
    f1         CHAR(1),
    f2         CHAR(1),
    f3         CHAR(1),
    f4         CHAR(1),
    wmo        CHAR(5) NOT NULL,
    wnl        CHAR(6) NOT NULL,
    latitude   CHAR(8),
    longitude  CHAR(9),
    stn_elev   CHAR(6),
    name       CHAR(30) NOT NULL,
    version    CHAR(5) NOT NULL,
    ascension  CHAR(10),
    type       CHAR(4),
    coslong    CHAR(9),
    PRIMARY KEY (stnID)
    ) ;

I’ve added the computed field “coslong” to enable easy plotting of ‘area correct’ global maps of temperatures and anomalies. It is done in a FORTRAN program documented elsewhere.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Linv4
LOAD DATA LOCAL INFILE '/SG500/xfs/chiefio/SQL/v4/invent_tabs' INTO TABLE invent4;

show warnings;

Again, loading it is fairly trivial, and the FORTRAN program that makes the tab separated file has not changed.

Temperature Data

This is the actual temperature data from the GHCN files. Note that from here on forward each version is getting its own tables. I’ve pretty much given up on the idea of a version field as a key in one table.

Notice I’ve changed the temperature fields from this point forward to DECIMAL(6,2) from (5,2). Just a bit of paranoia after running into those “crazy high” 100+ C temperatures. That adds some head room so crazy temps can load without being truncated into something that passes for normal, yet wrong, and without throwing errors on loading. It means the field can hold a value like 1234.56, which is clearly overkill. I’m just not sure if MySQL / MariaDB counts the negative sign as a width value, so whether -123.45 needs 6 wide or not, and it is easier to burn some extra disk space than to look it up or test it ;-) So two bits of data size paranoia. A TB of disk running about $25, I don’t care…
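
For what it is worth, the MySQL / MariaDB documentation says the sign is not counted in the declared width, so DECIMAL(6,2) covers -9999.99 through 9999.99. The lazy test, for anyone who would rather not trust the docs either:

CREATE TABLE widthtest (v DECIMAL(6,2));   -- scratch table, made-up name
INSERT INTO widthtest VALUES (-9999.99);
SELECT v FROM widthtest;                   -- comes back intact: the minus sign is free
DROP TABLE widthtest;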

chiefio@PiM3Devuan2:~/SQL/tables$ cat temps3
CREATE TABLE temps3 (
    stnID     CHAR(11) NOT NULL,
    version   CHAR(5) NOT NULL,
    ascension CHAR(10),
    region    CHAR(1) NOT NULL,
    cnum      CHAR(3) NOT NULL,
    wmo       CHAR(5) NOT NULL,
    near_wmo  CHAR(3),
    year      CHAR(4),
    month     CHAR(4),
    type      CHAR(4) NOT NULL,
    deg_c     DECIMAL(6,2),
    missing   CHAR(1),
    qc        CHAR(1),
    source    CHAR(1),
    PRIMARY KEY (stnID,year,month,type)
    ) ;

The Station ID is different between v3 (and prior versions) and v4, so you can’t match on it. Thus I might as well keep them separate. I store each temperature value in a unique record by Station, Month, Year, and type. The idea being to have MIN, MAX, and AVG in the same table. I’ve not got there yet; just AVG so far.

As v3 and prior have a cnum and not an abbreviation, that’s a difference too. I’d like to have a distinct table for mapping a stnID to a cnum, region, etc., but this was expedient. Again note that ‘region’, ‘wmo’, ‘near wmo’, and ‘cnum’ are redundant parts of this Station ID, but not of the v4 Station ID.
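
A typical single-instrument pull then keys on all of that. The station ID, month, and type values below are placeholders (the real encodings come out of the FORTRAN formatter), but the shape of the query is the point:

SELECT year, deg_c
FROM temps3
WHERE stnID = '50364700000'   -- hypothetical v3 station ID
  AND month = 'JAN'           -- placeholder month encoding
  AND type  = 'AVG'           -- placeholder; only the average data is loaded so far
ORDER BY year;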

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Ltemps3
LOAD DATA LOCAL INFILE '/SG500/xfs/chiefio/SQL/v3/3temps.out' INTO TABLE temps3;

show warnings;

The load script is once again just sucking in a pre-formatted file.

I make a couple of indexes on temps3:

chiefio@PiM3Devuan2:~/SQL/tables$ cat mkindex3
CREATE INDEX regionx  ON temps3 (region);
CREATE INDEX regmonx  ON temps3 (region, month);

Moving on to temps4, we once again see a load of “gratuitous change” and incompatible / non-comparable fields. It’s almost like they are trying to make comparisons between versions impossible. Not to worry, it’s only impossible for some people ;-)

Again I made the deg_C field one digit longer.

CREATE TABLE temps4 (
    stnID     CHAR(11) NOT NULL,
    abrev     CHAR(2) NOT NULL,
    F1        CHAR(1),
    F2        CHAR(1),
    F3        CHAR(1),
    F4        CHAR(1),
    wmo       CHAR(5),
    year      CHAR(4) NOT NULL,
    type      CHAR(4) NOT NULL,
    version   CHAR(5) NOT NULL,
    ascension CHAR(10),
    month     CHAR(4),
    deg_c     DECIMAL(6,2),
    missing   CHAR(1),
    qc        CHAR(1),
    source    CHAR(1),
    PRIMARY KEY (stnID,year,month,type)
    ) ;

Basic idea stays the same. Temperature stored by Station, Year, Month, and type. The Station ID is in a format incompatible with the prior versions, and the country abbreviation is why I needed to make that country mapping table. Then some various flags (see the GHCN docs) and my best guess as to what is likely the WMO number inside that Station ID. At this point, version and ascension are basically vestigial documentation fields.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Ltemps4
LOAD DATA LOCAL INFILE '/SG500/xfs/chiefio/SQL/v4/temps.out' INTO TABLE temps4;

show warnings;

Then the v4 table indexes:

chiefio@PiM3Devuan2:~/SQL/tables$ cat mkindex4
CREATE INDEX abrevx ON temps4 (abrev);

There isn’t a lot of planning in the indexes. IF some report or graph or load / JOIN takes too long, I try tossing an index at it. If that works, the index stays in the scripting…
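
“Tossing an index at it” is literally a one-liner each way, so the experiment is cheap (the index name here is made up):

CREATE INDEX yearx ON temps4 (year);
-- rerun the slow report; if it did not help:
DROP INDEX yearx ON temps4;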

Computed Statistics By Month

Here the basic processing starts to move from FORTRAN file formatting / pre-processing into computing inside the database system. As I do anomalies ONLY by comparing a single instrument to itself, inside a single month, across the years, I need to compute some statistics to make that happen. Like “What is the average January temperature for this instrument?”. Then you can subtract that from any given January for that instrument to get the “anomaly” for that instrument in that year in that month. No “baseline” needed. No homogenizing needed. No infilling needed. No hypothetical grid box needed. Just real data and statistics about it. We make those statistics here.
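
As a one-off illustration (mstats3 is defined just below; station and month values are placeholders again): if an instrument’s long-run January mean is 4.20 C and its January 1998 reading was 6.15 C, the anomaly is 1.95 C. The same thing in SQL:

SELECT T.deg_c - S.mean AS anomaly
FROM temps3 AS T
JOIN mstats3 AS S
  ON S.stnID = T.stnID AND S.month = T.month
WHERE T.stnID = '50364700000'   -- hypothetical station
  AND T.month = 'JAN'           -- placeholder encoding
  AND T.year  = '1998';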

chiefio@PiM3Devuan2:~/SQL/tables$ cat mstats3
CREATE TABLE mstats3 (
    stnID CHAR(11)     NOT NULL,
    month CHAR(4)      NOT NULL,
    mean  DECIMAL(6,2) NOT NULL,
    big   DECIMAL(6,2) NOT NULL,
    small DECIMAL(6,2) NOT NULL,
    num   INTEGER      NOT NULL,
    trang DECIMAL(6,2) NOT NULL,
    stdev DECIMAL(6,2) NOT NULL,
    pctd  FLOAT,
    pctm  FLOAT,
    PRIMARY KEY (stnID,month)
    ) ;

Note the key is Station ID and Month. For a given station, in a given month, what are the average (mean), the maximum (big), the minimum (small), how many values there are (num), the range from maximum to minimum (trang), and the standard deviation (stdev).

Some of these were float fields before. I’ve changed them to all be DECIMAL(6,2). This means I needed to add a “ROUND” function to some other steps, and for some fields (like Standard Deviation) it is clearly way too long (nobody gets out to 9000.00 standard deviations, do they?…) but I wanted to stamp out all the warnings AND not have any risk of data truncation, so went for over the top. These could easily be tuned down / optimized later.

I’ve left percent missing (pctm) and percent valid data (pctd) as floats since I’m not using them. They are for some future idea and could just be left out. Notice that the loading program doesn’t even use them. They are presently NULLs in the database.

This data loading program is more interesting. First off, it is “table to table”. We take data out of the temps3 temperature table, compute some statistics, and stuff it into the monthly statistics table mstats3. Very similar processing is done on the version 4 data.

So why use field names like big, small, trang? Well, min, max, range, and stddev are keywords to SQL, so the field name must be something else to avoid confusing the database SQL engine. You will see those functions being used in the load program below to find the AVG (average, or mean), the MAX / MIN sizes, to COUNT up how many, then to take the difference between the MAX and MIN values to get the total range, and then find the standard deviation of the temperatures. Notice I’ve put a rounding function ROUND on the computation of the mean and standard deviation values so as to avoid truncation notices. This trims them to 2 decimal places, matching the new field size.

There are opportunities to tune this more on field sizes and widths.

Note that I’ve made the “exclude -99.9 missing data flag” screen into a more general “toss anything under -90 C or over 60 C in computing statistics as they are bogus values”. Something similar is also added to the computing of anomalies further down.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Lmstats3
INSERT INTO  mstats3 (stnID,month,mean,big,small,num,trang,stdev)
SELECT stnID,month,
ROUND(AVG(deg_C),2),MAX(deg_C),MIN(deg_C),COUNT(deg_C), MAX(deg_C)-MIN(deg_C), ROUND(STDDEV(deg_C),2)
FROM temps3 
WHERE deg_C > -90.0 AND deg_C < 60.0
GROUP BY stnID,month;

show warnings;

The Version 4 monthly statistics table is almost identical to the v3.3 one, but by now I’d been burned enough with trying to fit them both into one table that I’ve given up ;-)

Besides, this way, when comparing one to the other, I don’t have to traverse any v3 data computing v4 graphs, nor traverse any v4 data when making v3. I can always “join” them if I need it mushed together.

chiefio@PiM3Devuan2:~/SQL/tables$ cat mstats4
CREATE TABLE mstats4 (
    stnID CHAR(11)     NOT NULL,
    month CHAR(4)      NOT NULL,
    mean  DECIMAL(6,2) NOT NULL,
    big   DECIMAL(6,2) NOT NULL,
    small DECIMAL(6,2) NOT NULL,
    num   INTEGER      NOT NULL,
    trang DECIMAL(6,2) NOT NULL,
    stdev DECIMAL(6,2) NOT NULL,
    pctd  FLOAT,
    pctm  FLOAT,
    PRIMARY KEY (stnID,month)
    ) ;

Note the same changes of fields to DECIMAL(6,2) and the same addition of ROUND to the loading program. Note also the same “WHERE deg_C>-90.0 AND deg_C< 60.0” quality and missing data screen.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Lmstats4
INSERT INTO  mstats4 (stnID,month,mean,big,small,num,trang,stdev)
SELECT stnID,month,
ROUND(AVG(deg_C),2),MAX(deg_C),MIN(deg_C),COUNT(deg_C), MAX(deg_C)-MIN(deg_C), ROUND(STDDEV(deg_C),2)
FROM temps4 
WHERE deg_C>-90.0 AND deg_C< 60.0
GROUP BY stnID,month;

show warnings;

There are a LOT of opportunities to explore in these statistics. So far I’m only using them to compute the anomaly tables below.
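
For instance, ranking stations by the total range of a given month is a short question (month encoding a placeholder, as before):

SELECT stnID, trang, num
FROM mstats3
WHERE month = 'JAN'
ORDER BY trang DESC
LIMIT 10;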

Computed Anomalies Of Temperatures

Here we finally come to the meat of it. Now that we have the temperatures and the statistics, we can easily compute the anomalies of temperatures for each instrument for each monthly temperature value.

This is very similar in structure to the temperature table, and it might be a big space saver to just store the anomaly value in the temperature table itself. OTOH, doing UPDATE is horridly slow in MySQL/MariaDB, and I really don’t need to be dredging through the whole block of all of it when I’m just processing the anomaly part for graphs and such. Then there is that limit, newly discovered by me (known to others forever ;-), that you can’t update a field in a table you are using in a “WHERE” screen, so that might be a problem too. Easy to avoid it all with some small bit of disk space.
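
The shape of the statement that trips that limit, for the curious (table and field names made up; it fails with MySQL / MariaDB error 1093, roughly “you can’t specify target table for update in FROM clause”):

UPDATE anoms
SET flag = 'X'
WHERE deg_C > (SELECT AVG(deg_C) FROM anoms);  -- same table read and written: error 1093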

chiefio@PiM3Devuan2:~/SQL/tables$ cat anom3
CREATE TABLE anom3 (
    stnID    CHAR(11)     NOT NULL,
    region   CHAR(1)      NOT NULL,
    abrev    CHAR(2),
    cnum     CHAR(3)      NOT NULL,
    wmo      CHAR(5)      NOT NULL,
    near_wmo CHAR(3),
    year     CHAR(4)      NOT NULL,
    month    CHAR(4)      NOT NULL,
    deg_C    DECIMAL(6,2) NOT NULL,
    PRIMARY KEY (stnID,year,month)
    ) ;

Again the big change being a small one: DECIMAL(6,2) instead of (5,2) size.

Also note that the Station ID is broken down into region, cnum, wmo, and near_wmo, with redundancy. This is for easier processing later, but it is redundant data.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Lanom3
INSERT INTO  anom3 (stnID,region,abrev,cnum,wmo,near_wmo,year,month,deg_C)
SELECT T.stnID,T.region,C.abrev,T.cnum,T.wmo,T.near_wmo,T.year,T.month,ROUND(T.deg_C-ST.mean,2)
FROM temps3 AS T
INNER JOIN country as C ON T.cnum=C.cnum
INNER JOIN mstats3 AS ST 
ON
        ST.stnID=T.stnID
	AND ST.month=T.month 
WHERE T.deg_C > -90.0 AND T.deg_C < 60.0
;
show warnings;

Changes in this loading program are just the addition of ROUND functions and the addition of the quality range screening in the WHERE statement.

Note that I do a JOIN to country to get the abbreviation used in v4 matched to this v3 data country number.
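
One sanity check worth having here: an INNER JOIN silently drops any temps3 row whose cnum has no match in country, so an anti-join shows whether anything fell on the floor (an empty result means every station found its country):

SELECT DISTINCT T.cnum
FROM temps3 AS T
LEFT JOIN country AS C ON T.cnum = C.cnum
WHERE C.cnum IS NULL;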

Next the anomaly table for v4, anom4, is almost identical. Just missing near_wmo and wmo, and with the fields in a somewhat different order. I likely ought to make the order of the fields match better (region, abrev, cnum in different orders) but it doesn’t really matter. The v4 Station ID on inspection has a very confused wmo number field compared to v3, so I’m not sure what’s going on there. Have they just dropped the WMO meaning, are they mixing other stuff in, or did they just renumber everything? Whatever. The entire Station ID is the key here anyway.

chiefio@PiM3Devuan2:~/SQL/tables$ cat anom4
CREATE TABLE anom4 (
    stnID  CHAR(11)     NOT NULL,
    abrev  CHAR(2)      NOT NULL,
    region CHAR(1)      NOT NULL,
    cnum   CHAR(3)      NOT NULL,
    year   CHAR(4)      NOT NULL,
    month  CHAR(4)      NOT NULL,
    deg_C  DECIMAL(6,2) NOT NULL,
    PRIMARY KEY (stnID,year,month)
    ) ;

Again the change to (6,2), and in the loading program the addition of the WHERE screen on insane values, along with ROUND on the anomaly computation.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat Lanom4
INSERT INTO  anom4 (stnID,abrev,region,cnum,year,month,deg_C)
SELECT T.stnID,T.abrev,C.region,C.cnum,T.year,T.month,ROUND(T.deg_C-ST.mean,2)
FROM temps4 AS T
INNER JOIN country AS C
ON T.abrev=C.abrev
INNER JOIN mstats4 AS ST 
ON
        ST.stnID=T.stnID
	AND ST.month=T.month 
WHERE T.deg_C > -90.0 AND T.deg_C < 60.0 
;

show warnings;

Yearly By Country Anomaly Statistics

Here I tried to “bite the bullet” and combine the v3.3 and v4 data into one table. That means I get to load it with one and then UPDATE with the other. It has been nothing but a bother. UPDATE has been crazy slow. I’ve been working on something faster and I think I have it now.

I first wrote two programs (and attendant “views”) that let me update the Russia and Kazakhstan data with the combined Europe & Asia statistics. So it is entirely functional now with the older Lyrcastats loading step. It will also work with the new method.

The graphs so far that include Russia and Kazakhstan in Asia have only the v3.3 data for their Asian halves in the comparison, while the v4 data is all of Russia and Kazakhstan (they were divided at about the Urals into 2 “countries” each in v3.3 and prior, but are combined as one country each in v4).

I’ve divided the Lyrcastats process into 4 steps.

L4yrcastats
L3yrcastats  
LKZ  
LRS

The L4yrcastats program just loads the version 4 data.
The L3yrcastats program updates the version 3.3 data EXCEPT for Russia and Kazakhstan.
The LKZ program updates the Kazakhstan data using both the "KZ" and "K!" country abbreviations.
The LRS program updates the Russia data using both the "RS" and "R!" country abbreviations.

I've checked that all steps run without error and with data in the right places (and NULL where it belongs before that). I've not proven that the last two use all the Asian and European data, but can do that fairly quickly.
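
That quick check would look something like this (a sketch, not yet run; the yrcastats schema is just below). The anom3 row counts for RS plus R! should match the SUM of num3 for RS, and likewise KZ plus K! for KZ:

SELECT abrev, COUNT(*) FROM anom3
WHERE abrev IN ('RS','R!','KZ','K!')
GROUP BY abrev;

SELECT abrev, SUM(num3) FROM yrcastats
WHERE abrev IN ('RS','KZ')
GROUP BY abrev;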

So let's look at how this works. First, the table schema (layout).

Note the use of DECIMAL(6,2) field sizes as the change.

chiefio@PiM3Devuan2:~/SQL/tables$ cat yrcastats 
CREATE TABLE yrcastats (
    year   CHAR(4)      NOT NULL,
    abrev  CHAR(2)      NOT NULL,
    mean3  DECIMAL(6,2),
    mean4  DECIMAL(6,2) NOT NULL,
    big3   DECIMAL(6,2),
    big4   DECIMAL(6,2) NOT NULL,
    small3 DECIMAL(6,2),
    small4 DECIMAL(6,2) NOT NULL,
    num3   INTEGER,
    num4   INTEGER      NOT NULL,
    trang3 DECIMAL(6,2),
    trang4 DECIMAL(6,2) NOT NULL,
    stdev3 FLOAT,
    stdev4 FLOAT        NOT NULL,
    PRIMARY KEY (year,abrev)
    ) ;

Note that the KEY fields are the country abbreviation abrev, and the year. This table is for holding / computing anomaly statistics for each country for each year.

I run the L4yrcastats program first as it gets the table filled with the needed abrev values for the rest to work.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat L4yrcastats 
INSERT INTO  yrcastats (year,abrev,mean4,big4,small4,num4,trang4,stdev4)
SELECT year,abrev,
ROUND(AVG(deg_C),2),MAX(deg_C),MIN(deg_C),COUNT(deg_C), MAX(deg_C)-MIN(deg_C), ROUND(STDDEV(deg_C),2)
FROM anom4 
GROUP BY year,abrev
;
show warnings;

It is just the top bit of the old Lyrcastats program with “ROUND” functions added on the mean and standard deviation so that warnings of truncation went away.

Next I use a “VIEW” to load the GHCN v3.3 data WITHOUT loading Russia or Kazakhstan data. I’ve tested that this does in fact load the v3.3 data, and that both Russia and Kazakhstan end up with empty or NULL values.

chiefio@PiM3Devuan2:~/SQL/tables$ cat vanom3
CREATE VIEW anom3Y AS
SELECT ROUND(AVG(A.deg_C),2)   AS 'meanA' ,
       MAX(A.deg_C)            AS 'maxA'  ,
       MIN(A.deg_C)            AS 'minA'  ,
       COUNT(A.deg_C)          AS 'numA'  ,
       MAX(A.deg_C)-MIN(A.deg_C) AS 'trangA'  ,
       STDDEV(A.deg_C)         AS 'stdevA' ,  
       Y.year                  AS 'yearA' ,
       Y.abrev                 AS 'abrevA'
FROM anom3 AS A
INNER JOIN yrcastats AS Y
ON    Y.year = A.year AND Y.abrev=A.abrev
WHERE A.abrev <> 'RS' AND A.abrev <> 'R!'
 AND  A.abrev <> 'KZ' AND A.abrev <> 'K!'
GROUP BY Y.year,Y.abrev
;

A couple of minor notes. First, I did the ROUND in the loading procedure (and it is still there) but got truncation warnings on “meanA” anyway. Adding the ROUND here stopped those. I’ve not checked to see if the one in the load program is now redundant.

Again the file name of the VIEW definition (vanom3) differs from the VIEW name itself (anom3Y). I’m thinking of changing that to be more consistent.

I put the computing of the various statistics in the VIEW along with the exclusion rules that leave out the KZ, K!, RS, and R! abbreviations / data for Kazak Asia, Kazak Europe, Russian Asia, and Russian Europe respectively.

I then run this program, which after a quick check looks to be doing the right thing, and is MUCH faster. About 10 minutes instead of 4 hours!

Note that the “ROUND” in the first mean3= line may no longer be needed as I added it to the VIEW. But it is working now so I ain’t touchin’ it! ;-0

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat L3yrcastats 
UPDATE  yrcastats AS Y
SET 
mean3 = (SELECT ROUND(A.meanA,2) 
FROM anom3Y AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

big3 = (SELECT A.maxA
FROM anom3Y AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

small3 = (SELECT A.minA
FROM anom3Y AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

num3 = (SELECT A.numA
FROM anom3Y AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

trang3 = (SELECT A.trangA
FROM anom3Y AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

stdev3 = (SELECT A.stdevA
FROM anom3Y AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
)
;
show warnings;

Note that it now basically just picks up data from the “VIEW” and stuffs it into the table.

MariaDB [temps]> source bin/LOAD/L3yrcastats
Query OK, 23611 rows affected (10 min 0.27 sec)
Rows matched: 28423  Changed: 23611  Warnings: 0

I do need to do another “soup to nuts” complete rebuild just to verify that no step was left undocumented and it all plays well together, but it sure looks clean. I did do a ‘report’ from the table at this point and it looks like it ought to look. Russia has no data for v3.3 while others do.

MariaDB [temps]> SELECT meanA from anom3Y WHERE abrevA='RS';
Empty set (1 min 23.75 sec)

MariaDB [temps]> SELECT meanA FROM anom3Y WHERE abrevA="WA";
+-----------+
| meanA     |
+-----------+
| -0.419167 |
|  0.405833 |
| -0.669167 |
| -0.569167 |
[...]
|  2.904412 |
|  0.854839 |
|  1.495000 |
+-----------+
83 rows in set (1 min 24.40 sec)

As I’d previously validated the following “update Russia and Kazak” steps, I think that pretty much validates all of it (other than proving both RS and R! are used in the final update).

The Final Steps

Due to MySQL / MariaDB having an unexpected limit (not in other SQLs) where it locks a table such that you can’t use it in a WHERE clause and update it at the same time, I had to make a “VIEW” to get these bits of update code to work. This is essentially a stored “SELECT” statement that presents what looks like a table made from some other tables. (I’m using the same technique for all the updates and it is faster.)

So first the “view” then the load script. Note that this VIEW also has a ROUND in it:

chiefio@PiM3Devuan2:~/SQL/tables$ cat vanom3R
CREATE VIEW anom3R AS
SELECT ROUND(AVG(A.deg_C),2)   AS 'meanA'  ,
       MAX(A.deg_C)            AS 'maxA'   ,
       MIN(A.deg_C)            AS 'minA'   ,
       COUNT(A.deg_C)          AS 'numA'   ,
       MAX(A.deg_C)-MIN(A.deg_C) AS 'trangA' ,
       STDDEV(A.deg_C)         AS 'stdevA' ,  
       Y.year                  AS 'yearA'  ,
       Y.abrev                 AS 'abrevA'
FROM anom3 AS A
INNER JOIN yrcastats AS Y
   ON Y.year = A.year AND Y.abrev='RS'
WHERE A.abrev = 'RS' or A.abrev= 'R!'
GROUP BY Y.year
;

This is largely just a bypass of the MySQL limit so that I can use Y.abrev in the WHERE clause of the update by making it into “abrevA” in this “VIEW”. I also folded the computing of the Average, min, max etc. into the view, as, well, why not?

So this VIEW computes the statistics (AVG, MAX, MIN, COUNT, Total Range, Standard Deviation) and gives them field names like meanA, maxA, etc. It also puts the yrcastats key field ‘year’ as Y.year into ‘yearA’ and it puts the yrcastats Y.abrev value into ‘abrevA’ so I can use them in the following update SQL program.

Notice the use of matching “RS” for the yrcastats table but using either “RS” or “R!” (the European 1/2 of Russia) for the anomaly data? Nice, eh? I just hope it is working right (no, I’ve not yet done the ‘by hand run a sample all the way through the calculations and see if it matches’ QA step). Dreading it a bit, actually.

So I’m “pretty sure” this is working as expected, and it gave no error messages, but “Some QA Required” at this point. (Hey, this is only day 2 for that code, OK?) ;-)

It does this via a JOIN of the anomalies table, anom3, with the Yearly Country Anomaly Statistics table yrcastats, which at this point has had the v4 data loaded, and the v3 data loaded but with the RS “country” only having the Asia portion of Russia in it for v3.3 data. This fixes that.

Here’s the UPDATE SQL. Yes, it is ugly. That seems to be a limitation of UPDATE. So for each field we must go through this shadow dance of SETting the field to a value from an embedded SELECT statement. At least via using the VIEW we can just grab the finished computed value and move on.

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat LRS
UPDATE  yrcastats AS Y
SET 
mean3 = (SELECT ROUND(A.meanA,2)
FROM anom3R AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

big3 = (SELECT A.maxA
FROM anom3R AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

small3 = (SELECT A.minA
FROM anom3R AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

num3 = (SELECT A.numA
FROM anom3R AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

trang3 = (SELECT A.trangA
FROM anom3R AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

stdev3 = (SELECT A.stdevA
FROM anom3R AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
)
WHERE abrev='RS'
;
show warnings;

Then the same thing, but this one limiting updates to KZ Kazakhstan and using KZ and K! anomaly data. Again note the ROUND in the view makes the ROUND in the load program likely redundant.

chiefio@PiM3Devuan2:~/SQL/tables$ cat vanom3K
CREATE VIEW anom3K AS
SELECT ROUND(AVG(A.deg_C),2)   AS 'meanA'  ,
       MAX(A.deg_C)            AS 'maxA'   ,
       MIN(A.deg_C)            AS 'minA'   ,
       COUNT(A.deg_C)          AS 'numA'   ,
       MAX(A.deg_C)-MIN(A.deg_C) AS 'trangA' ,
       STDDEV(A.deg_C)         AS 'stdevA' ,  
       Y.year                  AS 'yearA'  ,
       Y.abrev                 AS 'abrevA'
FROM anom3 AS A
INNER JOIN yrcastats AS Y
  ON Y.year = A.year AND Y.abrev='KZ'
WHERE A.abrev = 'KZ' or A.abrev= 'K!'
GROUP BY Y.year
;

Essentially identical but for swapping the country abbreviations.

Oh, and note I’m naming “VIEW”s with a leading v in the file name so I know it isn’t creating a real table. I likely ought to have carried that forward into the VIEW name itself, but decided to keep it the same as the table it was replacing (anom3) with just a letter to tell me which Country it was for.
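
If the naming convention ever fails you, MariaDB will also tell you directly which names are views rather than base tables:

SHOW FULL TABLES WHERE Table_type = 'VIEW';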

chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat LKZ
UPDATE  yrcastats AS Y
SET 
mean3 = (SELECT ROUND(A.meanA,2)
FROM anom3K AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

big3 = (SELECT A.maxA
FROM anom3K AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

small3 = (SELECT A.minA
FROM anom3K AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

num3 = (SELECT A.numA
FROM anom3K AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

trang3 = (SELECT A.trangA
FROM anom3K AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
),

stdev3 = (SELECT A.stdevA
FROM anom3K AS A
WHERE Y.year=A.yearA AND Y.abrev=A.abrevA
)
WHERE abrev='KZ'
;
show warnings;

Basically the same thing as for RS but with “anom3K” as the table/ VIEW name.

In Conclusion

So there you have it. All the stuff needed to build the entire database with the exception of the unchanged FORTRAN. I’d put the FORTRAN in here too but then it would be way too long, and it is already posted. As time permits I’ll add some links back to where it is posted already.

Could this be done better? Certainly. I’m using the database like a series of procedural steps with intermediary files. Not the highest and best use. OTOH, it is fairly fast to develop this way, and then one knows all the issues and pitfalls and can go back for a clean re-implementation later. This is the “Rapid Prototype” phase.

I’ve found (many times – as have several companies) that doing a Rapid Prototype / Do-over is faster, cleaner, and better than doing a whole lot of analysis up front that then has to be re-done anyway when reality breaks your battle plan.

So don’t think of this as a “Done Deal”. It is only the “make some product you can ship now while you find out how it ought to be done” rapid phase. Feel free to do it over however you like for your own needs / uses.

FWIW I ran this all the way to the Lyrcastats step (that is 4 hours on the Pi so I skipped it this time) and had no errors and no warnings. I put that run in a comment here:

https://chiefio.wordpress.com/2019/05/14/updated-ghcn-load-scripts-reduced-warnings/#comment-112221

So as of now I’m done fooling around with these stages for a while. I’m going to proceed to making the Africa and Europe graphs.

So that’s what’s on deck next. Though perhaps after a bit of “garden time”. I’m reaching the “too many squash” phase and picked 1/2 dozen “8 Ball” squash yesterday. Can’t let that thing go a couple of days or you have too much squash and green beans ;-)


7 Responses to GHCN Version Anomaly Database Documentation

  1. cdquarles says:

    I have to agree with you. These folk mostly have never done chemistry; and of those that did, it was either high school or freshman chemistry 101. God forbid if there are Chemistry Studies classes.

  2. E.M.Smith says:

    @CDQuarles:

    The more I run into folks who claim to be Scientists but can’t even see basic errors of process, the more I appreciate my High School Chemistry teacher. Mr. McGuire. Retired Lt. Colonel from the Air Force (W.W.II) and retired Chemist from U.S. Steel.

    He was a “stickler” for certain things that, it turns out, are critical. Things like NEVER EVER changing a recorded data item. IF you had an error, you put ONE line through it, noted WHY in the notes, and recorded the replacement value under it. Erasures of any sort in the lab book were an immediate F grade. The constant re-writing of past data in “climate science” shows why they get an F.

“Don’t touch the thermometers” was mentioned then, but reinforced at University. It was strongly stressed in the calorimetry unit. IIRC there was even a statement that just touching a thermometer could make an error of a few 1/10 C. IF you thought you had it placed wrong in the reaction mix, you re-ran the experiment from the start; NO changes once in process.

That the thermometers in use, in what is at its core a Global Calorimetry Experiment, are CONSTANTLY being touched, changed, re-calibrated, thrown out, homogenized, etc. etc. simply assures you can know NOTHING about the calorimetry. So it also gets an F grade.

    What the temperature record is recording, and showing, is the change in INSTRUMENTS not climate.

  3. E.M.Smith says:

    FWIW, I’ve run Russia graphs again with the combined data. It is illuminating how much the anomaly data changes just from changing the particular thermometers in the record.

    I’m going to post it all up in a new posting a bit later, after I get Kazakhstan done too.

    It really “puts the lie” to the notion that just by using anomalies you can ignore instrument changes.

  4. cdquarles says:

Indeed. IF there were “outliers” you left them in, marked them, and tried to account for them in as thorough an error analysis and propagation as you could.

  5. Pingback: GHCN v3.3 vs v4 Asia Anomaly | Musings from the Chiefio

  6. Pingback: What Difference Does Instrument Selection Make To Anomaly Processed Temperature Data? | Musings from the Chiefio

  7. Pingback: GHCN v3.3 vs v4 – Top Level Entry Point | Musings from the Chiefio
