Just a quick “Hey, I’m half done!” on loading the detailed “inventory” file into the MySQL database. This is preliminary as there are a few numeric fields I’ve just loaded as type CHARACTER for now. I’ll need to cast them into numbers later / someday. (We saw that done with the temperature data earlier inside the FORTRAN data TAB insertion program for the Station Temperature Data). I’m thinking of making Latitude and Longitude as Spacial Data Type fields, but that’s a new concept for me. They are “Float” in the original description (though look like text in the file…) But for now I just wanted to get it in and kick it around.
So here’s the place where I picked up the original description for the V3 inventory file, the README from the data dump:
chiefio@odroidxu4:~/SQL/v3/ghcnm.v184.108.40.20650907$ vi ../README
2.1 METADATA The metadata has been carried over from GHCN-Monthly v2. This would include basic geographical station information such as latitude, longitude, elevation, station name, etc., and also extended metadata information, such as surrounding vegetation, etc. 2.1.1 METADATA FORMAT Variable Columns Type -------- ------- ---- ID 1-11 Integer LATITUDE 13-20 Real LONGITUDE 22-30 Real STNELEV 32-37 Real NAME 39-68 Character GRELEV 70-73 Integer POPCLS 74-74 Character POPSIZ 75-79 Integer TOPO 80-81 Character STVEG 82-83 Character STLOC 84-85 Character OCNDIS 86-87 Integer AIRSTN 88-88 Character TOWNDIS 89-90 Integer GRVEG 91-106 Character POPCSS 107-107 Character Variable Definitions: ID: 11 digit identifier, digits 1-3=Country Code, digits 4-8 represent the WMO id if the station is a WMO station. It is a WMO station if digits 9-11="000". LATITUDE: latitude of station in decimal degrees LONGITUDE: longitude of station in decimal degrees STELEV: is the station elevation in meters. -999.0 = missing. NAME: station name GRELEV: station elevation in meters estimated from gridded digital terrain data POPCLS: population class (U=Urban (>50,000 persons); (S=Suburban (>=10,000 and <= 50,000 persons); (R=Rural ( 100,000 persons, population data were provided by the United Nations Demographic Yearbook. For smaller cities and towns several atlases were uses to determine population. POPSIZ: the population of the city or town the station is location in (expressed in thousands of persons). TOPO: type of topography in the environment surrounding the station, (Flat-FL,Hilly-HI,Mountain Top-MT,Mountainous Valley-MV). STVEG: type of vegetation in environment of station if station is Rural and when it is indicated on the Operational Navigation Chart (Desert-DE,Forested-FO,Ice-IC,Marsh-MA). STLOC: indicates whether station is near lake or ocean (<= 30 km of ocean-CO, adjacent to a lake at least 25 square km-LA). OCNDIS: distance to nearest ocean/lake from station (km). AIRSTN: airport station indicator (A=station at an airport). TOWNDIS: distance from airport to center of associated city or town (km). GRVEG: vegetation type at nearest 0.5 deg x 0.5 deg gridded data point of vegetation dataset (44 total classifications). [ 44 types skipped -EMS] POPCSS: population class as determined by Satellite night lights (C=Urban, B=Suburban, A=Rural)
I tossed together a schema file with roughly the same nature, but added some fields for things like ascension and type:
root@odroidxu4:/SG2/ext/chiefio/SQL/Table.Schemas# cat inventory.sql CREATE TABLE inventory ( version CHAR(5) NOT NULL, ascension CHAR(10), type CHAR(4), region CHAR(1) NOT NULL, country CHAR(3) NOT NULL, wmo CHAR(5) NOT NULL, wno_near CHAR(3) NOT NULL, latitude CHAR(8), longitude CHAR(9), stn_elev CHAR(6), name CHAR(30) NOT NULL, grid_elev CHAR(4), pop_class CHAR(1), pop_size CHAR(5), topo_type CHAR(2), veg CHAR(2), prox_water CHAR(2), dist_water CHAR(2), airport CHAR(1), dist_A_to_U CHAR(2), veg_grid CHAR(16), pop_class_nitelites CHAR(1) ) ;
I tried to make a few of the names more understandable, and also used slightly different names in the FORTRAN that inserts TABs into the data so it will load. Yeah, now I’ve got 3 names for the same thing for some fields, bad Mikey… Cleanup on Isle Later… I’ve not put keys / index fields in place yet. I have a good idea what I want to do, but for this new table that’s still for later.
Here’s the FORTRAN that reads a line of their text file, adds tabs, and spits it back out. Note that I’ve left in a testing WRITE statement that just echos the line without tabs. Useful when doing the first debugging and you have off by one format problems… but it is commented out with a leading “C” in the first column:
root@odroidxu4:/SG2/ext/chiefio/SQL/v3# cat inven.f C FORTRAN to read the inventory files v3 GHCN file and insert Tabs C in the output. Also divides "country" into Continent and Country C C Variable declarations... C CHARACTER * 1 TAB CHARACTER * 5 VERSION CHARACTER * 10 ASCEN CHARACTER * 4 TYPE, GRIDELEV CHARACTER * 1 CONT,POPC,AIR,POPNL CHARACTER * 2 COUNTRY,TOPOT,VEG,PROXW,DISTW,DISTAU CHARACTER * 5 WMO,PSIZE CHARACTER * 3 NEAR CHARACTER * 6 STNELEV CHARACTER * 8 LATITUDE CHARACTER * 9 LONGITUDE CHARACTER * 16 VEGGRID CHARACTER * 30 NAME C C Set the TAB character TAB=CHAR(09) C C Set some constants VERSION="Gv3" ASCEN="7Sept2015" TYPE="QCU" C C Read in one line of data... C 9 OPEN(1, FILE='inventory.in', STATUS='OLD', ACTION='READ') 10 READ (1, 11, END=99) CONT, COUNTRY, WMO, NEAR,LATITUDE, & &LONGITUDE,STNELEV,NAME,GRIDELEV,POPC,PSIZE,TOPOT,VEG,PROXW, & &DISTW,AIR,DISTAU,VEGGRID,POPNL C 11 FORMAT (A1,A2,A5,A3,X,A8,X,A9,X,A6,X,A30,X,A4,A1,A5,A2,A2,A2, & &A2,A1,A2,A16,A1) C C Convert CHAR to Float C READ (T(I),*,END=20) F(I) C C Write out one line of data with TAB between fields C C WRITE (6, 6) VERSION,ASCEN,TYPE,CONT,COUNTRY,WMO,NEAR,LATITUDE, & C &LONGITUDE,STNELEV,NAME,GRIDELEV,POPC,PSIZE,TOPOT,VEG,PROXW,DISTW, & C &AIR,DISTAU,VEGGRID,POPNL C 6 FORMAT (A5,A10,A4,A1,A2,A5,A3,A8,A9,A6,A30,A4,A1,A5,A2,A2,A2, & C &A2,A1,A2,A16,A1) WRITE (6, 7) VERSION,TAB,ASCEN,TAB,TYPE,TAB,CONT,TAB,CONT,COUNTRY,& &TAB,WMO,TAB,NEAR,TAB,LATITUDE,TAB,LONGITUDE,TAB,STNELEV,TAB,NAME, & &TAB,GRIDELEV,TAB,POPC,TAB,PSIZE,TAB,TOPOT,TAB,VEG,TAB,PROXW,TAB, & &DISTW,TAB,AIR,TAB,DISTAU,TAB,VEGGRID,TAB,POPNL 7 FORMAT (A5,A1,A10,A1,A4,A1,A1,A1,A1,A2,A1,A5,A1,A3,A1,A8,A1,A9,A1,& &A6,A1,A30,A1,A4,A1,A1,A1,A5,A1,A2,A1,A2,A1,A2,A1, & &A2,A1,A1,A1,A2,A1,A16,A1,A1) C Retrieve another line of data... C GO TO 10 C C If end of file, then stop. C 99 STOP END
Yeah, almost exactly like the others just with different READ / WRITE lists and FORMAT statements to match… and were made by taking a former one and changing the READ / WRITE lists and FORMAT stations to match this data file…
Here’s what it looks like after a “SOURCE inventory.sql” command:
mysql> describe inventory; +---------------------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------------------+----------+------+-----+---------+-------+ | version | char(5) | NO | | NULL | | | ascension | char(10) | YES | | NULL | | | type | char(4) | YES | | NULL | | | region | char(1) | NO | | NULL | | | country | char(3) | NO | | NULL | | | wmo | char(5) | NO | | NULL | | | wno_near | char(3) | NO | | NULL | | | latitude | char(8) | YES | | NULL | | | longitude | char(9) | YES | | NULL | | | stn_elev | char(6) | YES | | NULL | | | name | char(30) | NO | | NULL | | | grid_elev | char(4) | YES | | NULL | | | pop_class | char(1) | YES | | NULL | | | pop_size | char(5) | YES | | NULL | | | topo_type | char(2) | YES | | NULL | | | veg | char(2) | YES | | NULL | | | prox_water | char(2) | YES | | NULL | | | dist_water | char(2) | YES | | NULL | | | airport | char(1) | YES | | NULL | | | dist_A_to_U | char(2) | YES | | NULL | | | veg_grid | char(16) | YES | | NULL | | | pop_class_nitelites | char(1) | YES | | NULL | | +---------------------+----------+------+-----+---------+-------+ 22 rows in set (0.01 sec) mysql>
Has 7280 lines in it:
mysql> SELECT COUNT(wmo) FROM inventory; +------------+ | COUNT(wmo) | +------------+ | 7280 | +------------+ 1 row in set (0.05 sec)
Of which 2379 currently have AIRPORT designation:
mysql> SELECT COUNT(wmo) FROM inventory WHERE airport='A'; +------------+ | COUNT(wmo) | +------------+ | 2379 | +------------+ 1 row in set (0.02 sec)
As you can see, even with this preliminary format there’s things you can report from the data ;-)
I do want to get the numeric data types cast into their proper forms, then put on a key / index set. Also I just did a “diff” in Linux Land on the QCA and QCU file versions and they were the same, so I’m going to do that same test on the rest of the V3 inventory copies. IFF all of them match and the V2 match (as they claim) then the utility of loading them is zero. I know the v1 inventory file is different, so it will need some different fields. It might make sense to just toss it into a dedicated table (and lose the ascension and type fields from this table) and then do the compare and contrast between the two tables. Like I said, this is preliminary…
Well, I think that covers it for tonight. A basic “load table” done.
Oh, FWIW, to compile the FORTRAN on Linux, of the Debian / Devuan sort, you just install the FORTRAN compiler with “apt-get install gfortran” and then do a “gfortran inven.f” to make the a.out executable file. Pretty darned quick and easy really. (Of course you can rename a.out to anything, so like “inventory_tab_monster” or whatever…)
At this point I’ve got a workable if crude layout for all the basic data tables and a demonstration load from one instance of the V3 data. Next steps are just a LOT of polish to make the database layout more efficient / faster / cleaner; and then load more data instances from V2 and V3, and then take on that integration load of the significantly different v1. (Then think about USHCN & Hadley, maybe…)
Along the way working up some interesting “compare and contrast reports” and such. Also I need to lean how to make MySQL calls from inside programs (like C, Python, whatever…) so I can use those tools on this data if needed.
The point being that there’s a long tune, test, extend, and use path stretching to the horizon now. The really nice thing is that now all sorts of “ad hoc queries” become very easy to do! So now I need to think about what I want to kick around ;-) Things like “how many thermometers per country?” and “What is the highest elevation thermometer in each country” become relatively easy to answer. Any ideas welcome ;-)