GHCN V3 Temperature Data Loaded – MySQL

This is definitely not the final database table structure. This is just “working out the kinks” and finding out what the right one needs to be. What we called “Rapid Prototyping” at work. Essentially you build it really really fast, then find out what’s wrong, then rebuild it right. Turns out that is a faster way to a better product than months (or years…) of “Analysis Paralysis”.

So I’ve managed to load my first set of full-on temperature data into a MySQL table. This is the V3 data of a non-adjusted sort.

I still have not decided on index / sort keys, so all this is going to be slower than you might expect. First one needs to un-tar the archive tar file. At first I forgot to do this step (due to a naming bumble on my part when gunzipping the file). Rapidly discovered a couple of binary records that caused FORTRAN to barf on the “convert CHAR to FLOAT” for temperature data. Looked at the file and had that DOH! moment… of course it is a tar file…
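The gunzip-then-tar two-step that tripped me up can be shown with a throwaway archive (the real GHCN file names differ, so these are stand-ins):

```shell
# Build a throwaway archive to demonstrate with (not the real GHCN file):
mkdir -p demo && echo "1017160355000 1878TAVG  890" > demo/ghcnm.dat
tar -cf ghcnm.tar -C demo ghcnm.dat && gzip -f ghcnm.tar

# The gotcha: gunzip alone only gets you back to a tar blob,
# which is still binary as far as a FORTRAN READ is concerned.
gunzip -f ghcnm.tar.gz   # leaves ghcnm.tar, NOT text yet
tar -xf ghcnm.tar        # now ghcnm.dat is a readable flat file
```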

One un-tar later I also found where the detail inventory file was hiding… end of the same tar blob. So I can now finish my station data / inventory table bit too. But first I wanted to just get some data load happening so I could see if this would be a horrible dog needing tuning, or was OK enough to play with.

In the file, a given instrument for a given year has a block of 12 months of data with flags for quality issues. My initial thought was 12 fields with names like Jan Feb Mar and Jan_flag1 Jan_flag2… Then thought better of it. Realistically this is space by time structured data. So I’ve made the structure one that uses space (WMO number / location stuff) and then has YEAR and MONTH fields, then just a temperature and the infoflags. This really “blows up” the input file in the conversion, but some of that can be taken back out in the database load. I need to learn how best to tell MySQL that a given field or set of fields are the index fields so they are not repeated in every record… or maybe it already knows that… for now I think I’m just loading it as a giant flat file structure and far from ideal, as it doesn’t know what’s a key yet.
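The wide-to-long reshape described above can be sketched in a few lines (a hypothetical helper, not the actual conversion code, which is the FORTRAN below):

```python
def melt_record(station_id, year, monthly):
    """Turn one station-year record with 12 monthly values into
    12 (station, year, month, value) rows -- the 'space by time' layout."""
    months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUNE",
              "JULY", "AUG", "SEPT", "OCT", "NOV", "DEC"]
    return [(station_id, year, m, v) for m, v in zip(months, monthly)]

rows = melt_record("60355", "1878", [8.9, 9.5, 11.1, 16.1, 19.8, 22.4,
                                     24.9, 26.8, 23.2, 26.8, 13.7, 11.5])
# One row per month; station and year repeat in every row, which is
# exactly the file-size "blow up" noted above.
```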

With that preamble, here’s the table description:

chiefio@odroidxu4:~/SQL/Table.Schemas$ cat temps_data.sql 
CREATE TABLE temps_data (
    version CHAR(5) NOT NULL,
    ascension CHAR(10),
    region CHAR(1) NOT NULL,
    country CHAR(3) NOT NULL,
    wmo CHAR(5) NOT NULL,
    near_wmo CHAR(3),
    year CHAR(4),
    month CHAR(4),
    type CHAR(4) NOT NULL,
    deg_c DECIMAL(5.3),
    missing  CHAR(1),
    qc  CHAR(1),
    source  CHAR(1)
    ) ;

I’m using “near_wmo” for the 3 digits used to flag instruments near a WMO site, but not actually the WMO site. “type” is picked up from the tape of data and is TAVG, TMIN, or TMAX. For now I’ve just loaded TAVG. I convert their integer text into decimal degrees. They give it as 1/100 C precision (which is ludicrous) but I’ve carried it forward that way. The DECIMAL data type holds the digits constant; FLOAT just makes it a binary float, so I avoided that. Then the 3 flags for degree of missing data, the QC level / issues, and the source.
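The hundredths-of-a-degree conversion is simple enough to sketch (a hypothetical helper mirroring what the FORTRAN below does; GHCN uses -9999 as its missing-data flag):

```python
from decimal import Decimal

def to_deg_c(field):
    """Convert a character integer field (hundredths of deg C) to
    Decimal degrees. Decimal keeps the reported digits exact, unlike
    a binary FLOAT."""
    return Decimal(field.strip()) / Decimal(100)

print(to_deg_c(" 1490"))   # 14.9 C
print(to_deg_c("-9999"))   # -99.99, the missing-data sentinel after /100
```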

Realize this is very likely to change as I look at differences in V2 and V1 data structure from my archive set.

It will certainly change as I start exploring what ought to be index fields / key fields.

In the process of molesting the compact data file into a tab-separated set of stuff that MySQL will swallow, it grows. Here are the two sizes:

root@odroidxu4:/SG2/ext/chiefio/SQL/v3# ls -l temps.*
-rw-r--r-- 1 chiefio chiefio  53690600 Jan 20 20:39 temps.in
-rw-r--r-- 1 chiefio chiefio 344360400 Jan 20 20:40 temps.u.in

So from about 53 MB to 344 MB. Since I repeat all the station data elements by year for each month for each tabbed set of values, no real surprise. These are temp files anyway and disposed of after the data load. The following FORTRAN does the conversion. It’s a bit of rough code… I was having issues with the data format and ended up with a couple of “diagnostic write” episodes where I cut things back and changed them… Eventually found I’d left out a trailing “,” after an A1 at the EOL continuation character in a FORMAT statement… Putting that comma in, everything got back in sync nicely. Then I decided to change from printing the whole array with an index loop to doing one month with a name… And just stuck in 12 brute force WRITE statements. This can be made far more elegant and compact. But I’m done for the day… It works, I’ll come back to it “some other day” to make it look nice, doing things like using loops instead of specifying the array indexes as literal integers.

The only really interesting thing about it is the conversion of the CHAR temperature read data into FLOATs via an internal read loop (see the DO 20).

The “6 FORMAT” was when I cut things back to just the data items, no tabs, and was seeing normal stuff (so knew the READ and variable stuffing was OK) while the “7 FORMAT” was when I was printing out the arrays as looping arrays and finding it very ugly with oodles of tabs in it and pondering that ugliness in a DB table… Left in at this point as this is very beta and I might want to look at that stuff again in a few days. Commented out, though, with a C in the first column.

I also need to do my usual “Data Dictionary” comments describing the variables. For now, just know that T(1:12) is the character temperature data as read in, and F(1:12) is the FLOAT temperature data after conversion. The other things are named fairly clearly. Oh, and M, Q, and S are the flags for Missing data, QC level, and Source. I went with single character names to make all those long READ and WRITE statements shorter ;-)

root@odroidxu4:/SG2/ext/chiefio/SQL/v3# cat stationdat.f 
C FORTRAN to read the station data files v3 GHCN file and insert Tabs
C in the output.  Also divides "country" into Continent and Country
C
C Variable declarations...
C
      CHARACTER * 1 TAB
      CHARACTER * 1 CONT
      CHARACTER * 2 COUNTRY
      CHARACTER * 5 WMO
      CHARACTER * 3 MOD
      CHARACTER * 4 YEAR
      CHARACTER * 4 TYPE
      CHARACTER * 5 VERSION
      CHARACTER * 10 ASCEN
      CHARACTER * 1 M(1:12), Q(1:12), S(1:12)
      CHARACTER * 5 T(1:12)

      REAL F(1:12)
C
C Set the TAB character
      TAB=CHAR(09)
C
C Set some constants
      VERSION="Gv3"
      ASCEN="7Sept2015"
C
C Read in one line of data...
C
    9 OPEN(1, FILE='temps.in', STATUS='OLD', ACTION='READ')
   10 READ (1, 11, END=99) CONT, COUNTRY, WMO, MOD, YEAR, TYPE,         &
     &T(1),M(1),Q(1),S(1),T(2),M(2),Q(2),S(2),T(3),M(3),Q(3),S(3),      &
     &T(4),M(4),Q(4),S(4),T(5),M(5),Q(5),S(5),T(6),M(6),Q(6),S(6),      &
     &T(7),M(7),Q(7),S(7),T(8),M(8),Q(8),S(8),T(9),M(9),Q(9),S(9),      &
     &T(10),M(10),Q(10),S(10),T(11),M(11),Q(11),S(11),T(12),M(12),Q(12),&
     &S(12)
C
   11 FORMAT (A1,A2,A5,A3,A4,A4,12(A5,A1,A1,A1))
C
C Convert CHAR temp to Float Temp and adjust decimal point
      DO 20, I=1,12
         READ (T(I),*,END=20) F(I)
         F(I)=F(I)/100.0
   20 CONTINUE
C  The Do loop from above
C
C Write out one line of data with TAB between fields
C
C      WRITE (6, 6) VERSION,ASCEN,CONT,COUNTRY,WMO,MOD,YEAR,TYPE,        &
C     &(F(I), I=1,12), (M(I), I=1,12), (Q(I), I=1,12),(S(I),I=1,12) 

C      WRITE (6, 7) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
C     &WMO,TAB, MOD,TAB, YEAR,TAB, TYPE,TAB, (F(I), I=1,12),TAB,         &
C     &(M(I), I=1,12),TAB, (Q(I), I=1,12),TAB, (S(I),I=1,12),TAB 

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "JAN",TAB, TYPE,TAB, F(1),TAB,        &
     &M(1),TAB, Q(1),TAB, S(1)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "FEB",TAB, TYPE,TAB, F(2),TAB,        &
     &M(2),TAB, Q(2),TAB, S(2)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "MAR",TAB, TYPE,TAB, F(3),TAB,        &
     &M(3),TAB, Q(3),TAB, S(3)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "APR",TAB, TYPE,TAB, F(4),TAB,        &
     &M(4),TAB, Q(4),TAB, S(4)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "MAY",TAB, TYPE,TAB, F(5),TAB,        &
     &M(5),TAB, Q(5),TAB, S(5)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "JUNE",TAB, TYPE,TAB, F(6),TAB,        &
     &M(6),TAB, Q(6),TAB, S(6)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "JULY",TAB, TYPE,TAB, F(7),TAB,        &
     &M(7),TAB, Q(7),TAB, S(7)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "AUG",TAB, TYPE,TAB, F(8),TAB,        &
     &M(8),TAB, Q(8),TAB, S(8)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "SEPT",TAB, TYPE,TAB, F(9),TAB,        &
     &M(9),TAB, Q(9),TAB, S(9)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "OCT",TAB, TYPE,TAB, F(10),TAB,       &
     &M(10),TAB, Q(10),TAB, S(10)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "NOV",TAB, TYPE,TAB, F(11),TAB,       &
     &M(11),TAB, Q(11),TAB, S(11)

      WRITE (6, 8) VERSION,TAB, ASCEN,TAB, CONT,TAB, CONT,COUNTRY,TAB,  &
     &WMO,TAB, MOD,TAB, YEAR,TAB, "DEC",TAB, TYPE,TAB, F(12),TAB,       &
     &M(12),TAB, Q(12),TAB, S(12)

C    6 FORMAT (A5,A10,A1,A2,A5,A3,A4,A4,12(F7.1),12(A1),12(A1),12(A1))

C    7 FORMAT (A5,A1, A10,A1, A1,A1, A1,A2,A1, A5,A1, A3,A1, A4,A1, A4,  &
C     &A1,12(F7.1),A1,12(A1),A1,12(A1),A1,12(A1),A1)

    8 FORMAT (A5,A1, A10,A1, A1,A1, A1,A2,A1, A5,A1, A3,A1, A4,A1, A4,  &
     &A1,A4,A1,F7.2,A1,A1,A1,A1,A1,A1)

C Retrieve another line of data...
C
      GO TO 10
C
C If end of file, then stop...
C
   99 STOP
      END

When run, it took a while to complete. Had the Odroid XU4 running about 20% CPU for about a minute. It was disk access limited the whole time (which could be improved with the input and output files on different disk spindles so it isn’t forced to seek back and forth between them). The database load also took a while. A FORTRAN compile by default puts the executable in a file named a.out, and I did not bother renaming it for the testing cycle. Knowing I’d copied the “unadjusted” data into the input file of temps.in, I named the output with a “u” in it as temps.u.in:

chiefio@odroidxu4:~/SQL/v3$ time ./a.out > temps.u.in

real	0m55.210s
user	0m52.132s
sys	0m2.921s

So it took about a minute (55 seconds) elapsed time to convert the tar file data into the format for DB loading. I note in passing that this input file chunk shows the temperatures as correctly converted to xx.yy floating point and with -99.99 as the missing data flag value. It is during the loading of this into the table that those get changed.

mysql> source temps_data.sql
Query OK, 0 rows affected (0.12 sec)

chiefio@odroidxu4:~/SQL/v3$ tail temps.u.in 
Gv3  	7Sept2015 	8	800	99914	001	1971	 MAR	TAVG	  14.90	 	 	G
Gv3  	7Sept2015 	8	800	99914	001	1971	 APR	TAVG	  16.50	 	 	G
Gv3  	7Sept2015 	8	800	99914	001	1971	 MAY	TAVG	 -99.99	 	 	 
Gv3  	7Sept2015 	8	800	99914	001	1971	JUNE	TAVG	  20.60	 	 	G
Gv3  	7Sept2015 	8	800	99914	001	1971	JULY	TAVG	  23.20	 	 	G
Gv3  	7Sept2015 	8	800	99914	001	1971	 AUG	TAVG	  23.20	 	 	G
Gv3  	7Sept2015 	8	800	99914	001	1971	SEPT	TAVG	  24.50	 	 	G
Gv3  	7Sept2015 	8	800	99914	001	1971	 OCT	TAVG	  22.10	 	 	G
Gv3  	7Sept2015 	8	800	99914	001	1971	 NOV	TAVG	  20.00	 	 	G
Gv3  	7Sept2015 	8	800	99914	001	1971	 DEC	TAVG	  17.00	 	 	G

Here’s the table description:

mysql> show tables
    -> ;
+-----------------+
| Tables_in_temps |
+-----------------+
| continent       |
| country         |
| inventory       |
| temps_data      |
+-----------------+
4 rows in set (0.00 sec)

mysql> describe temps_data;

+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| version   | char(5)      | NO   |     | NULL    |       |
| ascension | char(10)     | YES  |     | NULL    |       |
| region    | char(1)      | NO   |     | NULL    |       |
| country   | char(3)      | NO   |     | NULL    |       |
| wmo       | char(5)      | NO   |     | NULL    |       |
| near_wmo  | char(3)      | YES  |     | NULL    |       |
| year      | char(4)      | YES  |     | NULL    |       |
| month     | char(4)      | YES  |     | NULL    |       |
| type      | char(4)      | NO   |     | NULL    |       |
| deg_c     | decimal(5,0) | YES  |     | NULL    |       |
| missing   | char(1)      | YES  |     | NULL    |       |
| qc        | char(1)      | YES  |     | NULL    |       |
| source    | char(1)      | YES  |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
13 rows in set (0.01 sec)

mysql> LOAD DATA LOCAL INFILE '../v3/temps.u.in' INTO TABLE temps_data ;
Query OK, 5554200 rows affected, 65535 warnings (3 min 30.65 sec)
Records: 5554200  Deleted: 0  Skipped: 0  Warnings: 5182646

So 3 minutes to load the database. It being on a different partition of the same disk as my scratch space it would be taking long seeks… Then there’s just the point that it’s got 5.5 Mega-Rows of data… The 65 k warnings has me wondering (and it looks like a capped count, since the Records line says 5,182,646 of them), but I’ll get to that tomorrow too. Likely something like the -9999 missing data flags getting turned into -99.99 and that causing some heartburn… or maybe some other field “off by one”. By now you ought to know the drill on first data load and stamping out warnings ;-)

Don’t know why the DESCRIBE shows (5,0) when the schema has 5.3 for the DECIMAL value. A minor “Dig Here” bug / issue / documentation item. ;-) A report shows that the field displays without a decimal portion, so maybe I need to change it to some other data size / type, or just find out how to do it right ;-)

mysql> SELECT * FROM temps_data WHERE wmo='60355';
+---------+-----------+--------+---------+-------+----------+------+-------+------+-------+---------+------+--------+
| version | ascension | region | country | wmo   | near_wmo | year | month | type | deg_c | missing | qc   | source |
+---------+-----------+--------+---------+-------+----------+------+-------+------+-------+---------+------+--------+
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  JAN  | TAVG |     9 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  FEB  | TAVG |    10 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  MAR  | TAVG |    11 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  APR  | TAVG |    16 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  MAY  | TAVG |    20 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 | JUNE  | TAVG |    22 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 | JULY  | TAVG |    25 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  AUG  | TAVG |    27 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 | SEPT  | TAVG |    23 |         |      | 1      |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  OCT  | TAVG |    27 |         | S    | 1      |
...
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 | SEPT  | TAVG |  -100 |         |      |        |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  OCT  | TAVG |  -100 |         |      |        |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  NOV  | TAVG |  -100 |         |      |        |
| Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  DEC  | TAVG |  -100 |         |      |        |
+---------+-----------+--------+---------+-------+----------+------+-------+------+-------+---------+------+--------+
768 rows in set (14.43 sec)
mysql> 

14 seconds ain’t bad for a report searching the whole table. On a re-run it claims 0.0 seconds so I think it’s holding the data in memory after a query is run. As there are far more than 65,000 temperature items, and they are all truncated with 0 decimal fractions, that means there is likely some other issue that has caused the warnings. All the other data items look about right, though, so not sure what it would be at this point.

There’s a lot of temps being reported as -100 C, but no -999 or -99 empty data flags reported. I’m suspecting that’s showing me where I’ve got the issue…

...
|  -100 |
|  -100 |
|  -100 |
|  -100 |
|  -100 |
+-------+
256704 rows in set (11.47 sec)

mysql> SELECT deg_c FROM temps_data WHERE deg_c<'-100';
Empty set (11.40 sec)

So first I’m going to tinker around with the schema and see if I can get it to proclaim the 5.3 I thought I’d specified. Then I’ll try again. At this point I’m thinking maybe those -100 are really the -99.99 “missing data flags” and that the reference I read which claimed a 5.3 would cover from -999.99 to 999.99 was just wrong. But we’ll see.
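For what it’s worth, MySQL’s DECIMAL(M,D) gives M total digits with D of them after the decimal point, so the integer part only gets M minus D digits. A quick sketch of what each declaration can actually hold:

```python
def decimal_max(m, d):
    """Largest magnitude a MySQL-style DECIMAL(m, d) column can hold:
    (m - d) integer digits plus d fractional digits, all nines."""
    return float("9" * (m - d) + "." + "9" * d)

print(decimal_max(5, 3))  # 99.999  -- NOT +/-999.99 as that reference claimed
print(decimal_max(5, 2))  # 999.99  -- plenty of room for a -99.99 flag value
```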

For now I’m happy that in general the data conversion and loading look fairly fast and certainly fast enough. A minute or 3 is not a big deal. Then a few teens of seconds for a report? No problem. That all the OTHER data items have loaded correctly is highly encouraging ;-) Worst case is I set deg_c to being a FLOAT and just live with the binary jitter in the precision.

In Conclusion

MySQL is turning out to be fairly easy to learn and use. Yes, I’ll have more “fiddling about” until I get it to just as I like it; but it really is fast and easy to get the data into a database. Mostly just the usual “lost a comma” and “off by one” and minor alignment / formatting disconnects that seem to show up in any data conversion operation.

So that’s it for tonight. Time for a “reward fudge brownie” and then some sleep time ;-) The cleaning up and polishing can wait for tomorrow…


This entry was posted in NCDC - GHCN Issues.

12 Responses to GHCN V3 Temperature Data Loaded – MySQL

  1. E.M.Smith says:

    Hit post and then couldn’t resist just one more thing…. And the answer is…

    It’s supposed to be a COMMA not a PERIOD… in the table description. After changing 5.3 to 5,3 we have:

    mysql> describe temps_data;
    +-----------+--------------+------+-----+---------+-------+
    | Field     | Type         | Null | Key | Default | Extra |
    +-----------+--------------+------+-----+---------+-------+
    | version   | char(5)      | NO   |     | NULL    |       |
    | ascension | char(10)     | YES  |     | NULL    |       |
    | region    | char(1)      | NO   |     | NULL    |       |
    | country   | char(3)      | NO   |     | NULL    |       |
    | wmo       | char(5)      | NO   |     | NULL    |       |
    | near_wmo  | char(3)      | YES  |     | NULL    |       |
    | year      | char(4)      | YES  |     | NULL    |       |
    | month     | char(4)      | YES  |     | NULL    |       |
    | type      | char(4)      | NO   |     | NULL    |       |
    | deg_c     | decimal(5,3) | YES  |     | NULL    |       |
    | missing   | char(1)      | YES  |     | NULL    |       |
    | qc        | char(1)      | YES  |     | NULL    |       |
    | source    | char(1)      | YES  |     | NULL    |       |
    +-----------+--------------+------+-----+---------+-------+
    13 rows in set (0.00 sec)
    
    mysql> 
    

    Showing “decimal(5,3)…

    So now I can re-run the load and such and it will likely be correct now. One Stupid Pixel… (and me not bothering with “readers”…)

  2. rms says:

I know you are into FORTRAN (my first foray into computing at uni and into the corporate world). But you may want to look at using Python (with Django to connect to the database, and/or Pandas to do the number crunching; very impressive software for this sort of thing).

  3. E.M.Smith says:

    And we’re down to just 5 warnings on the data load:

    mysql> LOAD DATA LOCAL INFILE '../v3/temps.u.in' INTO TABLE temps_data;
    Query OK, 5554200 rows affected, 5 warnings (2 min 58.92 sec)
    Records: 5554200  Deleted: 0  Skipped: 0  Warnings: 5
    
    mysql> 
    

    A quicky report looks right too:

    mysql> select * from temps_data where wmo='60355'
        -> ;
    +---------+-----------+--------+---------+-------+----------+------+-------+------+---------+---------+------+--------+
    | version | ascension | region | country | wmo   | near_wmo | year | month | type | deg_c   | missing | qc   | source |
    +---------+-----------+--------+---------+-------+----------+------+-------+------+---------+---------+------+--------+
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  JAN  | TAVG |   8.900 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  FEB  | TAVG |   9.500 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  MAR  | TAVG |  11.100 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  APR  | TAVG |  16.100 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  MAY  | TAVG |  19.800 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 | JUNE  | TAVG |  22.400 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 | JULY  | TAVG |  24.900 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  AUG  | TAVG |  26.800 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 | SEPT  | TAVG |  23.200 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  OCT  | TAVG |  26.800 |         | S    | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  NOV  | TAVG |  13.700 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  DEC  | TAVG |  11.500 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1879 |  JAN  | TAVG |  11.800 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1879 |  FEB  | TAVG |  11.700 |         |      | 1      |
    ...
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  MAR  | TAVG |  14.400 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  APR  | TAVG |  16.700 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  MAY  | TAVG |  20.600 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 | JUNE  | TAVG |  23.200 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 | JULY  | TAVG |  26.100 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  AUG  | TAVG | -99.990 |         |      |        |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 | SEPT  | TAVG | -99.990 |         |      |        |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  OCT  | TAVG | -99.990 |         |      |        |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  NOV  | TAVG | -99.990 |         |      |        |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  DEC  | TAVG | -99.990 |         |      |        |
    +---------+-----------+--------+---------+-------+----------+------+-------+------+---------+---------+------+--------+
    768 rows in set (13.71 sec)
    
    mysql> 
    

Though that missing data flag looks like I need to make it (5,2) instead… that third decimal fraction digit is bogus…

    But at least now I know what’s going on.

  4. rms says:

    Forgot to mention that these Python tools work well with MySQL database.

  5. E.M.Smith says:

    Yup, that got it!

    mysql> SELECT * FROM temps_data WHERE wmo="60355";
    +---------+-----------+--------+---------+-------+----------+------+-------+------+--------+---------+------+--------+
    | version | ascension | region | country | wmo   | near_wmo | year | month | type | deg_c  | missing | qc   | source |
    +---------+-----------+--------+---------+-------+----------+------+-------+------+--------+---------+------+--------+
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  JAN  | TAVG |   8.90 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  FEB  | TAVG |   9.50 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  MAR  | TAVG |  11.10 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  APR  | TAVG |  16.10 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 |  MAY  | TAVG |  19.80 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 | JUNE  | TAVG |  22.40 |         |      | 1      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 1878 | JULY  | TAVG |  24.90 |         |      | 1      |
    ...
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  FEB  | TAVG |  11.00 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  MAR  | TAVG |  14.40 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  APR  | TAVG |  16.70 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  MAY  | TAVG |  20.60 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 | JUNE  | TAVG |  23.20 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 | JULY  | TAVG |  26.10 |         |      | K      |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  AUG  | TAVG | -99.99 |         |      |        |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 | SEPT  | TAVG | -99.99 |         |      |        |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  OCT  | TAVG | -99.99 |         |      |        |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  NOV  | TAVG | -99.99 |         |      |        |
    | Gv3     | 7Sept2015 | 1      | 101     | 60355 | 000      | 2015 |  DEC  | TAVG | -99.99 |         |      |        |
    +---------+-----------+--------+---------+-------+----------+------+-------+------+--------+---------+------+--------+
    768 rows in set (12.81 sec)
    
    mysql> 
    

    Final, correct, schema file:

    root@odroidxu4:/SG2/ext/chiefio/SQL/Table.Schemas# cat temps_data.sql 
    CREATE TABLE temps_data (
        version CHAR(5) NOT NULL,
        ascension CHAR(10),
        region CHAR(1) NOT NULL,
        country CHAR(3) NOT NULL,
        wmo CHAR(5) NOT NULL,
        near_wmo CHAR(3),
        year CHAR(4),
        month CHAR(4),
        type CHAR(4) NOT NULL,
        deg_c DECIMAL(5,2),
        missing  CHAR(1),
        qc  CHAR(1),
        source  CHAR(1)
        ) ;
    

    OK, so now I can play around with some reporting things, and move on to pondering that whole question of index fields…

  6. E.M.Smith says:

    @rms:

I’m only really “into FORTRAN” for this as they used FORTRAN to write the data files in the first place so it’s all very FORTRAN style (fixed format, etc.). They also provide FORTRAN exemplar programs to read the data files, so the hard bit is done.

    Per Python in general:

I’ve looked at it. It looks like a reasonable language in general. I do have one part of it that causes me annoyance. That whole “position matters” thing. It really isn’t THAT hard to deal with BEGIN / END pairs or {} pairs… Then it is explicit. Cut / Paste some Python code into a posting and some “helpful” software changes your white space alignment AND your program… I’m not keen on that.

    I had that happen when I posted the bits of Python in GIStemp (when I was porting it…) and had to go back and “fix it up” manually.

    Left a bit of a bad taste in my mouth about position dependent code…

    I could likely get over it if I had enough reason, but for a simple “read a line, add tabs, write a line” I’m not feeling the need to learn a whole ‘nother language. (Though I did manage to learn enough of it to get the bit in GIStemp ported and running ;-)

    I’ve actually done more in C than FORTRAN… but were I forced to pick a language that is most aligned with my preferences, it would be ALGOL. Yeah, way “old school” block structured language. Big ol’ BEGIN and END markers… But it begat so many others, like Pascal, C, ADA, etc. etc. (just about everything with a BEGIN END pair style) that I have a soft spot for it ;-) It also got me hooked on the idea that the physical layout could be anything as long as the blocks were written correctly, so one of the fun things to do was try putting a whole program on one line ;-)

    PL/1 lets you write something very much like ALGOL in it. (Or like FORTRAN or like COBOL or like…;-) If you look at a program someone wrote in PL/1, you can see what language they learned first in the style. A massive language…

    Pascal was like ALGOL with a straight jacket included…

    C was an ALGOL derivative with hand grenades and dead falls included ;-)

    Maybe it warped me for life… It was my second programming language…

    Which reminds me… I have an Algol 68 or Algol W compiler tarball laying around somewhere for the Pi that I’ve never installed…

    FWIW, I’m not resistant to new languages. It’s just that I’ve already had to work in so many languages that I find it hard to get excited about Yet Another Language. The whole “Oooohhhh, look at the shiny thing!! Just invest a few months of your life learning Yet Another Language. This time for SURE!!!” has a modest cringe reaction. Too many times shoving my brains through that particular sieve I guess ;-)

    At this point, I doubt I could even manage to list all the languages I’ve used at one time or another… (though Python is one of them, in a limited “port it and get it to run” way; so not ‘write from scratch’).

    I was actually keen on Perl, but it kept changing… Then wasn’t ‘trendy” any more and others claimed the stage. After a while they start to blur into a kind of generic pseudo-code.

    One other one I was keen on was FORTH. I really like threaded interpreted languages. (Shell scripting is one, BTW). I tend to build a dictionary of my own “words” in shell commands on any given system. You can do nearly anything with shell and C… But I don’t do much down at the bit twiddle level any more, so FORTH has been a couple of decades fading… Like Reverse Polish Notation for words ;-)

    I wrote an APL program once. The next day I couldn’t read it… No lie. “Write only programming” with bizzaro symbols.

    So yeah, I’ve had the Language Hots before… But I think I’ve gotten over it now ;-)

  7. rms says:

    Frankly, it’s Pandas that I’m suggesting you take a look at if you want to do data analysis (which is what I *think* you are heading into). Yes, uses Python. Very powerful. Worth the investment in time, IMHO. Stood the test of time with me. http://pandas.pydata.org

  8. E.M.Smith says:

    Looked over the web site and the wiki. Does serve the problem domains that interest me. I’d kinda settled on R as the one to do, but maybe this needs a head to head comparison…

    FWIW, I’m not planning a whole lot of heavy duty manipulation on the temperature data. Mostly just some simple “What changed?” stuff and a few “other ways to look at it” than homogenized & blended…

    I did a bunch of that kind of thing using just FORTRAN and flat files a few years back. Was thinking I’d put it in a DB and see if I can shake it around in R (mostly for the built in graphing…).

    This present effort is mostly just playing with shoving it into a DB, seeing what looks weird, and working out a decent data structure for general purpose compares. Things like “how many data items by WMO have been adjusted warmer vs cooler in the 1st half vs 2nd half between versions?”
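    That “adjusted warmer vs cooler between versions” compare is basically one grouped join once the data is in tables. A minimal sketch follows, using SQLite in place of MySQL and entirely hypothetical table and column names (`temps_v3`, `temps_v4`, `wmo`, `year`, `month`, `temp`) with invented values — not the actual GHCN schema or data:

```python
# Sketch: count, per WMO station, how many monthly values got warmer
# vs cooler between two dataset versions. SQLite stands in for MySQL;
# all names and numbers here are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE temps_v3 (wmo TEXT, year INT, month INT, temp REAL);
CREATE TABLE temps_v4 (wmo TEXT, year INT, month INT, temp REAL);
INSERT INTO temps_v3 VALUES ('425001', 1950, 1, 10.0), ('425001', 1950, 2, 11.0);
INSERT INTO temps_v4 VALUES ('425001', 1950, 1,  9.5), ('425001', 1950, 2, 11.5);
""")

rows = con.execute("""
SELECT a.wmo,
       SUM(b.temp > a.temp) AS warmer,   -- comparisons evaluate to 0/1
       SUM(b.temp < a.temp) AS cooler
FROM temps_v3 a
JOIN temps_v4 b
  ON a.wmo = b.wmo AND a.year = b.year AND a.month = b.month
GROUP BY a.wmo
""").fetchall()
print(rows)   # [('425001', 1, 1)]  -- one warmer, one cooler
```

    The space-by-time layout described in the post (one row per WMO / year / month) is exactly what makes this kind of join-and-group query straightforward.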

    This isn’t a full time effort, so it will be “catch as catch can” on progress. So far it’s gone fast though…

  9. rms says:

    Willis uses R effectively. I’ve given R a good go (experiments, read the docs and a book or two) out of curiosity. I found the language too unstructured and hard to use, with inconsistencies that I can’t get my head around. I’m not pursuing it, but I do watch the blogs to see what people are doing with it. Certainly in use and surely has lots of support. But so does Pandas. Pandas (with its close links to Matplotlib) does a lot of high-end graphing pretty well. Pandas stands on Numpy and Scipy, so has that power. Lots of hits when searching for R vs. Pandas (or Python). Enjoy.

  10. rms says:

    As a start to see differences… https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.diff.html

    Seems pretty straightforward.
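    For anyone following the link, `DataFrame.diff` computes element-wise differences against the prior row, which maps nicely onto month-to-month temperature changes. The values below are invented for illustration:

```python
# Minimal DataFrame.diff illustration on made-up monthly temperatures.
import pandas as pd

df = pd.DataFrame({"temp": [10.0, 11.5, 11.0]},
                  index=["Jan", "Feb", "Mar"])

deltas = df["temp"].diff()   # first row is NaN (nothing before it),
print(deltas)                # then Feb: 1.5, Mar: -0.5
```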

    (mind the typos above. Too early in the morning)

  11. A C Osborn says:

    EM, do you think the “Scientists” at NASA/GISS are getting nervous yet?
    You are getting all the History together to show just what they have been doing; if I were them, I certainly would be.

  12. E.M.Smith says:

    @A.C.Osborn:

    They have been told they have “Top Cover”. The worst they have to worry about is an appointment to a University somewhere, or a UN Commission, all at a major salary increase.

    Look at where the dregs of the Obama White House ended up: running the University of California system at a giant salary of somewhere over $400,000 / year… That’s what you get for losing…

    So no, I don’t think they are worried at all. Until some of them go to prison for fraud or related, they won’t worry at all. Did anyone “go down” over Climategate and emails stating they were cooking the books and using Pal Review not Peer Review to suppress the truth? Still waiting…

    There’s a huge inventory of A/B comparison graphs done by all sorts of folks over the years showing the cooling of the past and warming of the present in the data series. It’s already out there, and nothing has happened. The most this does is make it a bit easier to recreate that stuff AND add v4 to the mix “someday”.

    That said, I might stumble on something new that wasn’t figured out yet. I doubt it, but I might.

    The essential problem is that the “magic sauce” is all in the adjustments and the homogenizing, and “Climate Scientists” are happy to spout “justifications” for it. They think The Story is golden and a done deal. Then, unless you have a peer reviewed article showing where it is bogus (not just a demonstration of fact…) they can just say you don’t matter. Which is what they do. Now I’m never going to make it through the peer review process, as they ARE the peer review process; and I’m unlikely to try anyway. But I can demonstrate anything interesting and hope someone else takes on that bit of the work. So we’ll see what we find – when we find it.

    Frankly, if the last couple of years of cold and this frozen winter don’t get folks polishing their pitchforks and warming the tar, not much will. It’s frozen from N. America to Europe and we STILL have “news” stories about Hottest Evvaa… You can’t kill a political monster with a science argument. The money has to dry up. (Hopefully NSF grants are also tied up in the Government Shutdown ;-)
