Thermometers Over Time

These are not as fine-grained as I’d like. It would be best as an animation, with each decade adding or removing thermometers; but my skills aren’t up to animation yet.

So here are just a few graphs showing how the thermometers start in Europe and spread over the globe. The steps are roughly half a century each, though not uniform. The first graph is everything before 1800 and the last is just the 15 years from 2000 to 2015.

GHCN v3.3 Thermometers 1800 and Before

GHCN v3.3 thermometers before 1800

GHCN v3.3 Thermometers 1801 to 1900

GHCN v3.3 Between 1800 and 1900

GHCN v3.3 Thermometers 1901 to 1949

GHCN v3.3 thermometers 1900 to 1949

GHCN v3.3 Thermometers 1950 to 2000

GHCN v3.3 thermometers 1950 – 2000

GHCN v3.3 Thermometers 2001 to 2015

GHCN v3.3 thermometers 2000 – 2015

In Conclusion

The data prior to 1900 are too thin to be usable for saying anything about the globe.

Between 1900 and 2000 we get decent global coverage, but with an exceptional bias toward the USA and Europe, with industrializing Asia coming on stream. We know nearly nothing about the 3/4 of the planet that is the oceans and that holds the bulk of all surface heat.

After 2000, the thermometer inventory depletes and we are left with coverage that is too thin over most of the planet, and in places and ways that are not comparable to the past. We can’t really compare this set of instruments to those different historical sets and say much of anything about “global change”. We can say a lot about “instrument change”…


19 Responses to Thermometers Over Time

  1. Bill in Oz says:

    E M This is important ! And illustrates what is happening with weather stations perfectly.

    Thanks for all your work on this !
    Bill

  2. rms says:

    well done. First step, make an animated GIF with these five png files.

  3. E.M.Smith says:

    @rms:

    Were I going to make an animation, I’d do a set of these with 10 year increments. That would be about 18 graphs, so would take about an hour to make. Not hard really. That would then give an OK animation over a couple of seconds.

    For a real smooth animation, I’d make all the frames 1 year long. The very first thermometer in the record is from 17xx so we’re talking over 300 years. Just most uses cut off the “start of time” at 1800 or 1850 since that’s when you start to get more than a negligible number of thermometers.

    At 300 frames (one per year) it would take me about 10 hours to make the frames. Then, animated at the typical 30 frames / second, it would run in 10 seconds. A reasonable length.

    The missing bit, unfortunately, is that I don’t have the know-how to make animations or do video editing (yet…). This old dog only learns new tricks so fast…

    I think there may be some animated type plot in matplotlib, and that would cut the time / effort down a lot IF it exists. But that, too, would require my time to find it / learn it…

    Well before that would come just making a finer-grained version of this posting. My intent is to do that with GHCN v4 when I download it.

    As v4 has some larger number of thermometers (something like 10,000 added, but mostly all in the USA and Germany, so the same problems exist) it will be significantly slower per run on the R. Pi (I’d guess about 2.5 to 3 times as slow, or about 5 to 8 minutes per frame). Thus my using v3 until I’ve got the programs written and I spend less time doing test runs to debug things…

    The code is pretty simple to extend. Just more instances with narrower “before this year after that year” cases.

    Here’s the basic code (the SQL strings are triple-quoted so they can span several lines for easier reading):

    # -*- coding: utf-8 -*-
    import numpy as np
    import matplotlib.pyplot as plt
    import mysql.connector as MySQLdb
    
    db=None    # so the finally: clause works even if the connect fails
    
    # First map frame: title, axis labels, and a whole-globe extent
    plt.title("Global Thermometer GHCN v3.3 1800 and before")
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.xlim(-180,180)
    plt.ylim(-90,90)
    
    try:
        db=MySQLdb.connect(user="chiefio",password="LetMeIn!",database='temps')
        cursor=db.cursor()
    
        print("doing before 1801")
    
        # Each block pulls (latitude, longitude) for every station reporting
        # in the period, then scatters the points on the globe frame.
        sql="""SELECT I.latitude, I.coslong FROM invent3 AS I
               INNER JOIN temps3 AS T ON I.stationID=T.stationID
               WHERE year<1801 GROUP BY I.stationID;"""
    
        cursor.execute(sql)
        stn=cursor.fetchall()
        data = np.array(list(stn))
        xs = data.transpose()[0]   # latitude;  or xs = data.T[0] or xs = data[:,0]
        ys = data.transpose()[1]   # longitude
    
        plt.scatter(ys,xs,s=0.5,color='blue',alpha=0.5)
    
        plt.show()
    
        plt.title("Global Thermometer GHCN v3.3 1801 - 1900")
        plt.xlabel("Longitude")
        plt.ylabel("Latitude")
        plt.xlim(-180,180)
        plt.ylim(-90,90)
    
        print("doing between 1801 and 1900")
    
        sql="""SELECT I.latitude, I.coslong FROM invent3 AS I
               INNER JOIN temps3 AS T ON I.stationID=T.stationID
               WHERE year>1800 AND year<1901
               GROUP BY I.stationID;"""
    
        cursor.execute(sql)
        stn=cursor.fetchall()
        data = np.array(list(stn))
        xs = data.transpose()[0]
        ys = data.transpose()[1]
    
        plt.scatter(ys,xs,s=0.5,color='green',alpha=0.5)
    
        plt.show()
    
        plt.title("Global Thermometer GHCN v3.3 1900 - 1949")
        plt.xlabel("Longitude")
        plt.ylabel("Latitude")
        plt.xlim(-180,180)
        plt.ylim(-90,90)
    
        print("doing between 1901 and 1949")
    
        sql="""SELECT I.latitude, I.coslong FROM invent3 AS I
               INNER JOIN temps3 AS T ON I.stationID=T.stationID
               WHERE year>1900 AND year<1950
               GROUP BY I.stationID;"""
    
        cursor.execute(sql)
        stn=cursor.fetchall()
        data = np.array(list(stn))
        xs = data.transpose()[0]
        ys = data.transpose()[1]
    
        plt.scatter(ys,xs,s=0.5,color='orange')
    
        plt.show()
    
        print("doing between 1950 and 2000")
    
        plt.title("Global Thermometer GHCN v3.3 1950 - 2000")
        plt.xlabel("Longitude")
        plt.ylabel("Latitude")
        plt.xlim(-180,180)
        plt.ylim(-90,90)
    
        sql="""SELECT I.latitude, I.coslong FROM invent3 AS I
               INNER JOIN temps3 AS T ON I.stationID=T.stationID
               WHERE year>1949 AND year<2001
               GROUP BY I.stationID;"""
    
        cursor.execute(sql)
        stn=cursor.fetchall()
        data = np.array(list(stn))
        xs = data.transpose()[0]
        ys = data.transpose()[1]
    
        plt.scatter(ys,xs,s=0.5,color='purple')
        plt.show()
    
        print("doing between 2001 and 2015")
    
        plt.title("Global Thermometer GHCN v3.3 2000 - 2015")
        plt.xlabel("Longitude")
        plt.ylabel("Latitude")
        plt.xlim(-180,180)
        plt.ylim(-90,90)
    
        sql="""SELECT I.latitude, I.coslong FROM invent3 AS I
               INNER JOIN temps3 AS T ON I.stationID=T.stationID
               WHERE year>2000 GROUP BY I.stationID;"""
    
        cursor.execute(sql)
        stn=cursor.fetchall()
        data = np.array(list(stn))
        xs = data.transpose()[0]
        ys = data.transpose()[1]
    
        plt.scatter(ys,xs,s=0.5,color='red',alpha=0.5)
    
        plt.show()
    
    except Exception as ex:
        print ("This is the exception branch:", ex)
    
    finally:
        print ("All Done")
        if db:
            db.close()
    

    You can see it is just a copied block with a change of the color name and a change of the bounds on the selection. There’s likely some trivial way to make it a loop with: AND year={iterating value}
    and knock them all out in one go; but I’m not that good with Python yet.
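
    For what it’s worth, here’s a minimal sketch of that loop idea (assuming the same invent3 / temps3 tables, cursor, and plt / np setup as the script above):

    # Minimal sketch: loop over (label, where-clause, color) tuples instead of
    # copying the block five times. Assumes the same connection, cursor,
    # np and plt as the script above.
    periods = [
        ("1800 and before", "year<1801",               "blue"),
        ("1801 - 1900",     "year>1800 AND year<1901", "green"),
        ("1900 - 1949",     "year>1900 AND year<1950", "orange"),
        ("1950 - 2000",     "year>1949 AND year<2001", "purple"),
        ("2000 - 2015",     "year>2000",               "red"),
    ]
    
    for label, where, color in periods:
        plt.title("Global Thermometer GHCN v3.3 " + label)
        plt.xlabel("Longitude")
        plt.ylabel("Latitude")
        plt.xlim(-180, 180)
        plt.ylim(-90, 90)
    
        sql = ("SELECT I.latitude, I.coslong FROM invent3 AS I "
               "INNER JOIN temps3 AS T ON I.stationID=T.stationID "
               "WHERE " + where + " GROUP BY I.stationID;")
        cursor.execute(sql)
        data = np.array(cursor.fetchall())
    
        # latitude is column 0, longitude is column 1
        plt.scatter(data[:, 1], data[:, 0], s=0.5, color=color, alpha=0.5)
        plt.show()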

  4. E.M.Smith says:

    @Bill in Oz:

    Thanks! It’s nice to know sometimes that I’m not just shouting into the wind!

  5. Pouncer says:

    The 10 March 2019 Sunday Dilbert comic (Scott Adams’) jokes about measuring ocean temps. The boss orders Dilbert to determine ocean temperatures. Dilbert says the ocean is too big. The boss insists Dilbert use 1% of the ocean to estimate the remaining 99%. Dilbert replies it would be cheaper to sample NONE of the ocean and estimate 100 Percent. The boss approves.

    It’s funny but almost as mathematically incorrect as AGW theory in general. Getting a 1% sample out of a population is not, intrinsically, foolish. As you are doing here, it’s vital to ensure the 1% being sampled well represents the variables in the population. It doesn’t do to sample only high places in winter and estimate the globe all year ’round.

    I suggest you run a one-time process to assign and append some sort of meaningless, several-digit-long, randomly permuted attribute number field to each record as a permanent part of the database. Call it the Dilbert Code, and let’s start by assuming it is eight digits long. Then when doing any kind of study, you first run the routine on a sub-set of the DB by extracting and “Selecting” based on two digits out of the middle. So if the Dilbert number for a record is 49733356 and you are selecting only those with “97” in the 2nd and 3rd positions, that record is sampled. If you were selecting “97” in the 3rd and 4th positions, it wouldn’t be. Developing a good Dilbert Number depends on your number of records, of course.

    The interesting thing, then, would be to see if you get similar trends or averages or images based on identical runs with two distinct Dilbert Numbers…
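
    (Just to make the digit-position idea concrete, here is a toy sketch — the names are made up, purely for illustration:)

    # Toy illustration of the "Dilbert Code" idea; names are hypothetical.
    # Assign each record an 8-digit random code once, then sample by
    # matching two digits at a chosen position.
    import random
    
    def make_dilbert_code(ndigits=8):
        return "".join(random.choice("0123456789") for _ in range(ndigits))
    
    def in_sample(code, digits="97", start=1):
        # start=1 checks the 2nd and 3rd digits (0-based slicing)
        return code[start:start + 2] == digits
    
    print(in_sample("49733356", "97", start=1))   # True:  digits 2 and 3 are "97"
    print(in_sample("49733356", "97", start=2))   # False: digits 3 and 4 are "73"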

  6. E.M.Smith says:

    @Pouncer:

    Interesting (and amusing) idea…

    My “plan” (of a sort) is to just take the longer records (compare them to v2 and v1 and assure they are not too “doctored” now) and use ONLY them. Take ALL the data for a given record to make the “baseline” (NOT just a 30 year period when it was colder) and then make the trend for each instrument. Finally, compare all the trends.

    It will be a while though…

    Then after that it might be interesting to take both random samples and assess trend variance, and also to take some “high cold stations” and some “at the beach” stations and compare their trends…

  7. E.M.Smith says:

    I’m starting the process of making the “anomaly” database. This is made by taking the temperature for a thermometer and subtracting the average temperature for that thermometer. I’m choosing to NOT use a “baseline” of a given 30 year period, but to instead just use “all data” for that thermometer for that month.

    Why?

    Pretty simple, really:

    I NEVER EVER want to compare ONE thermometer to some OTHER thermometer nor to a FICTIONAL THERMOMETER that doesn’t exist. Using a baseline set that other data is compared with does exactly that.

    I’m also seeing each (disjoint…) record as a small part of a larger cycle of change. To that end, measuring things against any 30 year ‘baseline’ just bakes into the cake the notion that that “baseline” is “normal”. It is NOT normal! It is a semi-random 1/2 of a known 60 year cycle.

    Taking each set of data and normalizing it against itself sets the “trend” for that period of time accurately inside the STATE OF THE CYCLE at that time. That way, all the segments can be spliced together without bleeding in the bias of that “baseline” segment.

    To do that, I need the average of each thermometer over all the data it has for a given month, to then subtract from each reading for that thermometer for that month. That way every MONTH is compared only with itself too. It really ought to be the case that each month is more or less consistent year over year: either static, cooling, warming, or cycling.

    Once that is done, all the months of every year of every thermometer are anomalies based entirely and only on their own internal state and without bias introduced by other thermometers in other places or times. THAT ought to be a clean anomaly series that can be spliced together without too much damage. Also, each stand alone record will tell a story of local change.
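
    In dataframe terms, the idea is just “group by station and month, take the mean over all years, subtract it from each reading”. A minimal pandas sketch of that (assuming a frame with stationID, month, year, deg_C columns and the missing-data flags already screened out):

    # Sketch of the per-station, per-month anomaly idea in pandas terms.
    # Assumes df has columns: stationID, month, year, deg_C, with the
    # -999 style missing-data flags already removed.
    import pandas as pd
    
    def anomalize(df):
        # mean of each station's readings for each calendar month, over all years
        baseline = df.groupby(["stationID", "month"])["deg_C"].transform("mean")
        out = df.copy()
        out["anomaly"] = out["deg_C"] - baseline
        return out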

    I’d done this before with the dT/dt stuff (see the category in the side bar) using FORTRAN. I’m redoing it using SQL / Python preparatory to applying it to the GHCN v4 data (and maybe the USHCN). Why? The way I did it in FORTRAN was rather inelegant. I just stuffed everything into a giant flat file… Using LOTS of disk instead of processor or I/O… This will be a bit more elegant… I’d also used First Differences and some folks didn’t like that.

    As a first step, I’m getting comfortable with the AVG average function of SQL. I also need to make an anomaly table and then work out the processing to turn temperatures into anomalies using the AVG and stuff that into the anomaly table. So one step at a time…

    Here’s the first step. What is the average for each region of all thermometers?

    MariaDB [temps]> source SQL/bin/AvgRegion.sql
    +------+------------+--------+
    | type | AVG(deg_C) | region |
    +------+------------+--------+
    | TAVG |  22.878105 | 1      |
    | TAVG |  11.945415 | 2      |
    | TAVG |  19.249609 | 3      |
    | TAVG |  10.652775 | 4      |
    | TAVG |  19.914713 | 5      |
    | TAVG |   9.740426 | 6      |
    | TAVG | -13.474280 | 7      |
    | TAVG |  10.811983 | 8      |
    +------+------------+--------+
    8 rows in set (48.99 sec)
    
    SELECT DISTINCT type, AVG(deg_C), region FROM temps3 
    WHERE deg_c>-99 GROUP BY region;
    

    Remember that the missing data flag is -999, so we need to screen out impossible temperatures. In some future version of this database I’ll likely just replace that with NULL, as it is a source of potential error. For example, if you DON’T skip those flags, the results are bogus and biased “cold”:

    BOGUS RESULT LEAVING IN MISSING DATA FLAG -999
    MariaDB [temps]> SELECT DISTINCT type, AVG(deg_C), region FROM temps3 GROUP BY region;
    +------+------------+--------+
    | type | AVG(deg_C) | region |
    +------+------------+--------+
    | TAVG |   9.538013 | 1      |
    | TAVG |   8.260590 | 2      |
    | TAVG |  10.542986 | 3      |
    | TAVG |   5.805173 | 4      |
    | TAVG |  13.027409 | 5      |
    | TAVG |   7.441090 | 6      |
    | TAVG | -21.191361 | 7      |
    | TAVG |  -0.316182 | 8      |
    +------+------------+--------+
    8 rows in set (31.57 sec)
    

    But it IS 17 seconds faster ;-)

    Just by inspection of those two tables, though, you can see there’s a whole lot of “missing data”… As I converted this from tenths of a degree C to whole degrees C with a decimal point, the missing data flag shows up as a -99.9 value. That’s why I check for temperatures greater than -90 C or -99 C. The lowest temperature ever recorded is higher than that…

    Here’s the missing data counts total and by continent:

    MariaDB [temps]> SELECT COUNT(deg_C) FROM temps3 WHERE deg_C<-90;
    +--------------+
    | COUNT(deg_C) |
    +--------------+
    |       256301 |
    +--------------+
    1 row in set (32.25 sec)
    
    
    MariaDB [temps]> SELECT COUNT(deg_C), region FROM temps3 WHERE deg_C<-90 GROUP BY region;
    +--------------+--------+
    | COUNT(deg_C) | region |
    +--------------+--------+
    |        42226 | 1      |
    |        29459 | 2      |
    |        16684 | 3      |
    |       121238 | 4      |
    |        29031 | 5      |
    |        15663 | 6      |
    |         1536 | 7      |
    |          464 | 8      |
    +--------------+--------+
    8 rows in set (36.73 sec)
    

    I suppose I ought not be surprised that North America (4) has the most, since we have by far the most thermometers… but I was. I thought we did better than that.

    Well, with that, I think I’ve got “Average” under control and can move on to the next step…

  8. E.M.Smith says:

    Oh, and for comparison I’m going to paste in the count of nominally valid records here too. Though first: even a mistake can be enlightening. I left the minus sign off on one query and it returned way too few records for what I was expecting (total existing data); HOWEVER, it did reveal that there are some very bogus high values in the database somewhere. I remember running into these guys years ago in V2, IIRC.

    MariaDB [temps]> SELECT COUNT(deg_C) FROM temps3 
    WHERE deg_C>90 ;
    +--------------+
    | COUNT(deg_C) |
    +--------------+
    |            6 |
    +--------------+
    1 row in set (31.26 sec)
    

    So, including them, here’s the count of records without a “Missing Data Flag”:

    MariaDB [temps]> SELECT COUNT(deg_C) FROM temps3 
    WHERE deg_C>-90 ;
    +--------------+
    | COUNT(deg_C) |
    +--------------+
    |      5297899 |
    +--------------+
    1 row in set (31.79 sec)
    

    Then the By Region data counts:

    MariaDB [temps]> SELECT COUNT(deg_C),region FROM temps3 
    WHERE deg_C>-90 GROUP BY region;
    +--------------+--------+
    | COUNT(deg_C) | region |
    +--------------+--------+
    |       346694 | 1      |
    |       865429 | 2      |
    |       211808 | 3      |
    |      2645926 | 4      |
    |       476385 | 5      |
    |       731817 | 6      |
    |        15684 | 7      |
    |         4156 | 8      |
    +--------------+--------+
    8 rows in set (43.56 sec)
    

    It looks to me like “Ships” (8) is pretty worthless and Antarctica (7) pretty much also. But we’ll take a closer look at those another day…

    From this it is very clear that the “Global Temperature” is largely the USA Northern Hemisphere temperature with some European Seasoning and an Asian “topper”. The southern hemisphere doesn’t really make it past the doorman into the party…

    MariaDB [temps]> SELECT region, region_name FROM continent;
    +--------+---------------------------+
    | region | region_name               |
    +--------+---------------------------+
    | 1      | Africa                    |
    | 2      | Asia                      |
    | 3      | South America             |
    | 4      | North America             |
    | 5      | Australia Pacific Islands |
    | 6      | Europe                    |
    | 7      | Antarctica                |
    | 8      | Ship Stations Ocean       |
    +--------+---------------------------+
    8 rows in set (0.48 sec)
    
  9. E.M.Smith says:

    OK, I’ve got the basics done for making an anomaly, over all of a given month’s data, for a specific station and month. Now I just need to add the stuff that does it for all of them and then puts it back into a table… Yeah, “just”…

    This example first calculates an average just so that I can do a manual QA on the next step that also calculates an average, but then subtracts it from the individual data items to make the ‘anomaly’:

    MariaDB [temps]> source SQL/bin/Anom.sql
    +----------+
    | Average  |
    +----------+
    | 6.625000 |
    +----------+
    1 row in set (19.62 sec)
    
    +------+--------------+-------+-----------+-------------+
    | type | COUNT(deg_C) | deg_C | Anomaly   | stationID   |
    +------+--------------+-------+-----------+-------------+
    | TAVG |            1 |  7.50 |  0.875000 | 80099909001 |
    | TAVG |            1 |  6.70 |  0.075000 | 80099909001 |
    | TAVG |            1 |  5.80 | -0.825000 | 80099909001 |
    | TAVG |            1 |  5.90 | -0.725000 | 80099909001 |
    | TAVG |            1 |  7.10 |  0.475000 | 80099909001 |
    | TAVG |            1 |  8.20 |  1.575000 | 80099909001 |
    | TAVG |            1 |  8.00 |  1.375000 | 80099909001 |
    | TAVG |            1 |  5.50 | -1.125000 | 80099909001 |
    | TAVG |            1 |  3.80 | -2.825000 | 80099909001 |
    | TAVG |            1 |  7.70 |  1.075000 | 80099909001 |
    +------+--------------+-------+-----------+-------------+
    10 rows in set (0.00 sec)
    

    Looks like it worked!

    Here’s the SQL for it:

    chiefio@PiM3Devuan2:~/SQL/bin$ cat Anom.sql 
    SELECT AVG(deg_C) AS "Average" FROM temps3
    WHERE deg_C>-90 AND month=" JAN" AND stationID=80099909001;
    
    SELECT DISTINCT type, COUNT(deg_C), deg_C, deg_C-
    (SELECT AVG(deg_C) as STUFF FROM temps3 
    WHERE deg_C>-90 AND month=" JAN" AND stationID=80099909001)
    AS 'Anomaly', stationID 
    FROM temps3 
    WHERE deg_C>-90 AND month=" JAN" AND stationID=80099909001
    GROUP BY stationID, year;
    

    I’ll do something like this in the final version, but my intent is to make a table of “statistics” about the temperature data. So for each Station ID I’ll find the average and count for each month of data and stuff it into a table. Then that second query would join to that table and use the average found there instead of calculating it.

    So yeah, a lot of work yet to go, but the basic “how to do it in SQL” is done.

    It also gives me a nice tool where I can pick any station and find the average for a month plus see a table of the anomalies and “eyeball it” as to any trend.

    For example (AND as a BIG WT? about data completeness and quality) here’s Cape Kennedy:

    | CAPE KENNEDY,                  | 42574794000 |
    

    Only 15 years. Really? What the hell happened to all the data from about 1950 to now?

    MariaDB [temps]> source SQL/bin/Anom.sql
    +--------------+-----------+
    | COUNT(deg_C) | Average   |
    +--------------+-----------+
    |           15 | 15.466667 |
    +--------------+-----------+
    1 row in set (0.01 sec)
    
    
    +------+--------------+------+-------+-----------+-------------+
    | type | COUNT(deg_C) | year | deg_C | Anomaly   | stationID   |
    +------+--------------+------+-------+-----------+-------------+
    | TAVG |            1 | 1952 | 17.40 |  1.933333 | 42574794000 |
    | TAVG |            1 | 1957 | 18.00 |  2.533333 | 42574794000 |
    | TAVG |            1 | 1958 | 12.70 | -2.766667 | 42574794000 |
    | TAVG |            1 | 1959 | 15.10 | -0.366667 | 42574794000 |
    | TAVG |            1 | 1960 | 15.40 | -0.066667 | 42574794000 |
    | TAVG |            1 | 1961 | 14.70 | -0.766667 | 42574794000 |
    | TAVG |            1 | 1962 | 16.20 |  0.733333 | 42574794000 |
    | TAVG |            1 | 1963 | 15.90 |  0.433333 | 42574794000 |
    | TAVG |            1 | 1964 | 15.00 | -0.466667 | 42574794000 |
    | TAVG |            1 | 1965 | 15.10 | -0.366667 | 42574794000 |
    | TAVG |            1 | 1966 | 14.60 | -0.866667 | 42574794000 |
    | TAVG |            1 | 1967 | 17.10 |  1.633333 | 42574794000 |
    | TAVG |            1 | 1968 | 15.40 | -0.066667 | 42574794000 |
    | TAVG |            1 | 1969 | 16.20 |  0.733333 | 42574794000 |
    | TAVG |            1 | 1970 | 13.20 | -2.266667 | 42574794000 |
    +------+--------------+------+-------+-----------+-------------+
    15 rows in set (1 min 8.57 sec)
    

    Well first off, it isn’t warming over the period. Second, why only those years recorded?

    Anyway, you get the idea. You can “eyeball” a series and see if the anomalies are all positive recently and negative in the past, or not.

    BTW, my “guess” would be that this is one they put in just for the “baseline” then didn’t bother recording after that (i.e. took that instrument out of the inventory going forward).

    You will note I added year to that last one. Here’s what that code looks like:

    SELECT COUNT(deg_C), AVG(deg_C) AS "Average" FROM temps3
    WHERE deg_C>-90 AND month=" JAN" AND stationID=42574794000;
    
    SELECT DISTINCT type, COUNT(deg_C), year, deg_C, deg_C-
    (SELECT AVG(deg_C) as STUFF FROM temps3 
    WHERE deg_C>-90 AND month=" JAN" AND stationID=42574794000)
    AS 'Anomaly', stationID 
    FROM temps3 
    WHERE deg_C>-90 AND month=" JAN" AND stationID=42574794000
    GROUP BY stationID, year;
    
    
  10. Dr Alexander MCCLINTOCK says:

    The number of stations in Australia post-2000 is very much lower than between 1950 and 2000.

  11. E.M.Smith says:

    @Dr Alexander MCCLINTOCK:

    Yes, they have a load of stations in the dataset between about 1950 and 1990 that are there ONLY for the baseline intervals of 1951-1980 (GISS) or 1961-1990 (Hadley). The True Believers get their noses out of joint when I say the stations are deleted after that point (and get all axle-wound that it wasn’t deleted, it just wasn’t put in… as though that matters). I look at it and it just smells like someone fudging a spot…

    The reality is that there are too few long lived records globally to say anything of reality. ALL the records are either too short, too full of dropouts, or too sparse over the geography.

    Because of that they resort to all sorts of shenanigans to “make up values” where there is no data. They use variously 8000 or 16,000 “grid boxes” to describe the earth, but there are only about 1200 current thermometers and even at the peak count only 7280, so most of those grid-box “anomalies” are created from a fiction in the present (and many from a fiction in the baseline). To reduce the impact of that to something that isn’t entirely a “fiction to fiction comparison”, they added a bunch of stations in the baseline that are dropped (in terms of data collection) after that time. Now it is mostly baseline-to-fiction comparisons…

    Note To All (Per SQL):

    I’m making a statistics table and got a load of “warnings”. It just tells you how many, nothing more. Seems there’s a command you must issue right after the warning count is given to see what they are. Seems my “mean” calculation exceeded the DECIMAL(5,2) size of the temperature field. Not sure how a mean can be larger than the values… but to get past it I just made the field DECIMAL(7,2) so I can inspect it.

    In any case “SHOW WARNINGS” is how you see what it complained about:

    MariaDB [temps]> source bin/loadmstats
    Query OK, 87360 rows affected, 65535 warnings (1 min 40.15 sec)
    Records: 87360  Duplicates: 0  Warnings: 79358
    
    MariaDB [temps]> SHOW WARNINGS
        -> ;
    +-------+------+--------------------------------------------+
    | Level | Code | Message                                    |
    +-------+------+--------------------------------------------+
    | Note  | 1265 | Data truncated for column 'mean' at row 1  |
    | Note  | 1265 | Data truncated for column 'mean' at row 2  |
    | Note  | 1265 | Data truncated for column 'mean' at row 3  |
    | Note  | 1265 | Data truncated for column 'mean' at row 4  |
    | Note  | 1265 | Data truncated for column 'mean' at row 5  |
    | Note  | 1265 | Data truncated for column 'mean' at row 6  |
    | Note  | 1265 | Data truncated for column 'mean' at row 7  |
    

    UPDATE:

    Still got the same warnings… I’ve made it a FLOAT, as I think what’s happening is that the math makes more than 2 decimal places, so the DECIMAL type is saying it tossed the low order bits…

    With FLOAT type the warnings go away:

    MariaDB [temps]> source bin/loadmstats
    Query OK, 87360 rows affected (1 min 42.91 sec)
    Records: 87360  Duplicates: 0  Warnings: 0
    

    That 87360 is the same as the 7280 instruments x 12 months, so I think I’m good to go now.
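
    (For anyone playing along at home, building a statistics table like that is basically one INSERT … SELECT with a GROUP BY. Here’s a rough sketch of the idea in the same Python / cursor style as the plotting script above — not the actual loadmstats file, which is a plain SQL script, and whose details may differ:)

    # Rough sketch only: build the per-station, per-month statistics with one
    # INSERT ... SELECT. Column names follow the mstats3 reports below; the
    # actual loadmstats script may differ in detail.
    sql = """INSERT INTO mstats3 (stationID, month, mean, big, small, num, trang)
             SELECT stationID, month, AVG(deg_C), MAX(deg_C), MIN(deg_C),
                    COUNT(deg_C), MAX(deg_C)-MIN(deg_C)
             FROM temps3
             WHERE deg_C > -90
             GROUP BY stationID, month;"""
    cursor.execute(sql)
    db.commit()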

  12. E.M.Smith says:

    Looks like it worked. Here’s a report of the lowest count stations with ID, mean, biggest, smallest, and count:

    MariaDB [temps]> SELECT stationID, month, mean,big, small,num FROM mstats3
        -> WHERE num<10 ORDER BY num DESC;
    [...]
    | 15464074000 | SEPT  |     21.275 |  21.90 |  20.80 |   8 |
    | 42574598002 |  OCT  |    15.5125 |  17.80 |  13.90 |   8 |
    | 42574598002 |  MAY  |     19.625 |  21.50 |  18.00 |   8 |
    | 40678353000 | SEPT  |    27.9375 |  28.40 |  27.50 |   8 |
    | 50194845001 |  APR  |    12.5625 |  14.30 |  10.60 |   8 |
    | 20940604001 |  DEC  |     8.6625 |  10.80 |   6.70 |   8 |
    | 30984630002 | JULY  |    10.1625 |  11.20 |   9.30 |   8 |
    | 40371078003 |  MAR  |   -14.5875 |  -9.20 | -20.50 |   8 |
    | 42574570001 |  OCT  |     14.025 |  16.00 |  11.70 |   8 |
    | 15464207000 | JULY  |       20.5 |  22.10 |  19.70 |   8 |
    | 40371817003 |  FEB  |   -11.0875 |  -7.60 | -15.60 |   8 |
    | 15464224000 |  DEC  |      24.35 |  24.90 |  23.60 |   8 |
    | 31281225001 |  MAR  |     26.425 |  27.00 |  25.30 |   8 |
    | 31281225001 |  AUG  |      27.65 |  28.10 |  27.20 |   8 |
    | 15464315000 | SEPT  |    23.2875 |  24.00 |  22.30 |   8 |
    | 15464315000 | JULY  |     21.425 |  22.50 |  20.50 |   8 |
    | 15464315000 |  OCT  |    22.3375 |  22.90 |  22.00 |   8 |
    | 15464315000 |  NOV  |      21.25 |  21.60 |  20.80 |   8 |
    | 50194404000 |  MAY  |    15.6375 |  17.60 |  13.80 |   8 |
    | 15464315000 |  DEC  |     21.125 |  21.60 |  20.60 |   8 |
    | 31281225001 | JUNE  |      26.55 |  27.30 |  25.90 |   8 |
    | 15464315000 |  APR  |      21.95 |  23.10 |  20.90 |   8 |
    | 41678795000 |  NOV  |    26.2875 |  26.90 |  25.40 |   8 |
    | 41678793000 |  JAN  |    26.5375 |  27.20 |  25.80 |   8 |
    | 15464018000 | SEPT  |    24.2375 |  24.80 |  23.70 |   8 |
    | 42574570001 |  MAY  |     17.775 |  21.10 |  14.80 |   8 |
    | 50194896000 |  MAY  |     11.775 |  13.40 |  10.40 |   8 |
    | 40371957001 |  JAN  |      -23.2 | -13.90 | -29.70 |   8 |
    | 15464014000 |  APR  |    24.4875 |  25.40 |  23.80 |   8 |
    | 40371964005 |  OCT  |    -0.6125 |   1.40 |  -3.90 |   8 |
    | 40371447003 |  APR  |      2.125 |   7.50 |  -2.20 |   8 |
    | 40371957001 |  APR  |     -9.375 |  -3.50 | -14.30 |   8 |
    | 40371953002 | SEPT  |     7.1875 |   9.30 |   5.30 |   8 |
    | 40371953002 | JULY  |    14.5875 |  17.30 |  13.20 |   8 |
    | 40371447003 |  JAN  |   -18.1625 | -12.50 | -23.60 |   8 |
    | 40371953002 |  DEC  |        -23 | -16.00 | -31.20 |   8 |
    | 40371953002 |  AUG  |    12.5375 |  14.20 |  10.50 |   8 |
    | 40371951004 | SEPT  |    12.1125 |  13.40 |  10.80 |   8 |
    | 15464014000 |  MAR  |    24.6625 |  25.40 |  23.40 |   8 |
    | 15464014000 |  NOV  |       23.6 |  24.00 |  22.90 |   8 |
    | 40371125001 |  DEC  |   -16.4375 | -11.90 | -26.30 |   8 |
    | 15464018000 |  MAY  |    25.0125 |  25.90 |  24.50 |   8 |
    | 15464018000 |  MAR  |    25.4875 |  26.40 |  24.60 |   8 |
    | 50194880002 |  DEC  |    15.6875 |  16.70 |  14.50 |   8 |
    | 50194880002 |  JAN  |    17.3125 |  19.90 |  15.10 |   8 |
    | 50194880002 |  MAR  |    16.3875 |  18.00 |  14.00 |   8 |
    | 50194880002 |  MAY  |     10.475 |  11.80 |   9.20 |   8 |
    | 50194880002 |  NOV  |     14.675 |  16.00 |  13.70 |   8 |
    | 50194880002 | JULY  |     6.9625 |   8.10 |   5.30 |   8 |
    | 50194880002 | JUNE  |       7.95 |  10.40 |   5.60 |   8 |
    | 30984628001 | JULY  |     14.975 |  16.90 |  13.90 |   8 |
    | 21544218001 |  NOV  |       -9.7 |  -6.40 | -14.60 |   8 |
    | 40371957001 | JUNE  |    10.9875 |  13.20 |   7.20 |   8 |
    | 30984628001 |  MAY  |      18.05 |  19.60 |  16.30 |   8 |
    +-------------+-------+------------+--------+--------+-----+
    577 rows in set (0.25 sec)
    

    So now I get to finish adding any other stats I like, and then I can “move on” to creating the anomalies based on subtracting that monthly mean from the individual data items in the temperature record (by Station ID, by Month).

  13. E.M.Smith says:

    Well, still a bit more to go, but I think this kind of thing has some utility. As MAX, MIN, COUNT and RANGE are all keywords in SQL, I’ve used big, small, num, and trang instead:

    chiefio@PiM3Devuan2:~/SQL/bin$ cat monstats.sql 
    SELECT stationID,month,mean,big,small,num,trang FROM mstats3 
    WHERE num<10 ORDER BY stationID,month;
    
    MariaDB [temps]> source bin/monstats.sql
    +-------------+-------+------------+--------+--------+-----+-------+
    | stationID   | month | mean       | big    | small  | num | trang |
    +-------------+-------+------------+--------+--------+-----+-------+
    | 10764880001 |  APR  |    24.5333 |  25.40 |  23.70 |   9 |   1.7 |
    | 10764880001 |  MAR  |       25.1 |  26.10 |  23.80 |   9 |   2.3 |
    | 10764880001 |  MAY  |    23.6889 |  24.00 |  23.30 |   9 |   0.7 |
    | 10764880001 |  NOV  |       23.6 |  24.00 |  22.80 |   9 |   1.2 |
    | 10764880001 | JULY  |    22.4222 |  23.00 |  21.90 |   9 |   1.1 |
    | 10764880001 | JUNE  |    22.9778 |  23.50 |  22.50 |   9 |     1 |
    | 10764880001 | SEPT  |    22.7556 |  23.10 |  22.60 |   9 |   0.5 |
    | 10764931001 |  AUG  |    23.4222 |  24.30 |  22.60 |   9 |   1.7 |
    | 10764931001 |  DEC  |    23.7444 |  24.60 |  22.20 |   9 |   2.4 |
    | 10764931001 |  NOV  |       24.1 |  24.70 |  23.50 |   9 |   1.2 |
    | 10764931001 |  OCT  |    24.1333 |  24.60 |  23.50 |   9 |   1.1 |
    | 10764931001 | SEPT  |    23.8111 |  24.30 |  23.10 |   9 |   1.2 |
    | 10764931002 |  FEB  |    24.1444 |  24.70 |  23.70 |   9 |     1 |
    | 10764931002 | JULY  |    22.4778 |  23.30 |  21.20 |   9 |   2.1 |
    | 10764950002 |  JAN  |    23.8889 |  24.90 |  23.20 |   9 |   1.7 |
    | 11064702000 |  APR  |    31.0556 |  32.70 |  30.00 |   9 |   2.7 |
    | 11064702000 |  AUG  |    26.8667 |  28.70 |  24.80 |   9 |   3.9 |
    | 11064702000 |  FEB  |    24.3667 |  26.40 |  22.40 |   9 |     4 |
    | 11064702000 |  JAN  |    21.1444 |  24.30 |  18.60 |   9 |   5.7 |
    | 11064702000 |  MAY  |    31.5222 |  32.80 |  30.60 |   9 |   2.2 |
    | 11064702000 | JULY  |    28.4556 |  30.30 |  26.70 |   9 |   3.6 |
    | 11064702000 | JUNE  |    30.3444 |  32.00 |  27.80 |   9 |   4.2 |
    | 11064702000 | SEPT  |    28.1778 |  29.90 |  27.50 |   9 |   2.4 |
    | 11562338000 |  DEC  |    15.6111 |  17.90 |  14.50 |   9 |   3.4 |
    | 11562338000 |  FEB  |    14.3778 |  15.70 |  13.00 |   9 |   2.7 |
    | 11562338000 |  MAR  |    15.7778 |  17.20 |  14.10 |   9 |   3.1 |
    |
    [...]
    | 42572773007 |  FEB  |       -6.3 |  -2.00 |  -9.20 |   9 |   7.2 |
    | 42572773007 |  MAR  |   -5.07778 |  -3.00 |  -8.50 |   9 |   5.5 |
    | 42572773007 |  MAY  |    5.77778 |   8.60 |   2.20 |   9 |   6.4 |
    | 42572773007 |  NOV  |   -2.33333 |   2.60 |  -7.40 |   9 |    10 |
    | 42572773007 |  OCT  |    3.22222 |   9.00 |   0.50 |   9 |   8.5 |
    | 42572773007 | JUNE  |    8.73333 |  10.30 |   7.00 |   9 |   3.3 |
    | 42572773007 | SEPT  |       10.3 |  12.30 |   8.20 |   9 |   4.1 |
    | 42574570001 |  APR  |    12.1889 |  14.70 |   8.50 |   9 |   6.2 |
    | 42574570001 |  AUG  |    24.2556 |  25.90 |  23.00 |   9 |   2.9 |
    | 42574570001 |  DEC  |   0.777778 |   5.00 |  -3.80 |   9 |   8.8 |
    | 42574570001 |  FEB  |    1.16667 |   4.90 |  -3.80 |   9 |   8.7 |
    | 42574570001 |  JAN  |   -1.66667 |   1.00 |  -3.60 |   9 |   4.6 |
    | 42574570001 |  MAR  |    4.22222 |   7.50 |  -1.70 |   9 |   9.2 |
    | 42574570001 |  NOV  |    6.43333 |   7.60 |   4.50 |   9 |   3.1 |
    | 42574570001 | JULY  |    24.6111 |  26.60 |  23.40 |   9 |   3.2 |
    | 42574570001 | SEPT  |    20.7222 |  22.10 |  18.50 |   9 |   3.6 |
    | 42574598002 |  AUG  |    25.1333 |  27.10 |  24.10 |   9 |     3 |
    | 42574598002 | JUNE  |    23.5667 |  25.10 |  22.40 |   9 |   2.7 |
    | 42574598002 | SEPT  |    21.4333 |  24.30 |  19.00 |   9 |   5.3 |
    [...]
    | 50194404000 |  MAY  |    15.6375 |  17.60 |  13.80 |   8 |   3.8 |
    | 50194829000 |  NOV  |      14.15 |  15.20 |  13.00 |   8 |   2.2 |
    | 50194845001 |  APR  |    12.5625 |  14.30 |  10.60 |   8 |   3.7 |
    | 50194880002 |  DEC  |    15.6875 |  16.70 |  14.50 |   8 |   2.2 |
    | 50194880002 |  JAN  |    17.3125 |  19.90 |  15.10 |   8 |   4.8 |
    | 50194880002 |  MAR  |    16.3875 |  18.00 |  14.00 |   8 |     4 |
    | 50194880002 |  MAY  |     10.475 |  11.80 |   9.20 |   8 |   2.6 |
    | 50194880002 |  NOV  |     14.675 |  16.00 |  13.70 |   8 |   2.3 |
    | 50194880002 | JULY  |     6.9625 |   8.10 |   5.30 |   8 |   2.8 |
    | 50194880002 | JUNE  |       7.95 |  10.40 |   5.60 |   8 |   4.8 |
    | 50194896000 |  MAY  |     11.775 |  13.40 |  10.40 |   8 |     3 |
    | 50194896000 |  NOV  |    17.8125 |  18.60 |  16.60 |   8 |     2 |
    | 62906597001 | JULY  |     17.175 |  19.20 |  15.40 |   8 |   3.8 |
    | 63213457000 |  FEB  |     8.2125 |   9.60 |   5.80 |   8 |   3.8 |
    | 63213457000 |  JAN  |     7.0875 |   9.50 |   6.40 |   8 |   3.1 |
    | 64308227001 |  JAN  |      5.575 |   8.10 |   3.90 |   8 |   4.2 |
    | 64917022002 |  DEC  |     -0.525 |   1.00 |  -3.30 |   8 |   4.3 |
    | 65103761001 |  OCT  |       10.8 |  12.30 |   9.70 |   8 |   2.6 |
    | 65103761001 | JUNE  |    14.4875 |  16.60 |  13.20 |   8 |   3.4 |
    +-------------+-------+------------+--------+--------+-----+-------+
    577 rows in set (0.26 sec)
    

    Scroll the report to the right to see the range.

    Still some more to do (just what all to put in the statistics table, things to do with it, etc.) but it is already interesting. These statistics are for a given station and month over all years of data and intended for a particular purpose (creating temperature anomalies for a given station-month). I’m intending to make a couple of other statistics tables with things like stats for a station overall (regardless of month) and such.

    Don’t know what all I’ll do with them at this point, but I can see things like graphing how the average “range” in a given year changes over time (a measure of volatility of the aggregate stations in any given year).

  14. E.M.Smith says:

    Well, in theory I’ve got the anomalies done. Still needs checking and QA, but:

    chiefio@PiM3Devuan2:~/SQL/bin$ cat lanom3.sql 
    INSERT INTO  anom3 (stnID,region,country,wmo,near_wmo,year,month,deg_C)
    SELECT T.stnID,T.region,T.country,T.wmo,T.near_wmo,T.year,T.month,T.deg_C-ST.mean
    FROM temps3 AS T
    INNER JOIN mstats3 AS ST 
    ON
            ST.stnID=T.stnID
    	AND ST.month=T.month 
    WHERE T.deg_C > -90 
    ;
    

    The careful observer will also notice I changed stationID to stnID ’cause I got tired of so much typing (and reloaded all the tables and such).

    Here’s the actual run of the anomalizer. It uses the “mean” from the stats table:

    MariaDB [temps]> source bin/lanom3.sql
    Query OK, 5297899 rows affected, 65535 warnings (11 min 42.24 sec)
    Records: 5297899  Duplicates: 0  Warnings: 5292067
    

    So far 11 minutes is by far the longest anything has taken. Twice as long as the basic temperature load step. I suspect it is due to 3 tables on the same disk causing disk head seeks. Stuff from temps3 and a mean/statistic from mstats3 then put the result in anom3.

    The warnings again look to be because I’m stuffing a FLOAT into a DECIMAL(5,2). I may change that, or not… we’ll see. An extract of the warnings:

    MariaDB [temps]> show warnings
        -> ;
    +-------+------+----------------------------------------------+
    | Level | Code | Message                                      |
    +-------+------+----------------------------------------------+
    | Note  | 1265 | Data truncated for column 'deg_c' at row 1   |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 13  |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 25  |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 49  |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 73  |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 85  |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 97  |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 109 |
    |[...]
    | Note  | 1265 | Data truncated for column 'deg_c' at row 98  |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 110 |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 122 |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 134 |
    | Note  | 1265 | Data truncated for column 'deg_c' at row 146 |
    +-------+------+----------------------------------------------+
    64 rows in set (0.06 sec)
    MariaDB [temps]> 
    

    Other than some QA after dinner, I think this pretty much wraps up the work for today. IF it passes a QA check as having worked right, I’ll try some reports / graphs of anomalies later.

  15. H.R. says:

    What the heck is QA?!? We don’ need no steenkin’ QA.

    What the heck kind of climate scientist are you, anyhow, E.M.?!? Checks and double checks? Unprecedented!!! Madness!!!

    How the heck do you ever expect to collect a check from Big Oil if you can’t show the World’s panties bursting into flames?

  16. Pingback: GHCN v3.3 Anomaly By Continent | Musings from the Chiefio

  17. E.M.Smith says:

    @H.R.:

    What’s this about hot panties and getting a check?

    Please send instructions forthwith!

    ;-)

  18. Pingback: GHCN v4 Thermometer Inventory Over Time | Musings from the Chiefio

  19. Pingback: GHCN v3.3 vs v4 – Top Level Entry Point | Musings from the Chiefio
