GHCN v4 1000 M Station Changes

The curators of GHCN have added a lot of stations, but also dropped a lot. It is nice that they have added stations, but it can make it harder to see where change has happened. I’ve tried making one graph out of both sets, and all you see is whichever set is plotted on top. So here are two graphs; you can compare them and “visually integrate”.

GHCN v4 Stations over 1000m Altitude in 2018

Then these are the stations that were in the baseline decades (used by both GISS and Hadley – 1950-1990) but that are gone now:

GHCN v4 1000m Altitude In 1950-1990, Gone Now

Here’s the same data stacked with “blue on top”, with the dots half the diameter and half transparent. Still not much purple showing…

GHCN v4 1000m Altitude Baseline blue over 2018

That’s a whole lot of instrument change at altitude. These are the more volatile stations, and the “missing” data in the present will be made up from other stations up to 1200 km away via the Reference Station Method in the traditional Grid Box Anomaly fabrication process. I have little confidence that can work to 1/2 °C of perfection with historical data recorded in whole degrees C or F.
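
For reference on that 1200 km bit: the GISS scheme (per Hansen & Lebedeff 1987) weights each contributing station linearly down to zero at 1200 km from the point being filled in. What follows is not their code, just my minimal sketch of that weighting so you can see how a “missing” grid point gets made up from distant neighbors; the station positions and anomaly values in the example are invented:

import math

# Sketch of Reference Station Method style distance weighting (not GISS code):
# a station's influence tapers linearly from 1 at zero distance to 0 at 1200 km.

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the haversine formula."""
    rlat1, rlat2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat/2)**2 + math.cos(rlat1)*math.cos(rlat2)*math.sin(dlon/2)**2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def made_up_anomaly(lat, lon, stations):
    """Weighted anomaly at (lat, lon) from stations within 1200 km.
    stations is a list of (lat, lon, anomaly) tuples -- invented data here."""
    wsum = wtot = 0.0
    for slat, slon, anom in stations:
        d = km_between(lat, lon, slat, slon)
        if d < 1200.0:
            w = 1.0 - d/1200.0      # the linear taper to zero at 1200 km
            wsum += w * anom
            wtot += w
    return wsum/wtot if wtot else None  # None: nothing within 1200 km

# A high altitude grid point "filled in" from two lower, distant neighbors:
print(made_up_anomaly(39.0, -106.0, [(40.0, -105.0, 0.8), (35.5, -101.9, 1.3)]))

Note that nothing in the weighting knows about altitude: a valley station hundreds of km away can stand in for a missing mountain one.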

Here’s the same composite of dropped baseline blue stations over red 2018 “Now” stations, but with the cutoff altitude raised to 2000 m. It generally looks to me like a drift down the Rockies, but there’s not really enough visibility in the graphs to prove it. Details in the data ought to be examined.

GHCN v4 Baseline Gone over 2018 (red) 2000m+ Altitude

Tech Talk

Here’s the code that made the first graphs. Hopefully I’ve not made any stupid mistakes in it ;-) In the working code the SQL text is all on one line; here I’ve broken it across lines (as a triple-quoted string) so you don’t need to scroll forever…

# -*- coding: utf-8 -*-
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import mysql.connector as mariadb


print ("Just before the Try")

try:
    
    mariadb_connection=mariadb.connect(user='chiefio',password='LetMeIn!',database='temps')
    print ("did the db assignment")
    cursor=mariadb_connection.cursor()
    print ("did the cursor thing")
    print ("stuff the SQL string")
    
    sql="SELECT I.latitude, I.coslong FROM invent4 AS I 
         INNER JOIN temps4 as T on I.stnID=T.stnID 
         WHERE I.stn_elev>1000 AND I.stn_elev<9000 
         AND year=2018 GROUP BY I.stnID;"
    print ("execute it")
    cursor.execute(sql)
    print ("back from the execute")

    
    plt.title("v4 Global Thermometer Above 1000 m  2018")
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.xlim(-180,180)
    plt.ylim(-90,90)
    stn=cursor.fetchall()
    data = np.array(stn)
    xs = data.transpose()[0]   # latitudes  (or xs = data.T[0] or xs = data[:,0])
    ys = data.transpose()[1]   # longitudes
    print ("do the plot")
    plt.scatter(ys,xs,s=1,color='red',alpha=1)   # longitude on x, latitude on y
    plt.show()

    
    plt.title("v4 Global Thermometer Above 1000 m 1950-1990 Gone Now")
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.xlim(-180,180)
    plt.ylim(-90,90)
    sql="SELECT I.latitude, I.coslong FROM invent4 AS I 
         INNER JOIN temps4 as T on I.stnID=T.stnID 
         WHERE I.stn_elev>1000 AND I.stn_elev<9000 
         AND year>1949 AND year<1991 AND I.stnID NOT IN 
         (SELECT I.stnID FROM invent4 AS I INNER JOIN temps4 as T 
         ON I.stnID=T.stnID WHERE year=2018)GROUP BY I.stnID;"
 
    cursor.execute(sql)
    stn=cursor.fetchall()
    data = np.array(stn)
    xs = data.transpose()[0]   # latitudes  (or xs = data.T[0] or xs = data[:,0])
    ys = data.transpose()[1]   # longitudes

    plt.scatter(ys,xs,s=1,color='blue',alpha=0.5)



    plt.show()

except Exception as e:
    print ("This is the exception branch:", e)

finally:
    print ("All Done")

The other graph is essentially those two laid on top of each other: comment out the first plt.show, change the headings, and shrink the dots in plt.scatter to s=0.5. I think that’s not enough change to need posting all that text too. The 2000 m graphs just have the 1000 in the elevation test changed to 2000. Roughly, the overlay version comes down to this:
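
# My reconstruction of the overlay variant, not the exact script I ran:
# run both queries above, skip the first plt.show(), and stack the two
# scatters with smaller half transparent dots. Here xs/ys hold the 2018
# lat/long and xs2/ys2 the dropped baseline lat/long (the posted code
# reuses xs/ys for the second query).
plt.title("v4 Baseline Gone (blue) over 2018 (red) Above 1000 m")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.xlim(-180,180)
plt.ylim(-90,90)
plt.scatter(ys,xs,s=0.5,color='red',alpha=0.5)     # 2018 stations underneath
plt.scatter(ys2,xs2,s=0.5,color='blue',alpha=0.5)  # dropped baseline on top
plt.show()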



11 Responses to GHCN v4 1000 M Station Changes

  1. Bill in Oz says:

    E.M., very perplexing. Huge number of drop-outs in Africa! Why?

    And I know most of Australia is flattish without many high mountains, but there is the Great Dividing Range running down the east coast of Oz with plenty of places above 1000 meters… But is there anything there in these charts? Doesn’t look like it.

    In fact I can’t say whether I’m looking at temp data from New Zealand or Australia…

  2. E.M.Smith says:

    I suspect some of the Africa losses are from the end of the British Empire. A decade or three after self-rule, tending all the thermometers and reporting the data became less important than whose rice bowl to steal and what farm to raid… See the state of Rhodesia / Zimbabwe and South Africa today…

  3. Steven Fraser says:

    E.M.: station dropout could be an artifact of station selection. If there was a list of those dropped out, it might be researched.

    One place that could be compared is BEST. Another might be the WMO. I would be very surprised if there were not at least 1 thermometer (airport, anyone) in each country…

    It would be fun to participate in such a study, perhaps reminiscent of Anthony Watts’ station study.

  4. Eilert says:

    I see many of the stations in Africa have been dropped.
    E.M., you say that may be due to the drop-off after the end of colonialism.
    That may be partially so, but if you go to Weather Underground or similar sites you will find that many stations are still reporting daily.

  5. E.M.Smith says:

    @Eilert & Steven Fraser:

    There IS a large selection bias in the choice of when stations are in the record. It is NOT all available, it is a selected set chosen by NOAA/NCDC/whatever they call themselves now…

    I’m just guessing that the decay of local governance would cause a bias against them, likely on the question of data quality and consistency.

    BEST, near as I can tell, is just a very much more severely chopped and molded GHCN.

  6. ossqss says:

    @Eilert, I believe you will find Weather Underground stations are personal weather stations reporting to them.

  7. E.M.Smith says:

    FWIW, just as a place to note it, I’ve installed MariaDB on the RockPro64 to do some comparative speed testing. So far, on the “create the basic tables” step, it is very fast:

    Server version: 10.1.38-MariaDB-0ubuntu0.18.04.1 Ubuntu 18.04
    
    Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    MariaDB [(none)]> source SQL/tables/temps4
    ERROR 1046 (3D000) at line 1 in file: 'SQL/tables/temps4': No database selected
    MariaDB [(none)]> use temps
    Database changed
    MariaDB [temps]> source SQL/tables/temps4 
    Query OK, 0 rows affected (0.52 sec)
    
    MariaDB [temps]> source SQL/tables/invent4
    Query OK, 0 rows affected (0.06 sec)
    
    MariaDB [temps]> source SQL/tables/continents
    ERROR: Failed to open file 'SQL/tables/continents', error: 2
    MariaDB [temps]> source SQL/tables/continent
    Query OK, 0 rows affected (0.07 sec)
    
    MariaDB [temps]> source SQL/tables/countries
    ERROR: Failed to open file 'SQL/tables/countries', error: 2
    MariaDB [temps]> source SQL/tables/country
    Query OK, 0 rows affected (0.09 sec)
    
    MariaDB [temps]> source SQL/tables/anom3
    Query OK, 0 rows affected (0.06 sec)
    
    MariaDB [temps]> source SQL/tables/anom4
    Query OK, 0 rows affected (0.06 sec)
    
    MariaDB [temps]> source SQL/tables/invent3
    Query OK, 0 rows affected (0.07 sec)
    
    MariaDB [temps]> source SQL/tables/mstats3
    Query OK, 0 rows affected (0.08 sec)
    
    MariaDB [temps]> source SQL/tables/mstats4
    Query OK, 0 rows affected (0.10 sec)
    
    MariaDB [temps]> source SQL/tables/temps3
    Query OK, 0 rows affected (0.10 sec)
    

    So as a comparative sample it’s about 1/10th to 1/20th of a second…

    Loading the data takes longer… but for the “country” file not so much…

    MariaDB [temps]> source bin/LC4p.sql
    Query OK, 240 rows affected, 1 warning (0.07 sec)    
    Records: 240  Deleted: 0  Skipped: 0  Warnings: 1
    
    MariaDB [temps]> drop table country ;
    Query OK, 0 rows affected (0.03 sec)
    
    MariaDB [temps]> source tables/country 
    Query OK, 0 rows affected (0.06 sec)
    
    MariaDB [temps]> source bin/LC4usc.sql
    Query OK, 240 rows affected, 1 warning (0.06 sec)    
    Records: 240  Deleted: 0  Skipped: 0  Warnings: 1
    

    The LC4usc.sql version is loading data from the uSD card into the database on the card, while the LC4p.sql version is reading it from a USB 3.0 hard disk. The uSD source is slightly faster, but I don’t know if that’s significant…
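
    (For anyone curious what those loader scripts amount to: at heart each is a LOAD DATA INFILE into the table. A purely hypothetical sketch of the idea, driven from Python like the graph code above since the actual script text isn’t posted; the path, column names, and field offsets are my placeholders:)

    import mysql.connector as mariadb

    conn = mariadb.connect(user='chiefio', password='LetMeIn!',
                           database='temps', allow_local_infile=True)
    cur = conn.cursor()
    # Read each line into a user variable, then split it with SUBSTRING.
    # Needs local_infile enabled server side too; offsets/columns are guesses.
    cur.execute("""LOAD DATA LOCAL INFILE '/path/to/ghcnm-countries.txt'
                   INTO TABLE country
                   LINES TERMINATED BY '\\n'
                   (@line)
                   SET cnum  = SUBSTRING(@line, 1, 3),
                       cname = TRIM(SUBSTRING(@line, 4));""")
    conn.commit()
    conn.close()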

    Loading the whole v4 temperatures database from uSD to uSD:

    MariaDB [temps]> source bin/temps4uscL.sql
    Query OK, 16809258 rows affected, 50 warnings (18 min 17.22 sec)
    Records: 16809258  Deleted: 0  Skipped: 0  Warnings: 50
    

    Then doing it from USB 3.0 disk to uSD card database:

    MariaDB [temps]> drop table temps4;
    Query OK, 0 rows affected (0.69 sec)
    
    MariaDB [temps]> source tables/temps4
    Query OK, 0 rows affected (0.16 sec)
    
    MariaDB [temps]> source bin/temps4pL.sql
    Query OK, 16809258 rows affected, 50 warnings (15 min 27.88 sec)
    Records: 16809258  Deleted: 0  Skipped: 0  Warnings: 50
    

    Remember that from here:
    https://chiefio.wordpress.com/2019/03/18/ghcn-v4-first-peek/

    We saw the load into a USB 2.0 based disk from a different USB 2.0 based disk took 22 minutes.

    MariaDB [temps]> source bin/temps4L.sql
    Query OK, 16809258 rows affected, 50 warnings (22 min 20.32 sec)
    Records: 16809258  Deleted: 0  Skipped: 0  Warnings: 50
    

    So the speed penalty for using an R.Pi M3 isn’t that large. The RockPro64 has most of the CPUs running at single digit to low double digit % load, and often only one of the slower cores over 50% load. There are sporadic “disk wait” D markers in the process list. From this I conclude that the database load process is dominated by disk and uSD write speeds (depending on source and sink), with the Pi M3 being CPU limited (one core pegs at 100%) but that only adds about 4 minutes (or about 20%).

    That may change when it comes to reports with lots of math, but for the data load, a very hot machine doesn’t help as much as a fast disk.

    Then the “from USB 3.0 to uSD” speed is 15 minutes. Even faster, so clearly there are some contention issues when both reading and writing the uSD to move the data.

    I suspect it would be faster still from USB 3.0 disk to USB 3.0 disk.

    At 15 minutes vs 22 minutes there’s a significant advantage of 7 minutes: nearly 50% of the faster time and about 1/3 of the slower one. Still enough gain to really matter.

    Some other day I’ll do speed tests with the database on the USB 3.0 disk. For now I just wanted a quick test of the likely worst case. As uSD is far slower to write than to read, I expect it will do better in the read-the-database reporting bits.

    I’ve got “stuff to do” for a while, but tonight or tomorrow I ought to be able to do some more comparisons. Likely in a separate article…

  8. ossqss says:

    @EM, are all the data points real stations in this data set or are some created with 1,200 km smoothing, extrapolation or Kriging?

  9. E.M.Smith says:

    @Ossqss:

    Nobody knows.

    I’m using the “unadjusted” GHCN but they go out of their way to state that they just pick up the data from various BOMs in countries all over and they might have adjusted it… So to answer your question would require auditing 150+ country BOMs and getting them to explain how they made the numbers. Nobody has done that.

    In fact, my suspicion is that the Australian / Pacific data have “moved” partly due to the Australian / Pacific BOM folks deciding to “tidy up” that messy old data with modern techniques… So homogenize, infill, etc. etc.

    That’s why I’ve hung on to the v1, v2, and v3 data. So I can do direct apples-to-apples comparisons of the very same station in the very same year. Then if the “unadjusted” data changed, someone adjusted it…

    That’s likely still a couple of months away. I’m still doing aggregates, and only on 2 out of 4 versions (and those with disjoint table designs, so some integration work is still needed). Once an ID map exists the check itself is a simple join; roughly like the sketch below.
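
    (Purely hypothetical shape: the stnmap table linking v3 IDs to v4 IDs isn’t built yet, and the month/temp column names are placeholders, not my real schema.)

    # Hypothetical v3 vs v4 "same station, same year" check, for shape only:
    sql="""SELECT T3.stnID, T3.year, T3.month,
                  T3.temp AS v3_temp, T4.temp AS v4_temp
           FROM temps3 AS T3
           INNER JOIN stnmap AS M ON T3.stnID=M.v3_stnID
           INNER JOIN temps4 AS T4 ON M.v4_stnID=T4.stnID
                AND T3.year=T4.year AND T3.month=T4.month
           WHERE T3.temp <> T4.temp;"""
    cursor.execute(sql)
    changed=cursor.fetchall()   # any rows here mean "unadjusted" data changed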

  10. Graeme No.3 says:

    O/T and of little use, but Tallbloke has a list of UK stations on his website, up at the top of his home page.

  11. Pingback: GHCN v3.3 vs v4 – Top Level Entry Point | Musings from the Chiefio

Comments are closed.