GHCN v3 vs v4 All Regions Graphs

Here are graphs for the 7 “regions” (roughly, continents) comparing GHCN version 3.3 to version 4. There is one significant quirk in this comparison: both Russia and Kazakhstan were split between Europe and Asia in v3 but are now lumped into Asia, since the version 4 data are arranged by country, not by region.

Now the major point I’d make here is that we are regularly harangued that, since it is all done with anomalies, the particular thermometers used don’t make any difference. Well, then, OK, sauce for the goose is sauce for the gander, so since these are done with anomalies “it doesn’t matter”…

Region 1 - Africa GHCN v3.3 vs v4 Anomalies

Region 2 - Asia GHCN v3.3 vs v4 Anomalies

Region 3 - South America GHCN v3.3 vs v4 Anomalies

Region 4 - North America GHCN v3.3 vs v4 Anomalies

Region 5 - Australia / Pacific Islands GHCN v3.3 vs v4 Anomalies

Region 6 - Europe GHCN v3.3 vs v4 Anomalies

Region 7 - Antarctica GHCN v3.3 vs v4 Anomalies

Make of them what you will. “The data just are. -E.M.Smith”

What I find most striking is just how different the various charts are in their overall size, dispersion, and ‘shape’ of the data. That is followed closely by the way they all tend to narrow at the “baseline” period (roughly 1950 to 1990) and then (all but Antarctica) have a rapid rise in the last 20 years of data.

Now were the CO2 theory of causality to be in these data, ought not the dispersion to be less variable? Ought not the “rise” to have started back about 1950 instead of 40 years later (and just when all the instruments were being changed to rapid response electronic devices on short cables to buildings and / or at airports filled with tarmac and jet exhaust…)?

Overall, the data just look rather “manicured”.

Then there is the simple fact that they are DIFFERENT. These are both the “unadjusted” data sets. Especially in the deeper past, where there were only one or a half dozen thermometers, there is no reason whatsoever for the dots to be different. They are, in theory at least, the SAME data from the SAME instruments. Something in this “unadjusted” data looks very adjusted.

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in AGW Science and Background, NCDC - GHCN Issues.

17 Responses to GHCN v3 vs v4 All Regions Graphs

  1. A C Osborn says:

    As some of us have commented on your other posts, they have been shown to have been adjusted and are continuing to be adjusted.
    Raw data is no longer RAW.

  2. Bill in Oz says:

    I assume that by ‘unadjusted’ you mean not homogenised?

  3. Bill in Oz says:

    E.M., the only way I can check these charts is against what I know: the weather record for Australia over the past 170 years…

    And that immediately shows an anomaly: the period from 1895-1903 was dry and hot. It was one of the longest droughts in Australian history and even has its own name, the Federation Drought, as it occurred immediately before and after the establishment of the Commonwealth of Australia in 1901.

    But when I look at the chart for Australia and the adjoining Pacific region, the Federation Drought is not apparent.

    What is going on?
    Ditto, there was a major drought here in Oz from 2006-2011, which is now named the Millennial drought… But the chart does not highlight that either.

    Am I missing something here?

  4. Hifast says:

    Thank you E.M. for the diligent work.
    However, it’s difficult for me to visualize any patterns or make conclusions.
    What is the delta between the v3 and v4 anomalies? That is, does v4 trend one way or the other (cooling or warming) relative to v3?

  5. E.M.Smith says:

    @Bill in Oz:

    “Unadjusted” means whatever NOAA says it means. I think it means no TOBS, UHI, and similar processes they apply; BUT it can be adjusted by the supplying national BOM and may have various “corrections” in it. Does it include homogenizing? Maybe…

  6. E.M.Smith says:


    I’m not yet skilled enough with Python to put trend lines on things. Anyone who knows how is welcome to post pointers to code howtos… I can do stacked scatter plots and I’m working on learning other graph types.

    For these, look at the offset between red and black in each year and how it changes. Or just that it is there at all in the same place. It says “anomaly” can’t protect you from instrument change artifacts.

  7. jim2 says:

    A trend line is basically just a linear regression:
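As a minimal sketch of that idea (not the code from any linked page), an ordinary least-squares line can be fitted to yearly anomalies in plain Python. The function name `fit_trend` and the sample numbers are made up for illustration; they are not GHCN values.

```python
def fit_trend(years, anomalies):
    """Ordinary least squares: anomaly ~= slope * year + intercept."""
    n = len(years)
    if n < 2 or n != len(anomalies):
        raise ValueError("need at least two (year, anomaly) pairs")
    mean_x = sum(years) / n
    mean_y = sum(anomalies) / n
    # Sum of squared deviations in x, and of cross products x*y
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, anomalies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly mean anomalies (deg C), for illustration only
years = [1950, 1960, 1970, 1980, 1990]
anoms = [-0.2, -0.1, 0.0, 0.1, 0.2]

slope, intercept = fit_trend(years, anoms)
# Points on the fitted line, one per input year
trend = [slope * y + intercept for y in years]
print(f"trend: {slope * 10:.3f} deg C per decade")
```

If the scatter plots are drawn with matplotlib, the fitted values could be laid over the dots with something like `plt.plot(years, trend)`. Fitting v3 and v4 separately and comparing the two slopes would be one way to answer the "which way does v4 trend relative to v3" question.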

  8. Steven Fraser says:

    To start further analysis, how about the numeric values for a single station, with a calculation of the difference between the v3 and v4?

  9. E.M.Smith says:

    @Steven Fraser:

    Strange you should suggest that… it’s what I was working on just before doing the blog check-in just now ;-)

    I’m “having issues” still… Here’s my present code:

    chiefio@PiM3Devuan2:~/SQL/bin$ cat delt.sql 
    SELECT T.abrev,A.year,AVG(T.deg_c)-AVG(A.deg_c) 
    FROM anom4 AS A 
    LEFT JOIN anom3 AS T ON A.abrev=T.abrev 
    WHERE A.abrev='LT' 
    GROUP BY year

    LT is Lesotho and one of the smaller sets of data.

    Issue 1): Run on the USA it ran 24 hours and still didn’t complete. Something isn’t very efficient…

    Issue 2): In years without data in one of the data sets, you get the actual value of the average of the other data set (i.e. not skipping those with zero data). I can skip one side with an “INNER JOIN” instead of “LEFT JOIN” but not both. I’m sure there’s a way, but needs a dig…

    In this table, only 1971 to 1980 have data in GHCN v3.3, the rest are the actual V4 averages, NOT the difference. (Read off of the graph comparing the two sets of dots as a QA check)

    MariaDB [temps]> source bin/delt.sql
    | abrev | year | AVG(T.deg_c)-AVG(A.deg_c) |
    | LT    | 1922 |                  0.037906 |
    | LT    | 1923 |                  0.226239 |
    | LT    | 1924 |                  1.231239 |
    | LT    | 1925 |                  0.863739 |
    | LT    | 1926 |                  0.207906 |
    | LT    | 1927 |                  0.239573 |
    | LT    | 1928 |                  0.500406 |
    | LT    | 1929 |                  0.012906 |
    | LT    | 1930 |                  0.613739 |
    | LT    | 1931 |                  0.331239 |
    | LT    | 1932 |                 -0.182094 |
    | LT    | 1933 |                 -0.049594 |
    | LT    | 1934 |                  0.503739 |
    | LT    | 1935 |                  0.622073 |
    | LT    | 1936 |                  0.113739 |
    | LT    | 1937 |                 -0.312094 |
    | LT    | 1938 |                 -0.457927 |
    | LT    | 1939 |                  0.140406 |
    | LT    | 1941 |                 -0.172927 |
    | LT    | 1943 |                  0.793739 |
    | LT    | 1944 |                 -0.351336 |
    | LT    | 1945 |                 -0.792094 |
    | LT    | 1946 |                 -0.527927 |
    | LT    | 1947 |                 -0.770427 |
    | LT    | 1949 |                 -0.895427 |
    | LT    | 1969 |                 -1.057927 |
    | LT    | 1970 |                 -0.986261 |
    | LT    | 1971 |                 -0.216346 |
    | LT    | 1972 |                  0.021001 |
    | LT    | 1973 |                  0.175869 |
    | LT    | 1974 |                  0.382204 |
    | LT    | 1975 |                  0.113402 |
    | LT    | 1976 |                  0.476001 |
    | LT    | 1977 |                 -0.383761 |
    | LT    | 1978 |                  0.311484 |
    | LT    | 1979 |                 -0.150820 |
    | LT    | 1980 |                  0.402676 |
    | LT    | 1981 |                  0.713079 |
    | LT    | 1982 |                 -0.244346 |
    | LT    | 1983 |                 -0.630698 |
    | LT    | 1984 |                 -0.207051 |
    | LT    | 1985 |                 -0.422999 |
    | LT    | 1986 |                 -0.285350 |
    | LT    | 1987 |                 -0.166302 |
    | LT    | 1988 |                  0.045395 |
    | LT    | 1989 |                  0.247350 |
    | LT    | 1990 |                  0.107573 |
    47 rows in set (2.34 sec)

    I’m thinking that to both make it efficient and get things more easily selected for “just years with data in both” it would be better to make a “yearly statistics” table (like the month stats table) and populate it with the averages for both data sets. Then it becomes one “calculate and load” and after that just SELECT for not NULL…

    Means another table and load, but would be faster than an all night run for the USA… that might even take longer than that as I killed it instead of waiting for completion… (And yes, I have an index on abrev)

    I do have an interesting set of graphs “by Country” stacking up for a posting later in the day, but was hoping to get this “enhancement” to go with them. We’ll see… I’m still at the 1st Cup Of Coffee stage ;-)
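A possible reading of the two issues in the comment above: as written, delt.sql joins the tables on `abrev` only, so every v4 row for a country appears to get paired with every v3 row for that country regardless of year. That would explain both the very long USA run and the odd values in years where v3 has no data. One way to sketch a fix is to average each data set per (abrev, year) first and then INNER JOIN on both keys. The example below uses Python's built-in sqlite3 as a stand-in for MariaDB; the table and column names come from the comment, while the rows and index names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE anom3 (abrev TEXT, year INTEGER, deg_c REAL);
CREATE TABLE anom4 (abrev TEXT, year INTEGER, deg_c REAL);
-- Composite indexes so the per-year grouping does not scan everything
CREATE INDEX ix_anom3 ON anom3 (abrev, year);
CREATE INDEX ix_anom4 ON anom4 (abrev, year);
""")

# Made-up anomalies: v4 has a 1970 record that v3 lacks
con.executemany("INSERT INTO anom3 VALUES (?, ?, ?)", [
    ("LT", 1971, 0.5), ("LT", 1971, 1.5), ("LT", 1972, 2.0),
])
con.executemany("INSERT INTO anom4 VALUES (?, ?, ?)", [
    ("LT", 1970, 9.0),
    ("LT", 1971, 0.25), ("LT", 1971, 0.75),
    ("LT", 1972, 1.0), ("LT", 1972, 2.0),
])

# Average each side per (abrev, year), then join on BOTH keys.
# INNER JOIN keeps only years present in both data sets.
QUERY = """
SELECT T.abrev, T.year, T.mean3 - A.mean4 AS delta
FROM (SELECT abrev, year, AVG(deg_c) AS mean3
      FROM anom3 GROUP BY abrev, year) AS T
INNER JOIN
     (SELECT abrev, year, AVG(deg_c) AS mean4
      FROM anom4 GROUP BY abrev, year) AS A
  ON A.abrev = T.abrev AND A.year = T.year
WHERE T.abrev = ?
ORDER BY T.year
"""
rows = con.execute(QUERY, ("LT",)).fetchall()
for abrev, year, delta in rows:
    print(abrev, year, delta)
```

The sign convention (v3 minus v4) follows the original query. The same SELECT should carry over to MariaDB more or less unchanged, though that has not been tested here.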

  10. E.M.Smith says:


    Thanks for the regression pointers… Looking it over, it looks like a 2nd Cup Of Coffee task ;-)

  11. Steven Fraser says:

    @EM: IMO not strange at all. It’s very much the way I explore data as well.

  12. Steven Fraser says:

    @EM: And, someone is going to comment about the number of sig digits soon….

  13. E.M.Smith says:

    @Steven Fraser:

    I was taught to carry the best precision to the end, then truncate or round to inside the error bars…

    From the “Sometimes you just can’t win” department…

    So I made a new yearly anomaly statistics table for both v3 and v4 data and a script to load the v4 (by year / country) then update the other v3 fields (as there are a LOT less of them)… Here’s the script:

    chiefio@PiM3Devuan2:~/SQL/bin$ cat Lyrcastats 
    INSERT INTO  yrcastats (year,abrev,mean4,big4,small4,num4,trang4,stdev4)
    SELECT year,abrev,
    AVG(deg_C),MAX(deg_C),MIN(deg_C),COUNT(deg_C),
    MAX(deg_C)-MIN(deg_C), STDDEV(deg_C)
    FROM anom4 
    GROUP BY year,abrev;
    show warnings;
    UPDATE  yrcastats AS Y
    SET
    mean3 = (SELECT AVG(A.deg_C) 
    FROM anom3 AS A
    WHERE Y.year=A.year AND Y.abrev=A.abrev),
    big3 = (SELECT MAX(A.deg_C) 
    FROM anom3 AS A
    WHERE Y.year=A.year AND Y.abrev=A.abrev),
    small3 = (SELECT MIN(A.deg_C) 
    FROM anom3 AS A
    WHERE Y.year=A.year AND Y.abrev=A.abrev),
    num3 = (SELECT COUNT(A.deg_C) 
    FROM anom3 AS A
    WHERE Y.year=A.year AND Y.abrev=A.abrev),
    trang3 = (SELECT MAX(A.deg_C)-MIN(A.deg_C) 
    FROM anom3 AS A
    WHERE Y.year=A.year AND Y.abrev=A.abrev),
    stdev3 = (SELECT STDDEV(A.deg_C) 
    FROM anom3 AS A
    WHERE Y.year=A.year AND Y.abrev=A.abrev);
    show warnings;

    The first part, loading the initial records of v4 data completed in about 4 minutes:

    MariaDB [temps]> source bin/Lyrcastats
    Query OK, 28423 rows affected, 5 warnings (4 min 7.05 sec)
    Records: 28423  Duplicates: 0  Warnings: 5

    Then it launched into the update… about 4 hours ago… and one CPU core has been pegged at 100% ever since… and it still isn’t done…

    It looks like “UPDATE” is very inefficient… There’s gotta be a better way…
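One likely cause of the slow update is the six correlated subqueries, each of which gets re-evaluated against anom3 for every one of the 28,423 rows in yrcastats. A set-based alternative is to aggregate each data set once per (year, abrev) and fill the statistics table in a single INSERT ... SELECT with a LEFT JOIN. The sketch below uses Python's sqlite3 as a stand-in for MariaDB, invents the sample rows and the yrcastats column types, and leaves out the standard deviation columns because SQLite has no built-in STDDEV.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE anom3 (abrev TEXT, year INTEGER, deg_C REAL);
CREATE TABLE anom4 (abrev TEXT, year INTEGER, deg_C REAL);
-- Assumed layout; the real yrcastats definition is not shown in the post
CREATE TABLE yrcastats (
    year INTEGER, abrev TEXT,
    mean4 REAL, big4 REAL, small4 REAL, num4 INTEGER, trang4 REAL,
    mean3 REAL, big3 REAL, small3 REAL, num3 INTEGER, trang3 REAL,
    PRIMARY KEY (year, abrev)
);
""")

# Made-up anomalies: 1970 exists only in v4
con.executemany("INSERT INTO anom4 VALUES (?, ?, ?)", [
    ("LT", 1970, 1.0), ("LT", 1971, 0.5), ("LT", 1971, 1.5),
])
con.executemany("INSERT INTO anom3 VALUES (?, ?, ?)", [
    ("LT", 1971, 0.25), ("LT", 1971, 0.75),
])

# Each source table is grouped exactly once; the LEFT JOIN keeps every
# v4 (year, country) row and leaves the v3 columns NULL where v3 is empty.
con.execute("""
INSERT INTO yrcastats
SELECT V4.year, V4.abrev,
       V4.mean4, V4.big4, V4.small4, V4.num4, V4.big4 - V4.small4,
       V3.mean3, V3.big3, V3.small3, V3.num3, V3.big3 - V3.small3
FROM (SELECT year, abrev, AVG(deg_C) AS mean4, MAX(deg_C) AS big4,
             MIN(deg_C) AS small4, COUNT(deg_C) AS num4
      FROM anom4 GROUP BY year, abrev) AS V4
LEFT JOIN
     (SELECT year, abrev, AVG(deg_C) AS mean3, MAX(deg_C) AS big3,
             MIN(deg_C) AS small3, COUNT(deg_C) AS num3
      FROM anom3 GROUP BY year, abrev) AS V3
  ON V3.year = V4.year AND V3.abrev = V4.abrev
""")

total = con.execute("SELECT COUNT(*) FROM yrcastats").fetchone()[0]
# "Just years with data in both" becomes a simple NOT NULL filter
both = con.execute("""
SELECT year, mean3 - mean4 FROM yrcastats
WHERE mean3 IS NOT NULL ORDER BY year
""").fetchall()
print(total, both)
```

In MariaDB the same grouped subquery could probably also be used in a multi-table `UPDATE ... JOIN ... SET` if the v4 rows are already loaded, and a composite index on anom3 (abrev, year) might help the original correlated version as well. Both suggestions are untested against the actual database.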

  14. jim2 says:

    CIO – here’s 8 ways to Sunday to do LR in Python. One or two look pretty simple. With some of the others, you can fit a curve to your data.

  15. E.M.Smith says:


    Thanks for the added pointers!

    Looks like I’m about ready to “pick up where I left off” on improving the v3 v4 comparison graphs. The data load / stats creation finally finished:

    MariaDB [temps]> source bin/Lyrcastats
    Query OK, 28423 rows affected, 5 warnings (4 min 7.05 sec)
    Records: 28423  Duplicates: 0  Warnings: 5
    Query OK, 23987 rows affected, 17744 warnings (4 hours 20 min 58.52 sec)
    Rows matched: 28423  Changed: 23987  Warnings: 17744
    MariaDB [temps]>

    So about 4 1/2 hours all told. Not going to do that real often….

    Now I can get back to graphing it, seeing if it “talks to me” as I’d like, and then do things like fit a curve / LR / etc… I think I have a full evening ahead…

  16. Pingback: GHCN v3.3 vs v4 – Top Level Entry Point | Musings from the Chiefio
