GHCN v4 – First Graphs

I’ve reached the point where I can start looking over the GHCN v4 data.

The process of getting it loaded, and gaining some ability to compare it with the prior v2 & v3 versions, is described in a posting here:

The equivalent v3 graphs were produced in an earlier posting. I’ll include one of them here for comparison.

Without further ado, here are the graphs of “baseline” vs “now”:


I’ve used the entire 1950 to 1990 inclusive period as the “baseline” interval, since that covers both Hadley and GISS, and the v3.3 data seemed to cover that whole period with extra thermometers too. Plotting “now” vs “baseline” on one graph is a bit muddied (as you will see below) so I broke them out.
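For the curious, the selection can be sketched in a few lines of Python. The station IDs and the (station_id, year) record shape here are made up purely for illustration; the real selection runs as SQL against my temps4 table:

```python
def stations_by_period(records, start, end):
    """Set of station IDs with at least one monthly record
    in the inclusive year range [start, end]."""
    return {sid for sid, year in records if start <= year <= end}

# Toy (station_id, year) pairs -- illustrative, not real GHCN rows.
records = [
    ("USW00023234", 1955), ("USW00023234", 2018),
    ("ASN00066062", 1960),
    ("CA006158355", 2018),
]

baseline = stations_by_period(records, 1950, 1990)
now = stations_by_period(records, 2018, 2018)

print(sorted(baseline))        # active during the baseline
print(sorted(now))             # active "now"
print(sorted(baseline - now))  # in the baseline but gone "now"
```

Same idea either way: a station lands on the “baseline” map if it reported at all in 1950–1990, and on the “now” map if it reported in the current year.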

GHCN v4 Baseline of 1950 to 1990 inclusive


“Now” 2018

GHCN v4 sites in 2018 “now”

To me, it looks like an improvement, but the same problems still exist. MORE coverage in the “baseline” than in the present. Oceans still largely unknown. LOTS of instrument changes (many areas show different shapes and densities). Lots of empty places stay empty (Sahara, northern Canada).

It is as though the same methods and games are being applied, but with a lot more “Go” pieces on the board. (I like “Go” more than chess. Different, but more. To understand the Chinese approach to strategy, play Go for a few months…)

Baseline over “Now” 2015

When I first modified the v3.3 code I forgot to make the “now” 2018, so this graph still uses 2015. It ought not make much difference for this graph.

GHCN v4 Baseline Blue over 2015 “now” Red.

GHCN v3.2 Baseline blue 50% transparent over 2015 “now” in red

Not a lot to say at this point. This is just the “First Fire” jumping off point.

I was very happy to see that the code I’d written for v3 worked fine with v4 with only minor changes (like different table names), though it runs slower due to the roughly 4 × more stations providing data.

These maps show that they still could not change the reality that the USA and central Europe have most of the historical temperature data. The geographical limits of what is known still apply. Especially in the past.

There is much more coverage in this set, but still with holes in lots of places and with “instrument change” still running all through it. (Not much you can do to change the fact that people bought new instruments in decades past, wars were fought, and things moved around).

Overall, my first blush of it is that it is an improvement. Over the next days (weeks?) I’ll poke and prod it some more and see if that holds.


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present “hot buttons” are the mythology of Climate Change and ancient metrology; but things change…
This entry was posted in AGW Science and Background, NCDC - GHCN Issues. Bookmark the permalink.

17 Responses to GHCN v4 – First Graphs

  1. Bill in Oz says:

    E M, the size of the points on the charts is deceptive.
    It appears to show comprehensive coverage of many parts of the globe. But I know here in Oz that many weather stations are hundreds of kilometers away from each other. And even those reasonably close to each other can be very different in their results because of altitude or distance from the seas.

    Bigger charts or small tiny dots? Which is better?

  2. A C Osborn says:

    E M, do you know if all the Stations in the database are actually used in the Global calculations?
    Have you looked at any actual Site data yet? I do not mean in depth, because you obviously have not had time, but the last time I looked it confirmed Tony Heller’s claim that a lot of the data is marked “E” for Estimated – even historical data has been replaced with “E” data.

  3. Bob Koss says:

    A C
    GHCNM v3 or v4 doesn’t use estimated data. It is only found in the USHCN database. The ‘E’ QC flag in v4 indicates those months are duplicated in another station’s record.

    I guess I should note I put a comment in this post with more info.

  4. E.M.Smith says:

    @Bill In Oz:

    So far I’ve not found any way to change the size of the graph. (It may well exist but Python is such a tangle of functions with hidden behaviours and no manual that every “feature” is a few hours to find / use the first time. Welcome to OO…)

    As for dot size, much smaller and you can’t see the singletons, especially in colors like green or yellow.

    I’m not really happy with it, but don’t have a fix (yet).
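    (Follow-up: if it’s matplotlib under the hood – I haven’t confirmed which layer controls this – the figure size and the dot size are both settable; a minimal sketch, with coordinates and filename invented:)

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Toy station coordinates -- purely illustrative.
lons = [-122.0, 151.2, 2.3, -0.1]
lats = [37.4, -33.9, 48.9, 51.5]

# figsize is in inches; dpi scales the output pixel count.
fig, ax = plt.subplots(figsize=(16, 8), dpi=100)

# s is the marker area in points^2 -- shrink it for dense maps,
# or grow it so singletons stay visible in light colors.
ax.scatter(lons, lats, s=4, color="blue")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")

fig.savefig("stations.png")
```

    The trade-off Bill raises is still real: one s value can’t make dense clusters honest and lone stations visible at the same time.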

    @A. C. Osborn:

    Each group doing an “analysis” does their own data selection. All of them cut off about 1850-1880 going back in time. GIStemp homogenizes things all over the place. No idea what Hadley does.

    I just got the temperature data for v4 loaded yesterday so no, not looked at any of the metadata stuff.

    THE basic trend I’ve seen from start (v1) to end (now – v4) is the incessant use of ever more non-data instead of actual data. At every step, the actual readings are replaced. From “QA” processes to “Adjustments” to “Homogenizing” to “Reference Station Method” making “grid box anomalies” for comparison rather than actual thermometers.

    After pointing out the crap homogenizing in GIStemp, it was moved upstream into the GHCN directly (where the process is hidden from view…). So now you don’t get a minor station number showing a station change happened. It has all been spliced, smoothed, and hidden now.

    IMHO, that’s the biggest problem with GHCN v4. Even the “unadjusted” data is highly adjusted. It comes from various BOMs all over the world, and the description specifically disclaims that unadjusted is really unadjusted, as the upstream may well have already adjusted it. Then it gets spliced, homogenized, etc. etc. God only knows what.

    Hence this comparing of v4 to v3 to v2, showing things changing…

    @Bob Koss:

    I’ve got a handle on the first 2 CHAR of Station ID (country) and the 5 of WMO base, but the 4 in the middle are obscure. In particular, the documentation just calls it 11 digits of ID, but then talks about 3 CHAR columns of flags – which might or might not be the next 3 after the country code. Assuming they are something like that, it leaves the 4th one (the 6th actual character) completely undefined. So is it a 6th WMO digit? A flag? Who knows…

    That’s the biggest thing I’m trying to work out. Pointers to some documentation better than the included README file strongly appreciated….
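    (My current working hypothesis – and it is only a hypothesis until the documentation confirms it – is a 2-char country code, 1-char network code, then 8 characters of station number. Sketched as code so it’s easy to test against real IDs:)

```python
def split_ghcn_v4_id(station_id):
    """One POSSIBLE decomposition of an 11-character GHCN v4 ID.
    Assumed layout: 2-char country code, 1-char network code,
    8-char station number. Treat as a hypothesis to verify
    against the README, not settled fact."""
    if len(station_id) != 11:
        raise ValueError("expected an 11-character ID")
    return {
        "country": station_id[:2],
        "network": station_id[2],
        "number": station_id[3:],
    }

parts = split_ghcn_v4_id("USW00023234")
print(parts)
```

    If that split is right, the “mystery” characters are the network code plus the high digits of the station number; if not, the function is one line to change.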

  5. E.M.Smith says:

    @Bob Koss:

    Per the metadata description file (I have a copy here: ) there is Estimated “data”:

    DMFLAG: data measurement flag, nine possible values for QCU/QCF and one for QFE:

    Quality Controlled Unadj/Adj (QCU/QCF) Flags:

    blank = no measurement information applicable
    a-i = number of days missing in calculation of monthly mean
    temperature (currently only applies to the 1218 USHCN
    V2 stations included within GHCNM)

    Quality Controlled Adj and estimated (QFE) Flags:
    E = Data has been estimated from neighbors within the PHA

    I don’t yet know how much, but that Data Measurement Flag (DMFLAG) is the one where E means estimated.

  6. E.M.Smith says:

    Though either I’ve done something wrong with my query / process or there are no actual values in the dataset:

MariaDB [temps]> SELECT COUNT(deg_C) FROM temps4
    -> WHERE deg_C > -90 AND missing='E';
+--------------+
| COUNT(deg_C) |
+--------------+
|            0 |
+--------------+
1 row in set (1 min 39.83 sec)
  7. E.M.Smith says:

    Hmmm….. Looks like the “missing data flag” has no entries. I’ll want to check that in the source data and assure I didn’t just muff the load, but it is possible that flag is not set yet in v4 first release:

MariaDB [temps]> SELECT missing FROM temps4
    -> WHERE missing <> NULL;
Empty set (1 min 28.58 sec)
MariaDB [temps]> SELECT COUNT(missing) FROM temps4
    -> WHERE missing <> ' ';
+----------------+
| COUNT(missing) |
+----------------+
|              0 |
+----------------+
1 row in set (1 min 29.21 sec)
MariaDB [temps]>
  8. E.M.Smith says:

    I did a quick “eyeball” of the data as unpacked from the tape. LOTS of E in the QC Flag (2nd position) but no E in the missing field. While I’ve not looked at all the data, I’ve looked at dozens and dozens of pages of it. I suspect the “missing” field is either obsoleted by their homogenizing method or just not implemented yet.
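    (The eyeball pass can be mechanized. The column positions below are my reading of the README layout – ID in columns 1–11, year 12–15, element 16–19, then twelve 8-character VALUE/DMFLAG/QCFLAG/DSFLAG groups – and should be double-checked against the actual file. The record here is fabricated, not real data:)

```python
from collections import Counter

def count_dmflags(lines):
    """Tally the DMFLAG (first flag after each monthly value) across
    GHCN-M style fixed-width records. Assumed layout: cols 1-11 ID,
    12-15 year, 16-19 element, then 12 x (VALUE 5, DMFLAG 1,
    QCFLAG 1, DSFLAG 1)."""
    counts = Counter()
    for line in lines:
        for month in range(12):
            base = 19 + month * 8        # start of this month's group
            value = line[base:base + 5]
            dmflag = line[base + 5:base + 6]
            if value.strip() != "-9999":  # skip missing months
                counts[dmflag] += 1
    return counts

# One fabricated record: two reported months, the rest missing.
rec = ("XX000000001" "2018" "TAVG"
       + "  250E  "          # month with DMFLAG 'E'
       + "  261   "          # month with blank DMFLAG
       + "-9999   " * 10)    # ten missing months

flags = count_dmflags([rec])
print(flags)
```

    Run over the whole .dat file, that would settle whether my SQL nulls are a load bug or a genuinely empty field.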

  9. You have probably addressed this in prior posts in the GHCN series, but is it possible to show data back to the beginning of the 20th century? Early 20th century warming somewhat puts more recent warming (and cooling) in perspective when you start running temp variations against time.

    (So long as you are able to avoid the clearly Anthropogenic warming trends. Those clearly man-made temp increases are the ones made thru adjustments to raw temp data.) ;-)

  10. Bob Koss says:

    E. M.

    That QFE file didn’t exist before around Mar 2018 in v4 and only covers 1961-2010 anomaly periods. Don’t know if they are currently making any use of it. I do know many of the stations found in it with estimated data have no real observational data at all 1961-2010, so all their data is estimated. The QFE file is the only one where that field is used.

    You won’t find estimated data in either the QCU or QCF files.

    You might want to look at my comment in your post from the 18th.

  11. Bill in Oz says:

    E M , I just found this article about New Zealand’s weather and SAM..The Southern hemisphere Vortex…Very interesting indeed.

    Curious that our informative BOM here in Oz has not told us anything about it.

  12. A C Osborn says:

    Bob Koss says: 21 March 2019 at 2:18 pm
    GHCNM v3 or v4 doesn’t use estimated data. It is only found in the USHCN database. The ‘E’ QC flag in v4 indicates those months are duplicated in another stations record.

    So GHCNM does not use USHCN data?

  13. cdquarles says:

    Hmm, I’d say that homogenization by any means must produce estimates of values where they represent values that don’t exist because they were not the measured results.

  14. Bob Koss says:

    A C
    Estimated values are not included in the v3 QCU/QCA or in the v4 QCU/QCF files. Only the observational values from USHCN are present.

    It’s possible the v4 QFE 1961-2010 anomaly period file, which is a new addition to v4, contains missing value estimates taken from USHCN instead of generating their own. I’ll check sometime over the next few days.

  15. E.M.Smith says:

    @Bob Koss:

    I read all comments when I check the blog (duty to assure nothing illegal…), so I had read your comment then. I also posted the text from the dataset description stating there IS AN ESTIMATED FLAG column (though it ought to be unused for the Unadjusted set), also posted sql reports showing it was empty, also posted my scan of the raw input file confirming that, also… so perhaps it is you who needs to read some prior comments?… Or maybe we are collectively just talking past each other?

    FWIW, during early data collection times it was permitted to “guess” the temperature. Back about 2009 I posted a link to an online official copy of the guidance to thermometer readers saying just that. So of course the Warmistas had the link promptly broken (and I learned of the necessity to grab screen shots of anything NASA NCDC etc.) so some of the data most certainly are estimates, flagged or not.

    Then in earlier GHCN versions there were lots of estimated flags in all sorts of years. That those are now gone does not encourage belief that somehow they found replacements of real readings spread over the last two centuries; rather it implies they are hiding their historical guessing… Or maybe they are now “missing data flags” or “homogenized so techy estimates not those old fashioned human estimates”.

    Who knows. (IMHO at this point nobody can know for sure, the data are so cocked up and molested).

  16. Bob Koss says:

    GHCN scans the USHCN raw file and removes any months with more than 5 missing observations. It plugs that data into its QCU raw database of all stations. Then for its QCF final database it proceeds to perform adjustments where it deems necessary. It does it this way because their adjustment scheme often relies on stations outside the USHCN network. Since they are starting with all original observations, they call them adjustments and not estimates. No ‘E’ inserted for the DMFLAG.

    For their new QFE file it appears they insert the QCF data unless missing. When missing, they estimate it, probably in a manner similar to USHCN, and then mark it with an ‘E’ for the DMFLAG. Those estimates don’t appear to be the same as in USHCN, probably due to different data being used to create the estimate.

    Point of information. The QFE file has many stations which have no observational data at all during the 1961-2010 period and are entirely based on estimates.

    USHCN 52j final database fills in estimated values where they have no observational value at all. Even going so far as filling in entire decades with data.
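    (That screen – drop any month with more than 5 missing daily observations – is simple to state in code. The dict of missing-day counts here is an invented stand-in for however NOAA actually stores it:)

```python
def usable_months(monthly_missing, max_missing=5):
    """Keep a month only if it has max_missing or fewer missing
    daily observations. monthly_missing maps month number to
    count of missing days -- an assumed toy structure."""
    return {m for m, missing in monthly_missing.items()
            if missing <= max_missing}

# Missing-day counts per month, fabricated for illustration.
obs = {1: 0, 2: 3, 3: 6, 4: 12}
print(sorted(usable_months(obs)))  # months 3 and 4 get dropped
```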

  17. Pingback: GHCN v3.3 vs v4 – Top Level Entry Point | Musings from the Chiefio
