This posting is just for the purpose of making a top level entry point to the various postings about the comparison of GHCN version 3.3 to version 4. It is, in essence, a set of links to prior articles.
I’ve taken “My own sweet time” to make this “aggregator” posting, but in looking just at the number of links, I can see why I was feeling a bit “burnt out” on the topic / efforts. That’s one heck of a lot of work and learning embodied in all those links!
Well, the good news is I’m getting over it now. Enough that I was ready to dig back through it all and collect all the “stuff” in one easy to find posting. With this posting, I’m almost ready to take on adding version 2 to the mix ;-)
That will require adding another set of v2 tables to all this below, and would likely also be a good time to gather all the “how to build” stuff from scattered through these postings (and in some cases in comments in the postings) into one neat “How To Build” posting with scripts. If anyone else is thinking of building one of these systems and wants a single posting “how to”, post a comment to that effect and I’ll bring it all together (and likely add scripts for chunks of it).
All of these analysis postings, along with a lot more related to GHCN, can be found in the category:
I started the comparison of v3.3 to v4 investigation by looking at some regions in total (the “continents” just below) and then at some selected countries, just to get a feel for what was coming and to assure I was looking at things the right way and with tools that worked. If you don’t want to see those 2 first steps, just skip on down to “Around The World” for every country in the world represented in GHCN. Here are those two postings:
Around The World
These are the collections of graphs comparing v3.3 to v4 in all the various countries in the world represented in the GHCN (Global Historical Climatology Network) data sets. They are not presented in the order created (shortest to longest list of countries) but rather in the order of the continents as numbered by GHCN.
Africa – 1
Asia – 2
South America – 3 with Antarctica – 7
North America – 4
Australia & Pacific Islands – 5
Europe – 6
Antarctica – 7
There are no countries in Antarctica, just one continent graph, so I put that at the end of South America (see just above).
QA and Technical Housekeeping
In this posting, I test the sensitivity to change of the final date used for computing “anomalies”, using a “baseline” period that ends in 2015, the same year GHCN v3.3 ends, and not using all the data to the end of v4. This assures only the same time periods are used for computing differences. It is an important sanitation measure that shows it is NOT the inclusion of 3 more years of data in v4 causing the changes in the comparisons.
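To make the idea of a baseline end date concrete, here is a minimal sketch in Python. It is NOT the code from the linked posting; the function name `anomalies`, the record layout, and the temperatures are all made up for illustration. It computes each reading's anomaly as its difference from that station's mean for the same calendar month, using only years up to a chosen baseline end year.

```python
from statistics import mean

def anomalies(records, baseline_end):
    """records: list of (year, month, temp_c) for ONE station (hypothetical layout).

    The baseline for each calendar month is the mean of that month's readings
    in all years <= baseline_end. Returns a list of (year, month, anomaly).
    """
    by_month = {}
    for year, month, temp in records:
        if year <= baseline_end:
            by_month.setdefault(month, []).append(temp)
    baseline = {m: mean(temps) for m, temps in by_month.items()}
    # Months with no baseline data are skipped rather than guessed at.
    return [(y, m, t - baseline[m]) for y, m, t in records if m in baseline]

# Made-up January readings for one station, 2013 through 2018.
records = [(2013, 1, 10.0), (2014, 1, 11.0), (2015, 1, 12.0),
           (2016, 1, 13.0), (2017, 1, 14.0), (2018, 1, 15.0)]

to_2015 = anomalies(records, 2015)  # baseline stops where v3.3 stops
to_2018 = anomalies(records, 2018)  # baseline uses all the data
```

In this toy case, moving the baseline end from 2018 back to 2015 shifts every anomaly for the station by the same constant, so year-to-year differences within the one series are unchanged. The real test in the posting is across many stations with differing record lengths, which this sketch does not attempt.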
Some general complaints about the kinds of change made from data set version to version, and how they make effective cross version comparisons deliberately and unnecessarily difficult.
At one point I attempted to port the system to an Odroid N2 (which had just arrived on the market as a new system with a barely ported operating system). This worked well right up to the “plot the data” step, when the Python Matplotlib had a bug in it that screwed up headings. Here’s the story of the process / experiment:
That’s why a lot of these graphs and this process were done on much slower boards with more stable operating systems. Having things work right often means using systems of hardware and software that are more fully debugged and NOT the “latest and greatest”.
Along the way, this also showed that while the faster computers were significantly faster, even a Raspberry Pi Model 3 was sufficient to do this work. (Just have a box of candy bars and extra coffee available for the waits ;-)
Close Ups & Odd Bits
Here I’m putting links to minor investigations I did, or things I looked at “up close” along the way. Things that might need similar investigation for far more parts of the world.
How does the inventory of thermometers change over time? In nice globe graphs:
What is the global distribution of thermometers in the GHCN v4 set NOW as compared to the time window used by GIStemp for computing a “Baseline”?
How did the “high altitude” stations represented change over time? This matters rather a lot since, as we are experiencing now, when the Sun has a major quiet time, the UV drops, total atmospheric height shrinks, and all the “high cold places” end up at, effectively, higher and colder density altitudes (i.e. thinner air). Change when they are in vs out of the data set, and you are indirectly adding / removing those solar changes.
Peculiar things about Djibouti. Just why would the SAME historical data from the ONE instrument, change? Eh?
I’d often rejected the idea that “The Anomaly Fixes Instrument Change”. Using this database I was finally able to assess that belief. I found that the use of anomaly processing is NOT sufficient to correct for instrument changes. As the entire GHCN is one giant mass of instrument changes, this means that there’s no validity to the data even when processed as anomalies, for saying anything about 1/10 C scale (or even whole C IMHO) “trends”. It could just as easily be an artifact of instrument changes and data series length changes.
When I first had this running, I did Australia Pacific Islands region first as a kind of test case. Here’s that posting:
The Code & Computer Stuff
There is a general introduction to this code, the database, and all here:
Various technical changes were made over the course of the investigation. Here’s the last iteration of the scripts to load the database:
The original database and related code install scripts and process:
Here is the original build of the statistics and anomaly tables:
Where I got the data, how to unpack it, and the first look at it:
The GHCN v3.3 Steps
This comparison of v3.3 to v4 started with getting v3.3 loaded and kicking it around a little. Could not compare v3.3 to v4 until they were both loaded, so had to start somewhere. I started on v3.3, and here are those 3.3 only postings:
I’d compared the seasons in Australia, then had a request to do a comparison of the seasons using the months as defined in Oz for each season:
Here’s the various continents with standard seasons used on all of them:
Do thermometer anomalies by season by continent tell us anything interesting? I think so… Graphs of same and conclusions that altitude in winter and asphalt in summer sun matter to volatility.
Anomalies as computed and graphed, by Continent in v3.3:
GHCN v3.3 thermometer change over time, plotted on a global graph:
Comparing the altitude of stations overall, with the altitudes used in the GIStemp baseline time period. Let’s just say they are not really comparable…
How long do thermometers “live” in the data set? (It would be very interesting to compare this with how long they exist in the real world…)
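As a rough illustration of what a “lifetime in the data set” measure can look like, here is a small Python sketch. It is an assumption about one reasonable approach, not the query used in the linked posting; the function name `station_lifetimes` and the station IDs are hypothetical. It takes (station, year) pairs and reports the first year, last year, and span for each station.

```python
def station_lifetimes(records):
    """records: iterable of (station_id, year) pairs (hypothetical layout).

    Returns {station_id: (first_year, last_year, span_in_years)}.
    The span counts both end years and ignores any gaps in between.
    """
    spans = {}
    for station_id, year in records:
        first, last = spans.get(station_id, (year, year))
        spans[station_id] = (min(first, year), max(last, year))
    return {sid: (first, last, last - first + 1)
            for sid, (first, last) in spans.items()}

# Made-up sample: two stations with different record lengths.
sample = [("STATION_A", 1950), ("STATION_A", 1990), ("STATION_A", 1970),
          ("STATION_B", 2001), ("STATION_B", 2003)]

lifetimes = station_lifetimes(sample)
```

Note that this treats a station with a long gap in the middle as “alive” the whole time. Counting only the years that actually have data would be a different (and arguably stricter) measure.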
Where are the thermometers NOW in v3.3 vs where were they during the Hadley baseline period? Global graph:
What might be the impact of particular months on how the anomaly looks? What do anomalies look like, month by month, and does this have implications for volatility of data and what months might have more data dropped for being “out of range” if you use a “one size fits all” volatility screen?
How does Station at Altitude change over time?
A close up look at where thermometers are, by continent, in the baseline period vs now:
Where are all the thermometers? Why, where all the people are (and all their industrial and household and transport and asphalt heat are located):
I take a look at ALL the data for Australia as a “scatter plot”:
Scatter Plots for all the continents:
A close up look at San Francisco that asks if maybe, just maybe a 5 C change is not due to CO2 but “something else”? Like maybe that giant airport expansion for the Jet Age?
The first GHCN sin based globe I plotted with labels:
The earlier attempt with a rectangular globe projection:
Then there are the Pre-Plotting Postings
Version 3.3 Related Technical Bits
A crude look at how “country code” changes between versions as I tried to figure out how to work around that:
My first step building of a MySQL database. (Later changed to MariaDB way up above).