I’m doing a quick comparison of a few GHCN v3.3 vs v4 countries, with the v4 statistics and anomalies computed using a “cut off” before 2016. This has both versions using the same time period to compute their anomalies (though with different instruments in some cases – mostly the USA and Germany have added data).
You can think of this as a “Baseline Period” that runs from the start of data through 2015.
MariaDB [temps]> SELECT MAX(year) FROM temps3;
+-----------+
| MAX(year) |
+-----------+
|      2015 |
+-----------+
1 row in set (16.51 sec)
Why? Because using the full v4 duration means 2016, 2017 & 2018 data go into the average used to compute anomalies. To the extent those are hotter data points, they raise the average, which can make it look like the “past was cooled”. So this is to gauge how much of an issue that might be.
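The mechanics of that concern can be shown with a tiny sketch (the numbers here are made up for illustration, not GHCN data): adding warm years to the baseline raises the mean, which shifts every anomaly, past years included, down by the same constant amount.

```python
# Hypothetical monthly means (deg C) for one station, one month.
# Years 1990-2015 sit at 20.0; 2016-2018 are warmer.
temps = {year: 20.0 for year in range(1990, 2016)}
temps.update({2016: 21.0, 2017: 21.5, 2018: 21.2})

def anomalies(temps, cutoff=None):
    """Anomaly = observation minus baseline mean.
    If cutoff is given, only years before it contribute to the baseline."""
    baseline = [t for y, t in temps.items() if cutoff is None or y < cutoff]
    mean = sum(baseline) / len(baseline)
    return {y: round(t - mean, 2) for y, t in temps.items()}

full = anomalies(temps)          # baseline includes 2016-2018
short = anomalies(temps, 2016)   # baseline ends with 2015

# The warm years raise the full baseline mean, so every anomaly is
# shifted down by the same constant offset relative to the short version.
offset = full[1990] - short[1990]
assert all(abs((full[y] - short[y]) - offset) < 1e-9 for y in temps)
print(f"baseline shift: {offset:.4f} deg C")  # prints: baseline shift: -0.1300
```

The shift is a uniform offset per station-month, so the shape of a graph barely moves; only its vertical placement does, which is consistent with the tiny differences seen below.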
Overall, the graphs ought not change much. The average temperature in June in NYC Central Park ought to be a fairly stable number.
But we’ll see.
So I’ve added three tables. These first two are identical to the anom4 and mstats4 tables other than the name of the tables being mstatsS4 and anomS4 where the “S” is for “Short”.
Loading them was a bit slower due to an added test for year, but still not too bad on a Raspberry Pi M3:
MariaDB [temps]> source tables/anomS4
Query OK, 0 rows affected (0.76 sec)

MariaDB [temps]> source tables/mstatsS4
Query OK, 0 rows affected (0.08 sec)

MariaDB [temps]> source bin/LOAD/LmstatsS4
Query OK, 328326 rows affected (29 min 12.35 sec)
Records: 328326  Duplicates: 0  Warnings: 0

Empty set (0.00 sec)

MariaDB [temps]> source tables/mkanomindexS4
Query OK, 0 rows affected (0.41 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [temps]> source bin/LOAD/LanomS4
Query OK, 15450484 rows affected (29 min 20.49 sec)
Records: 15450484  Duplicates: 0  Warnings: 0

Empty set (0.03 sec)

MariaDB [temps]>
Here’s the script that computes the statistics. The change from the original is the added year test in the WHERE clause:
chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat LmstatsS4
INSERT INTO mstatsS4 (stnID,month,mean,big,small,num,trang,stdev)
SELECT stnID,month,
ROUND(AVG(deg_C),2),MAX(deg_C),MIN(deg_C),COUNT(deg_C),
MAX(deg_C)-MIN(deg_C),
ROUND(STDDEV(deg_C),2)
FROM temps4
WHERE deg_C>-90.0 AND deg_C< 60.0 AND year<2016
GROUP BY stnID,month;
show warnings;
Then computing the anomalies is the same, just to a different storage table:
chiefio@PiM3Devuan2:~/SQL/bin/LOAD$ cat LanomS4
INSERT INTO anomS4 (stnID,abrev,region,cnum,year,month,deg_C)
SELECT T.stnID,T.abrev,C.region,C.cnum,T.year,T.month,ROUND(T.deg_C-ST.mean,2)
FROM temps4 AS T
INNER JOIN country AS C ON T.abrev=C.abrev
INNER JOIN mstatsS4 AS ST ON ST.stnID=T.stnID AND ST.month=T.month
WHERE T.deg_C > -90.0 AND T.deg_C < 60.0;
show warnings;
chiefio@PiM3Devuan2:~/SQL/bin/LOAD$
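To see how the two queries interact, here is a minimal sqlite3 reproduction of the same pattern with toy data (table and column names follow the post; the country join is dropped to keep it self-contained). Note the asymmetry that makes the test work: the statistics use only years before 2016, but anomalies are computed for all years against that short-baseline mean.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE temps4  (stnID TEXT, year INT, month TEXT, deg_C REAL);
CREATE TABLE mstatsS4(stnID TEXT, month TEXT, mean REAL);
CREATE TABLE anomS4  (stnID TEXT, year INT, month TEXT, deg_C REAL);
""")
db.executemany("INSERT INTO temps4 VALUES (?,?,?,?)", [
    ("USW00094728", 2014, "JUN", 21.0),
    ("USW00094728", 2015, "JUN", 21.4),
    ("USW00094728", 2016, "JUN", 23.0),   # excluded from the baseline
    ("USW00094728", 2015, "JAN", -999.0), # sentinel, filtered by the QC bounds
])

# Statistics with the "short" baseline: only years before 2016 contribute.
db.execute("""
INSERT INTO mstatsS4 (stnID, month, mean)
SELECT stnID, month, ROUND(AVG(deg_C),2)
FROM temps4
WHERE deg_C > -90.0 AND deg_C < 60.0 AND year < 2016
GROUP BY stnID, month""")

# Anomalies for ALL years, each relative to that short-baseline mean.
db.execute("""
INSERT INTO anomS4 (stnID, year, month, deg_C)
SELECT T.stnID, T.year, T.month, ROUND(T.deg_C - ST.mean, 2)
FROM temps4 AS T
INNER JOIN mstatsS4 AS ST ON ST.stnID = T.stnID AND ST.month = T.month
WHERE T.deg_C > -90.0 AND T.deg_C < 60.0""")

rows = db.execute("SELECT year, deg_C FROM anomS4 ORDER BY year").fetchall()
print(rows)  # 2016's anomaly is 23.0 minus the 2014-2015 mean of 21.2
```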
Then there’s the need to make a modified yrcastats table. I called it “yrcastatsS” and it is identical in layout. Only the loading differs: the data comes from the anomS4 table instead.
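The yrcastatsS loader isn’t shown in the post, so this sketch is a guess at its shape: per-country, per-year summary statistics of the anomalies, following the same pattern as the monthly stats loader but grouped by country and year. The column names here are illustrative assumptions, not the actual table layout.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE anomS4     (abrev TEXT, year INT, month TEXT, deg_C REAL);
CREATE TABLE yrcastatsS (abrev TEXT, year INT, mean REAL,
                         big REAL, small REAL, num INT);
""")
db.executemany("INSERT INTO anomS4 VALUES (?,?,?,?)", [
    ("GM", 2015, "JUN", 0.30), ("GM", 2015, "JUL", -0.10),
    ("GM", 2016, "JUN", 0.90), ("GM", 2016, "JUL", 0.50),
])

# Same INSERT ... SELECT pattern as the monthly stats loader,
# but grouped by country abbreviation and year.
db.execute("""
INSERT INTO yrcastatsS (abrev, year, mean, big, small, num)
SELECT abrev, year, ROUND(AVG(deg_C),2), MAX(deg_C), MIN(deg_C), COUNT(deg_C)
FROM anomS4
GROUP BY abrev, year""")

rows = db.execute("SELECT * FROM yrcastatsS ORDER BY year").fetchall()
print(rows)  # one summary row per country per year
```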
Then you can run the graphs:
This involves just pointing the reports at the anomS4 and yrcastatsS tables and saying RUN.
I chose these guys as a small country unlikely to have a lot of thermometers, so more likely to have any change in the final years of data show up in an “apples to apples” comparison – that is, looking at the same thermometers. There’s ALMOST no detectable difference in the graphs; just enough to know it isn’t an error on my part where I ran the same one twice. A couple of spots that sit on top of each other as almost one spot in one graph, but a smear in the other. Some slightly different spacing in a group.
In particular, look at the far right near the bottom. There are three dots in a tight triangular cluster. All three sort of touch in the lower graph (the original), and have separation, if only barely, in the short graph (the top graph) with 2015 as the last year used for the anomaly computation. It looks to me like a few hundredths of a degree C at most.
I’m still going to try a few more and see if I can find one more dramatic. Then, these are so close I’m going back through the whole process end to end to make sure it is doing what it ought to do ;-)
So all that gives me confidence that my conclusions on all the other graphs don’t need a rewrite due to the length of data in the average used for the anomaly computation being different between v3.3 and v4. It does make a difference, but one so small as to be almost undetectable in the graphs.
I’m just going to do the Germany Difference graph. Germany and the USA added a lot of thermometers, so it’s a good test case for the impact on this particular chart of both the difference in baseline years and the added instruments. The shift is easiest to see in the last couple of dots; most of the plot is pretty static.
At the very recent end of the plot there’s a triangle of 3 dots. Look carefully and you can see they are placed slightly differently. The top dot in the top graph is slightly higher.
It does look like the method I used is fairly stable and resistant to the “attack” of saying the v4 “baseline” used extra years compared to the v3.3 data. It makes a difference, but not one that changes the conclusions (or that can even be found most of the time).
With that, I’m comfortable saying the prior comments on countries are fairly reliable, and when there is “cooling of the past” it is NOT due to the length of data over which the monthly average temperatures are computed when making the anomalies.
Let the whooping and hollering begin! ;-)