GIStemp GHCN Selection Bias Measured 0.6 C

Sausage making, Thermometers, think about it

Don't worry, it's only a temperature sausage from GIStemp

Original image.

The Un-Discovered Country

In an earlier posting, California On The Beach, we saw that there were significant thermometer deletions, particularly in the USA.

Many of these could be traced to the conversion of the USHCN input data to a new format, USHCN.v2, while GIStemp was never given the maintenance programming needed to read that new format. As a result, USA data from USHCN cuts off in 2007. At about the same time, GHCN had a large reduction of thermometers as well. In the USA, this reduced the active thermometer count in the present to 136. Hardly representative.

So there was a dramatic “crash” of thermometer count.

I asserted “this matters”. And I can now put a number on it.

I’ve run a program to convert the USHCN.v2 file into a USHCN format that GIStemp can process. That program is listed here:

http://chiefio.wordpress.com/2009/11/06/ushcn-v2-gistemp-ghcn-what-will-it-take-to-fix-it/

Below are the full temperature histories both before (with the old thermometers deleted) and after putting them back in. I’ve also pasted in the console log from the run of STEP0 so you can see that it ran to completion normally (and produces terrible console logs…). There is also a brief wrap-up after the data.

All that is just after the findings in the next section.

The Re-Discovered Country

After running that program I found there had been 59 new stations added (beyond the older 1000+ that were simply being ignored now):

[chiefio@tubularbells tmp]$ wc -l USHCNv2.Adds 
     59 USHCNv2.Adds
[chiefio@tubularbells tmp]$ 
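Finding those “added” stations amounts to a set difference between the station IDs in the converted USHCN.v2 file and the IDs the old inventory knows about. A minimal Python sketch, with made-up file contents and an assumed six-character leading ID field (the real GIStemp input layouts differ):

```python
# Hypothetical sketch: list station IDs present in the new data file
# but absent from the old station inventory. Filenames, line layouts,
# and the 6-character ID width are illustrative assumptions.

def station_ids(lines, id_width=6):
    """Collect the leading fixed-width station ID from each line."""
    return {line[:id_width] for line in lines if len(line) >= id_width}

def find_added(old_lines, new_lines):
    """Stations in the new file that the old inventory never mentions."""
    return sorted(station_ids(new_lines) - station_ids(old_lines))

# Toy data standing in for the real files:
old_inventory = ["011084 STATION A", "021514 STATION B"]
new_v2_data   = ["011084 1880 ...", "021514 1880 ...", "026353 2007 ..."]

print(find_added(old_inventory, new_v2_data))  # ['026353']
```

On the real files, the length of that list is what `wc -l` reported above: 59.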

Longer term, it will take a bit of work to go through those added stations and put updated entries for them into the needed tables for GIStemp (it dies if the entries don’t match). But knowing that these stations are brand new stations, and that GIStemp is going to toss them out for being under 20 years long in STEP2, there is another route to a benchmark.

I just removed those station records from the converted USHCN.v2 input file. Now the remaining data match the “station inventory” and the program runs to completion.

This ought to have nearly no effect on the benchmark after STEP2 (to be done a bit later) and only a small effect on this benchmark. Basically, it’s better to have put 1000 stations back in and be short a few, than to be short all of them.
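The removal step itself is just a filter: drop every record whose station ID is on the list of new stations, so the remaining data match the station inventory GIStemp already has. A hedged sketch (the IDs and record layout are illustrative, not the actual file format):

```python
# Hypothetical sketch of dropping the brand-new-station records from the
# converted USHCN.v2 file. Station IDs and line layout are stand-ins.

ADDED = {"026353", "035512"}  # stand-ins for the 59 new-station IDs

def drop_added(lines, added=ADDED, id_width=6):
    """Keep only records whose station ID is in the old inventory."""
    return [line for line in lines if line[:id_width] not in added]

records = [
    "011084 1880  5.1 ...",
    "026353 2007 -0.6 ...",   # new station: removed
    "021514 2008  1.3 ...",
]
print(drop_added(records))
```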

UPDATE: I’ve added the USHCN.v2 inventory format entries for those “added” stations down at the bottom.

And what do we find? We find that the record for 2008 cools dramatically when you use all the thermometers.

There is a 0.6 C “Selection Bias” in the U.S.A. temperature record from deleting the USHCN thermometers in GIStemp

This selection bias measurement is for the U.S.A. data only (that is where USHCN covers). When averaged in with the rest of the world, that number will reduce. (Though there are also deletions in the rest of the world data. If all the deleted thermometers were put back in, one might well find a similar effect for the ROW – Rest Of the World.) To the extent that the ROW deletions are of similar pattern, this would be representative.

Take a look at the 2008 “yearly average” and “thermometer count” numbers in these two excerpts from the runs on the “old” and “new” USA data. Those are the two fields on the far right.

This is the bottom part of the “before”. Run on my standard benchmark copy of the USHCN data:

Thermometer Records, Average of Monthly Data and Yearly Average
by Year Across Month, with a count of thermometer records in that year
--------------------------------------------------------------------------
YEAR  JAN  FEB  MAR  APR  MAY  JUN JULY  AUG SEPT  OCT  NOV  DEC  YR COUNT
--------------------------------------------------------------------------
2002  2.1  2.8  4.8 12.4 15.6 21.8 24.5 23.1 20.0 11.7  6.0  2.1 12.2 1421
2003 -0.1  0.5  6.6 11.6 16.4 20.4 24.0 23.8 18.5 13.5  6.7  1.9 12.0 1411
2004 -1.4  0.9  8.2 11.8 17.4 20.5 22.9 21.5 19.3 13.4  7.3  1.7 12.0 1381
2005  0.3  3.2  5.6 11.8 15.6 21.4 24.2 23.4 20.2 13.4  7.5  0.2 12.2 1213
2006  4.1  1.4  6.1 13.3 17.0 21.7 24.8 23.3 17.7 11.8  7.1  3.0 12.6 1200
2007  0.0 -0.3  8.4 10.4 17.3 21.6 23.6 24.2 20.2 15.0  7.5  2.4 12.5 1164
2008  0.3  2.1  6.2 11.5 16.2 21.6 23.5 22.6 19.3 12.7  6.8  1.7 12.0  136
AA   -0.7  0.9  5.3 10.8 15.9 20.4 23.0 22.2 18.4 12.5  5.9  0.8 11.3
Ad   -0.7  0.9  5.3 10.9 16.0 20.5 23.1 22.3 18.5 12.6  6.0  0.9 11.4
 
For Country Code 425
[chiefio@tubularbells Temps]$ 

And this is the “After”. Run on the converted USHCN.v2 data:

Thermometer Records, Average of Monthly Data and Yearly Average
by Year Across Month, with a count of thermometer records in that year
--------------------------------------------------------------------------
YEAR  JAN  FEB  MAR  APR  MAY  JUN JULY  AUG SEPT  OCT  NOV  DEC  YR COUNT
--------------------------------------------------------------------------
2002  2.1  2.8  4.8 12.4 15.5 21.8 24.5 23.1 19.9 11.7  6.0  2.1 12.2 1421
2003 -0.1  0.5  6.6 11.6 16.4 20.3 24.0 23.8 18.6 13.6  6.7  2.0 12.0 1412
2004 -1.3  1.0  8.2 11.9 17.4 20.5 22.9 21.5 19.3 13.5  7.4  1.8 12.0 1381
2005  0.4  3.2  5.7 11.9 15.7 21.4 24.2 23.4 20.2 13.4  7.6  0.3 12.3 1220
2006  4.2  1.5  6.2 13.4 17.1 21.7 24.8 23.3 17.8 11.8  7.2  3.1 12.7 1205
2007  0.1 -0.2  8.6 10.5 17.3 21.3 23.6 24.0 19.6 14.4  6.7  1.0 12.2 1166
2008 -0.6  1.3  5.5 10.8 15.6 21.2 23.4 22.3 18.7 12.2  6.5  0.2 11.4 1170
AA   -0.7  0.9  5.3 10.8 15.9 20.4 23.0 22.2 18.4 12.5  5.9  0.8 11.3
Ad   -0.7  1.0  5.4 10.9 16.0 20.5 23.1 22.3 18.5 12.6  6.0  1.0 11.4
 
For Country Code 425

Since the “cut off” of USHCN only happens mid-year in 2007, the full impact does not show up until the 2008 number, but we see hints of it in the 2007 numbers, where we have a 0.3 C warming bias in the “Hansen Way” totals. We can also see that while the Jan, Feb, Mar numbers are almost identical, the Oct, Nov, Dec numbers have significant warming biases of 0.6 C, 0.8 C, and 1.4 C. That mid-year cutoff thing showing through…
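The bias numbers come straight from differencing the two charts, old run minus new run. As a sketch, with values copied from the 2007 and 2008 rows printed above:

```python
# Difference the "deleted thermometers" run (old) against the
# "all thermometers" run (new) for the months where the cutoff bites.
# Values are copied from the two charts above.

old = {"2007 OCT": 15.0, "2007 NOV": 7.5, "2007 DEC": 2.4, "2007 YR": 12.5,
       "2008 YR": 12.0}
new = {"2007 OCT": 14.4, "2007 NOV": 6.7, "2007 DEC": 1.0, "2007 YR": 12.2,
       "2008 YR": 11.4}

bias = {k: round(old[k] - new[k], 1) for k in old}
print(bias)  # 2008 yearly bias: 12.0 - 11.4 = 0.6 C
```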

The bottom lines are two ways of doing “averages of the above averages”, showing how much impact a programmer’s choice can have on “averaging”. The AA line is the average of the monthly averages printed in the chart above. The Ad line is an average of the underlying values for the total history of that month, without going through the monthly average first. You can see that the 1/10 C place wanders back and forth depending on which way you choose to do that particular average. This is part of why I say that the “1/10 C place” is not something on which to bet the fate of the planet, or the economy…

Now, with this benchmark, we may need to move that to more than a single 1/10 C that is in doubt…
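The AA/Ad distinction is the classic “average of averages” versus “pooled average” choice: the two agree only when every month contributes the same number of readings. A toy Python sketch, with deliberately exaggerated counts so the difference is easy to see (on the real data it only moves the 1/10 C place):

```python
# "AA" averages the per-month averages (each month weighted equally).
# "Ad" pools all the underlying readings first (each reading weighted
# equally). Toy numbers; a sparse cold month vs a dense warm month.

jan = [0.0, 0.2, 0.4]                          # three readings
jul = [23.0, 23.2, 23.4, 23.6, 23.8, 24.0]     # six readings

monthly_means = [sum(jan) / len(jan), sum(jul) / len(jul)]
AA = sum(monthly_means) / len(monthly_means)   # average of averages
Ad = sum(jan + jul) / len(jan + jul)           # pooled average

print(round(AA, 2), round(Ad, 2))
```

The pooled average leans toward the month with more readings; with equal counts the two collapse to the same number.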

Comparison of Before and After USHCN.v2 - Version 2

With thanks to ‘Ripper’ who supplied the graph in comments.

That’s the “meat of it”. Eventually I’m going to put a “STEP1” and “STEP2” benchmark A/B together. But that will have to wait until after morning coffee and maybe a spot of sausage and eggs. I love the smell of sausage being cooked in the morning ;-)

The Original USHCN Old Format Mid-2007 cut off Temperature History

[chiefio@tubularbells Temps]$ cat Nov2U.425.yrs.GAT 

Thermometer Records, Average of Monthly Data and Yearly Average
by Year Across Month, with a count of thermometer records in that year
--------------------------------------------------------------------------
YEAR  JAN  FEB  MAR  APR  MAY  JUN JULY  AUG SEPT  OCT  NOV  DEC  YR COUNT
--------------------------------------------------------------------------
1880  5.1  3.3  5.4 11.7 18.5 21.7 23.4 22.7 18.7 12.6  3.4  0.1 12.2  135
1881 -1.8  1.4  5.3 10.9 18.4 20.9 23.9 23.6 20.5 13.8  6.5  4.8 12.3  148
1882  0.9  4.0  6.5 11.1 14.8 21.0 22.7 22.8 19.1 14.4  6.2  1.2 12.1  179
1883 -2.1  0.5  4.5 11.1 15.4 21.7 23.5 22.0 18.4 12.6  7.0  2.3 11.4  197
1884 -2.0  1.4  5.1 10.4 16.5 21.1 22.8 22.3 20.1 14.7  6.7  0.7 11.7  227
1885 -2.1 -1.9  3.4 11.0 16.4 21.0 24.1 22.4 18.8 12.4  7.0  2.2 11.2  257
1886 -3.1  0.9  4.5 11.8 17.7 21.0 23.8 23.3 19.7 13.7  5.5 -0.2 11.6  272
1887 -1.3  1.2  5.2 11.1 18.8 21.8 24.6 22.3 18.9 12.3  6.3  0.6 11.8  314
1888 -3.5  1.2  3.0 12.3 16.0 21.7 24.0 22.8 18.5 12.0  6.9  2.9 11.5  367
1889  1.0 -0.4  7.0 12.3 16.9 20.7 23.4 22.4 18.5 11.9  6.3  6.0 12.2  437
1890  1.2  2.8  3.8 11.7 16.3 22.3 23.9 21.9 18.3 12.7  7.8  1.9 12.0  463
1891  1.0  1.0  3.0 11.9 15.8 21.1 21.9 22.3 20.2 12.5  5.6  3.6 11.7  523
1892 -1.9  2.4  3.9 10.4 15.4 21.4 23.1 22.7 19.0 13.2  5.7 -0.2 11.3  595
1893 -3.7 -0.7  4.0 10.6 15.6 21.5 23.8 22.2 18.8 12.7  5.4  1.4 11.0  661
1894  0.1 -0.9  7.2 11.9 16.8 21.5 23.7 22.8 19.6 13.3  5.6  2.1 12.0  696
1895 -2.4 -2.9  4.7 12.2 16.7 21.4 22.4 22.8 20.2 10.9  5.5  1.3 11.1  751
1896  0.1  1.9  3.3 12.7 18.6 21.4 23.6 23.1 17.9 11.7  5.4  2.4 11.8  785
1897 -1.6  1.4  4.9 11.2 16.2 20.7 23.9 22.0 20.2 14.2  5.9  0.2 11.6  818
1898  0.5  1.6  6.5 10.4 16.3 21.6 23.5 23.0 19.7 11.9  4.6 -0.6 11.6  842
1899 -0.7 -3.5  3.2 11.1 16.5 21.2 23.0 22.8 18.3 13.6  8.0  0.4 11.2  871
1900  1.0 -1.2  4.1 11.4 16.9 21.2 23.1 23.7 19.5 14.9  6.0  1.7 11.9  905
1901  0.1 -1.6  5.0 10.0 16.2 21.1 25.1 23.1 18.2 13.5  5.5  0.0 11.3  928
1902 -0.6 -0.8  6.3 10.8 17.3 20.2 22.8 21.9 17.5 13.2  8.0 -0.3 11.4  946
1903 -0.4 -0.8  6.8 10.5 16.2 18.9 22.5 21.7 17.8 12.9  4.9 -0.7 10.9  986
1904 -2.6 -1.2  5.3  9.4 16.1 20.1 22.0 21.5 18.7 12.7  6.6  0.3 10.7 1025
1905 -2.9 -3.0  7.6 10.8 15.9 20.6 22.4 22.5 19.2 11.7  6.3  0.7 11.0 1041
1906  1.4  0.8  2.4 12.0 15.9 20.3 22.5 22.7 19.7 12.1  5.6  1.7 11.4 1074
1907 -0.2  0.8  7.7  8.2 13.6 19.1 22.8 21.8 18.3 12.0  5.6  2.0 11.0 1101
1908  0.4  0.3  6.7 11.6 15.6 19.9 23.0 21.7 19.1 11.8  6.6  1.3 11.5 1128
1909  0.0  1.8  4.6  9.6 14.8 20.7 22.3 22.7 18.1 11.6  7.8 -2.8 10.9 1158
1910 -1.0 -1.4  9.7 11.9 14.9 19.9 23.1 21.6 18.7 13.5  5.0 -0.2 11.3 1171
1911  0.3  1.1  6.2 10.0 16.8 21.6 22.7 21.7 19.0 11.9  3.5  1.1 11.3 1206
1912 -4.4 -0.8  2.3 10.8 16.3 19.3 22.4 21.2 17.6 12.4  6.2  1.4 10.4 1217
1913  0.0 -1.1  4.2 11.0 15.7 20.5 22.9 23.0 17.8 11.4  7.7  1.6 11.2 1237
1914  1.4 -1.5  4.7 10.6 16.5 21.0 23.2 22.1 18.0 13.4  6.7 -2.2 11.2 1249
1915 -1.5  2.3  3.0 12.9 14.7 19.1 21.7 20.9 18.4 13.2  6.7  0.7 11.0 1262
1916 -1.5  0.0  4.9 10.1 15.4 18.9 23.5 22.3 17.6 11.7  5.2 -1.2 10.6 1283
1917 -1.6 -1.3  4.0  9.5 12.9 19.3 23.1 21.4 17.6 10.0  6.2 -2.0  9.9 1298
1918 -4.8  0.6  7.3  9.4 16.4 21.2 22.1 22.6 16.4 13.5  5.2  2.0 11.0 1312
1919  0.3  0.3  5.1 10.5 15.3 20.7 23.3 21.9 18.7 12.0  4.5 -1.8 10.9 1320
1920 -2.0  0.4  4.6  8.1 14.9 19.7 22.2 21.3 18.5 13.0  4.7  0.9 10.5 1328
1921  1.1  2.4  8.1 10.9 15.6 21.3 23.6 21.9 19.3 12.7  5.8  1.6 12.0 1336
1922 -2.5  0.0  4.8 10.3 16.3 21.1 22.3 22.2 19.4 13.0  5.9  0.7 11.1 1339
1923  1.0 -1.5  3.4  9.9 15.0 20.2 22.9 21.6 18.3 11.0  6.2  2.8 10.9 1346
1924 -2.9  1.1  3.1  9.9 14.0 20.0 21.7 22.0 16.7 12.9  5.9 -2.2 10.2 1346
1925 -1.8  3.0  6.5 12.3 15.0 21.0 22.9 21.8 19.5  9.3  4.9  0.2 11.2 1353
1926 -0.6  2.5  4.0  9.7 16.1 19.8 22.8 22.3 17.8 12.6  4.9 -0.3 11.0 1356
1927 -0.4  3.0  5.7 10.5 15.1 19.2 22.3 20.3 18.3 13.3  6.8 -1.5 11.1 1361
1928  0.0  1.2  5.6  8.8 16.0 18.6 22.6 22.0 17.0 12.8  5.7  1.0 10.9 1369
1929 -3.0 -2.6  6.3 10.6 14.8 19.6 22.7 22.0 17.5 12.0  4.1  0.9 10.4 1372
1930 -4.0  3.8  4.4 11.8 15.2 20.1 23.6 22.5 18.8 11.1  5.5  0.0 11.1 1377
1931  0.7  2.9  3.9 10.5 15.1 21.3 23.9 22.0 20.0 13.6  7.1  2.4 11.9 1384
1932  0.3  2.2  2.8 10.6 15.7 20.6 23.0 22.3 18.0 11.6  4.8 -0.6 10.9 1390
1933  1.6 -0.9  5.0 10.0 15.6 21.8 23.4 21.8 19.6 12.3  5.6  1.8 11.5 1395
1934  1.4  0.2  5.2 11.6 17.8 21.6 24.3 22.6 17.8 13.4  7.5  0.4 12.0 1393
1935 -0.5  1.9  6.5  9.7 14.0 19.6 23.8 22.5 18.0 12.1  4.7 -0.8 11.0 1394
1936 -2.7 -4.1  6.1  9.7 17.3 21.1 24.6 23.5 19.1 12.2  4.6  1.6 11.1 1400
1937 -2.4 -0.2  3.6  9.8 16.3 20.4 23.2 23.4 18.3 12.0  5.1  0.3 10.8 1403
1938 -0.2  2.0  7.2 11.0 15.4 20.2 22.9 23.1 18.9 13.7  5.2  1.3 11.7 1404
1939  1.0 -0.6  5.4 10.4 16.8 20.4 23.3 22.3 19.5 12.7  5.7  3.0 11.7 1405
1940 -4.6  0.8  4.7 10.0 15.7 20.6 23.1 21.9 18.3 13.3  4.4  2.4 10.9 1404
1941  0.0  0.5  3.8 11.6 16.8 20.1 23.1 22.1 18.3 13.0  6.2  2.4 11.5 1412
1942 -0.8 -0.5  5.2 11.7 15.3 19.9 22.9 21.8 17.6 12.5  5.8 -0.3 10.9 1419
1943 -1.9  1.7  3.3 10.6 15.2 20.7 23.0 22.6 17.4 11.9  5.0  0.3 10.8 1416
1944  0.2  1.2  3.3  9.1 16.7 20.2 22.2 21.9 18.1 12.6  5.4 -0.8 10.8 1424
1945 -1.2  1.4  7.6 10.2 14.1 18.7 22.2 21.8 18.2 12.0  5.4 -1.5 10.7 1471
1946 -0.1  1.2  8.0 11.8 14.6 19.9 22.6 21.0 17.8 12.2  5.7  1.8 11.4 1476
1947 -0.2 -0.7  3.4 10.3 15.1 19.1 22.0 23.1 18.7 14.7  3.9  0.7 10.8 1498
1948 -2.4 -0.3  3.9 11.6 15.3 20.1 22.5 21.7 18.5 11.5  5.9  0.5 10.7 1616
1949 -1.7  0.4  5.2 10.7 16.3 20.6 23.1 22.0 17.4 12.9  7.4  1.2 11.3 1747
1950  0.1  1.4  3.9  9.1 15.3 19.9 21.5 21.0 17.5 14.0  5.0  0.4 10.8 1757
1951 -0.8  1.3  3.6  9.9 16.0 19.3 22.7 21.8 17.8 12.4  4.0  0.2 10.7 1786
1952  0.2  2.1  3.7 10.8 15.5 21.4 23.2 22.2 18.6 11.6  5.5  1.3 11.3 1800
1953  2.0  2.4  6.2  9.6 15.6 21.2 23.1 22.1 18.8 13.5  6.9  1.7 11.9 1815
1954 -0.5  4.4  4.3 12.1 14.6 20.6 23.6 22.2 19.1 12.9  7.0  1.1 11.8 1825
1955 -0.7  0.0  4.8 11.6 16.3 19.0 23.4 23.0 18.7 12.7  3.9 -0.1 11.0 1752
1956 -0.5  0.5  4.6  9.7 16.2 20.9 22.4 21.9 18.1 13.3  5.1  2.5 11.2 1754
1957 -2.0  3.2  5.4 10.9 15.6 20.6 23.1 21.9 18.1 11.3  5.7  2.8 11.4 1763
1958  0.0 -0.1  3.8 10.6 16.5 19.8 22.4 22.5 18.5 12.6  6.6  0.0 11.1 1767
1959 -1.3  0.8  4.9 10.9 16.3 20.9 22.7 22.8 18.4 12.1  4.4  2.4 11.3 1766
1960 -0.6  0.0  1.8 11.4 15.3 20.3 22.7 22.1 18.9 12.7  6.3 -0.2 10.9 1762
1961 -0.8  2.8  6.1  9.2 14.8 20.4 22.5 22.2 18.0 12.3  5.4 -0.2 11.1 1760
1962 -2.0  1.6  3.6 10.8 16.9 19.9 21.9 21.9 17.5 13.4  6.3  0.7 11.0 1800
1963 -3.2  0.6  6.5 11.1 15.9 20.4 22.7 21.8 18.8 15.1  6.8 -1.7 11.2 1849
1964  0.3  0.2  4.1 10.8 16.3 20.1 23.2 21.2 17.8 12.1  6.5  0.3 11.1 1841
1965 -0.2  0.2  2.9 11.0 16.3 19.4 22.2 21.6 17.2 12.5  7.2  2.4 11.1 1835
1966 -2.6  0.1  5.9 10.0 15.2 20.1 23.5 21.4 17.9 11.7  6.6  0.8 10.9 1830
1967  0.9  0.3  6.3 10.9 14.3 20.1 22.1 21.5 17.7 12.3  5.6  1.1 11.1 1823
1968 -1.4  0.2  6.6 10.8 14.7 20.3 22.6 21.8 18.0 12.7  5.7 -0.5 11.0 1821
1969 -1.4  0.7  2.9 11.6 16.3 19.7 23.0 22.4 18.6 11.4  5.7  1.2 11.0 1813
1970 -2.7  1.9  4.2 10.3 16.4 20.4 23.0 22.6 18.5 12.0  5.9  1.2 11.1 1797
1971 -1.9  0.7  3.9 10.0 14.6 20.9 22.0 21.9 18.4 13.5  5.6  1.9 11.0 1693
1972 -0.9  0.4  5.7 10.0 15.9 19.7 22.2 21.9 18.1 11.3  4.5 -0.2 10.7 1689
1973 -1.0  0.9  7.1  9.9 15.0 20.6 22.7 22.3 18.2 13.4  6.3  1.1 11.4 1685
1974 -0.2  1.1  6.5 11.1 15.6 19.8 23.0 21.3 17.0 12.0  6.1  1.2 11.2 1679
1975  0.2  0.5  3.8  8.5 16.3 19.8 22.8 22.0 17.1 12.8  6.3  1.1 10.9 1670
1976 -1.2  3.7  6.0 11.1 14.9 20.0 22.4 21.4 17.8 10.3  4.0 -0.5 10.8 1669
1977 -4.4  2.1  6.5 12.2 16.8 20.8 23.4 22.0 18.8 12.1  6.2  0.5 11.4 1660
1978 -2.9 -2.2  4.8 10.9 15.6 20.4 22.8 22.1 19.0 12.3  5.9 -0.1 10.7 1660
1979 -4.5 -2.7  5.9 10.3 15.5 19.9 22.6 21.7 18.8 12.9  5.6  2.2 10.7 1657
1980 -0.4  0.2  4.3 10.9 16.0 20.1 23.9 22.6 19.0 11.5  6.0  1.0 11.3 1650
1981  0.0  2.5  6.1 12.8 15.2 21.0 23.0 21.9 18.0 11.3  7.0  0.7 11.6 1623
1982 -3.4  0.2  5.5  9.3 16.4 19.1 22.8 21.8 17.9 12.1  5.8  2.8 10.9 1605
1983  0.4  2.5  6.0  8.9 14.7 19.8 23.4 23.7 18.6 12.8  6.6 -2.9 11.2 1594
1984 -1.6  3.0  4.5 10.1 15.5 20.7 22.6 22.8 17.5 13.0  5.7  2.4 11.3 1592
1985 -2.6 -0.3  6.6 12.3 16.8 19.9 23.1 21.6 17.7 12.8  5.4 -1.2 11.0 1594
1986  1.2  2.0  7.5 11.7 16.5 21.2 23.2 21.7 18.3 12.6  5.7  2.0 12.0 1590
1987  0.0  2.8  6.3 11.9 17.5 21.4 23.2 22.2 18.6 11.5  7.2  2.0 12.0 1589
1988 -1.9  0.9  6.0 11.2 16.6 21.4 23.8 23.3 18.3 11.6  6.8  1.3 11.6 1598
1989  1.5 -0.9  5.9 11.6 16.0 20.4 23.3 22.1 18.1 12.8  6.2 -2.1 11.2 1597
1990  3.0  2.8  7.4 11.7 15.5 21.4 23.1 22.7 19.6 12.8  7.7  0.7 12.4 1572
1991 -0.6  4.3  7.1 12.4 17.8 21.4 23.7 23.1 18.8 13.3  5.1  2.8 12.4 1549
1992  1.5  4.4  7.1 11.7 16.5 20.1 22.4 21.3 18.5 12.7  5.7  1.0 11.9 1536
1993  0.1  0.0  5.6 10.7 16.8 20.5 23.3 22.9 18.0 12.3  5.3  1.9 11.4 1529
1994 -1.5  0.3  7.1 12.3 16.4 22.1 23.4 22.4 19.0 13.0  7.2  3.0 12.1 1519
1995  1.0  2.6  6.9 10.6 15.8 20.7 23.7 24.0 18.6 13.3  5.7  1.1 12.0 1495
1996 -0.9  1.9  4.2 10.9 16.6 21.4 23.1 22.6 18.1 12.7  4.6  1.8 11.4 1464
1997 -0.6  3.0  7.3  9.7 15.5 20.7 23.3 22.3 19.4 12.8  5.5  1.7 11.7 1431
1998  2.1  4.4  5.8 11.4 17.9 20.8 24.2 23.5 20.9 13.5  7.7  2.8 12.9 1428
1999  0.8  4.1  5.9 11.7 16.4 20.8 24.1 23.0 18.4 12.8  9.2  2.6 12.5 1447
2000  0.5  4.3  8.2 11.5 17.7 21.0 23.2 23.3 18.9 13.4  4.2 -2.1 12.0 1429
2001 -0.2  1.3  5.1 12.3 17.4 21.0 23.6 23.6 18.7 12.7  9.2  3.1 12.3 1434
2002  2.1  2.8  4.8 12.4 15.6 21.8 24.5 23.1 20.0 11.7  6.0  2.1 12.2 1421
2003 -0.1  0.5  6.6 11.6 16.4 20.4 24.0 23.8 18.5 13.5  6.7  1.9 12.0 1411
2004 -1.4  0.9  8.2 11.8 17.4 20.5 22.9 21.5 19.3 13.4  7.3  1.7 12.0 1381
2005  0.3  3.2  5.6 11.8 15.6 21.4 24.2 23.4 20.2 13.4  7.5  0.2 12.2 1213
2006  4.1  1.4  6.1 13.3 17.0 21.7 24.8 23.3 17.7 11.8  7.1  3.0 12.6 1200
2007  0.0 -0.3  8.4 10.4 17.3 21.6 23.6 24.2 20.2 15.0  7.5  2.4 12.5 1164
2008  0.3  2.1  6.2 11.5 16.2 21.6 23.5 22.6 19.3 12.7  6.8  1.7 12.0  136
AA   -0.7  0.9  5.3 10.8 15.9 20.4 23.0 22.2 18.4 12.5  5.9  0.8 11.3
Ad   -0.7  0.9  5.3 10.9 16.0 20.5 23.1 22.3 18.5 12.6  6.0  0.9 11.4
 
For Country Code 425
[chiefio@tubularbells Temps]$ 

This is the same chart we have seen before, my standard benchmark of archived USHCN and GHCN input. Also notice that this is for Country Code 425: the U.S.A.

The New USHCN.v2 data file Temperature History

I may have translated some of the “Estimated Value” flags a bit more sternly than warranted. If someone familiar with them can look at the “How to fix USHCN” link and comment there on the choices I made, I can rerun with better choices. For this run, I just said “all estimates are made-up values” and those get tossed. I think you can see that in the early part of this series, where the thermometer counts are lower due to more of the old data being estimated. It is also possible that the input USHCN.v2 file is just more paranoid about marking estimated values. In either case, it has little to no impact on the benchmark, and none at all on the merit of the 2007 and 2008 comparisons.
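That “all estimates get tossed” rule can be sketched as a simple filter over the monthly values. This is only an illustration of the choice, not the actual conversion program: I’m assuming a trailing one-letter flag with ‘E’ marking an estimated value and -9999 as the missing-data marker, so check the actual USHCN.v2 format notes before leaning on either assumption.

```python
# Hedged sketch: treat any monthly value flagged as "estimated" as
# missing data rather than a real measurement. The 'E' flag letter and
# the -9999 missing marker are assumptions about the file format.

MISSING = -9999

def strip_estimates(fields):
    """Replace flagged-as-estimated monthly values with the missing marker."""
    out = []
    for f in fields:
        if f.endswith("E"):      # estimated: a made-up value, toss it
            out.append(MISSING)
        else:
            out.append(int(f))
    return out

print(strip_estimates(["-61", "13E", "55", "108E"]))  # [-61, -9999, 55, -9999]
```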

Look at ./Temps/Temps.425.yrs.GAT (Y/N)? y
 
Thermometer Records, Average of Monthly Data and Yearly Average
by Year Across Month, with a count of thermometer records in that year
--------------------------------------------------------------------------
YEAR  JAN  FEB  MAR  APR  MAY  JUN JULY  AUG SEPT  OCT  NOV  DEC  YR COUNT
--------------------------------------------------------------------------
1880  5.6  3.7  6.0 12.1 18.8 21.9 23.5 22.9 18.7 12.9  3.5  0.5 12.5   71
1881 -1.4  1.9  5.7 11.1 18.7 21.2 23.9 23.8 20.3 13.9  6.7  5.2 12.6   78
1882  1.6  4.9  7.1 11.5 15.0 20.7 22.3 22.6 19.1 14.5  6.8  1.9 12.3   81
1883 -1.5  1.1  5.4 11.5 15.3 21.2 23.1 21.8 18.4 12.7  7.4  3.1 11.6   83
1884 -1.1  1.9  5.9 10.9 16.5 20.9 22.4 22.0 19.5 14.4  7.4  1.0 11.8   86
1885 -1.2 -0.5  4.9 11.8 16.2 20.5 23.5 22.0 18.8 12.6  7.7  3.1 11.6   86
1886 -2.4  2.1  5.3 12.0 17.4 20.7 23.7 23.1 19.2 13.8  5.8  0.5 11.8   93
1887 -1.0  1.2  6.1 11.7 18.3 21.3 24.1 21.9 18.7 12.4  6.9  1.3 11.9  104
1888 -2.9  2.1  3.7 12.6 15.5 21.1 23.6 22.4 18.6 12.3  7.3  3.5 11.7  108
1889  1.3  0.1  7.8 12.6 16.5 20.3 23.1 22.2 18.4 12.4  6.4  6.1 12.3  119
1890  0.9  2.8  4.3 11.9 16.0 21.8 23.7 21.7 18.2 12.9  8.3  3.0 12.1  126
1891  2.0  1.5  3.8 12.1 15.6 20.4 21.8 22.3 19.9 13.0  6.2  4.1 11.9  134
1892 -0.9  3.2  5.0 10.6 15.3 20.9 22.9 22.6 19.1 13.5  6.4  0.4 11.6  145
1893 -2.4 -0.2  4.3 10.3 15.3 20.9 23.4 22.0 18.5 12.7  5.7  2.0 11.0  154
1894  0.3 -0.4  7.3 12.0 16.4 20.9 23.4 22.7 19.3 13.4  6.2  2.8 12.0  155
1895 -2.5 -2.9  4.6 12.2 16.5 21.0 22.2 22.5 19.8 10.8  5.3  1.0 10.9  745
1896 -0.1  1.7  3.2 12.4 18.2 21.1 23.5 22.8 17.6 11.5  5.0  2.3 11.6  792
1897 -1.7  1.3  4.8 11.2 16.3 20.6 23.7 22.1 20.0 14.0  5.7  0.1 11.5  848
1898  0.4  1.5  6.3 10.4 16.3 21.5 23.4 23.0 19.5 11.8  4.5 -0.8 11.5  877
1899 -0.9 -3.6  3.1 11.1 16.5 21.2 23.1 22.8 18.3 13.5  8.0  0.4 11.1  903
1900  1.0 -1.2  4.3 11.5 17.0 21.3 23.2 23.8 19.5 14.8  6.0  1.7 11.9  927
1901  0.0 -1.7  5.0 10.1 16.3 21.2 25.1 23.1 18.2 13.5  5.5  0.0 11.4  947
1902 -0.7 -0.9  6.3 10.9 17.4 20.3 22.8 22.0 17.6 13.3  8.0 -0.2 11.4  971
1903 -0.4 -0.7  6.9 10.7 16.3 19.0 22.6 21.7 17.8 12.9  5.0 -0.7 10.9 1012
1904 -2.6 -1.1  5.4  9.6 16.2 20.1 22.1 21.6 18.8 12.7  6.7  0.3 10.8 1050
1905 -2.8 -2.9  7.6 10.9 16.0 20.6 22.5 22.6 19.2 11.8  6.3  0.7 11.0 1069
1906  1.3  0.8  2.4 12.1 16.0 20.3 22.5 22.7 19.7 12.1  5.6  1.7 11.4 1096
1907 -0.2  0.8  7.7  8.4 13.7 19.2 22.9 21.8 18.4 12.1  5.6  2.0 11.0 1132
1908  0.4  0.3  6.8 11.7 15.7 20.0 23.1 21.7 19.2 11.8  6.6  1.4 11.6 1149
1909  0.0  1.8  4.7  9.8 14.9 20.7 22.4 22.8 18.2 11.7  7.9 -2.8 11.0 1178
1910 -1.0 -1.4  9.8 12.1 15.0 20.0 23.2 21.7 18.8 13.6  5.1 -0.2 11.4 1185
1911  0.4  1.2  6.3 10.1 17.0 21.7 22.8 21.8 19.2 12.0  3.6  1.2 11.4 1215
1912 -4.4 -0.8  2.3 10.9 16.4 19.4 22.5 21.2 17.8 12.4  6.3  1.4 10.5 1221
1913  0.0 -1.1  4.3 11.1 15.9 20.5 23.0 23.1 17.9 11.5  7.8  1.7 11.3 1246
1914  1.5 -1.5  4.8 10.7 16.5 21.1 23.3 22.1 18.1 13.5  6.7 -2.2 11.2 1260
1915 -1.5  2.3  3.0 13.0 14.9 19.2 21.8 21.0 18.5 13.3  6.8  0.8 11.1 1272
1916 -1.4  0.0  4.9 10.2 15.6 18.9 23.6 22.3 17.7 11.7  5.3 -1.2 10.6 1292
1917 -1.5 -1.2  4.1  9.6 13.1 19.4 23.2 21.5 17.7 10.0  6.2 -2.0 10.0 1309
1918 -4.8  0.7  7.4  9.5 16.5 21.3 22.2 22.7 16.4 13.6  5.3  2.1 11.1 1324
1919  0.3  0.3  5.2 10.6 15.4 20.8 23.4 22.0 18.8 12.2  4.6 -1.7 11.0 1331
1920 -1.9  0.5  4.7  8.3 15.0 19.8 22.2 21.4 18.6 13.1  4.7  1.0 10.6 1341
1921  1.2  2.4  8.2 11.0 15.7 21.4 23.7 22.0 19.5 12.8  5.9  1.6 12.1 1349
1922 -2.4  0.1  4.9 10.5 16.4 21.1 22.4 22.3 19.5 13.1  6.0  0.8 11.2 1351
1923  1.1 -1.3  3.6 10.0 15.2 20.3 23.0 21.7 18.4 11.1  6.3  2.8 11.0 1356
1924 -2.8  1.2  3.2 10.1 14.2 20.1 21.8 22.1 16.8 12.9  6.1 -2.0 10.3 1358
1925 -1.7  3.0  6.6 12.5 15.2 21.1 23.0 21.8 19.6  9.4  5.0  0.3 11.3 1366
1926 -0.5  2.6  4.0  9.8 16.2 19.8 22.8 22.4 17.9 12.7  4.9 -0.2 11.0 1367
1927 -0.3  3.1  5.7 10.6 15.3 19.3 22.3 20.4 18.5 13.4  6.8 -1.4 11.1 1372
1928  0.0  1.2  5.7  8.9 16.1 18.7 22.7 22.1 17.1 12.9  5.8  1.0 11.0 1383
1929 -2.9 -2.5  6.4 10.8 14.9 19.6 22.7 22.0 17.6 12.1  4.1  0.9 10.5 1386
1930 -4.0  3.8  4.5 11.9 15.4 20.2 23.6 22.5 18.8 11.1  5.5  0.1 11.1 1390
1931  0.7  3.0  4.0 10.6 15.2 21.3 24.0 22.0 20.1 13.6  7.1  2.4 12.0 1398
1932  0.4  2.3  2.9 10.7 15.8 20.6 23.1 22.3 18.1 11.7  4.8 -0.6 11.0 1403
1933  1.6 -0.9  5.0 10.1 15.7 21.9 23.4 21.9 19.6 12.4  5.7  1.8 11.5 1406
1934  1.4  0.2  5.3 11.7 17.9 21.6 24.4 22.7 17.8 13.4  7.5  0.5 12.0 1404
1935 -0.4  1.9  6.6  9.8 14.1 19.6 23.8 22.5 18.1 12.1  4.7 -0.8 11.0 1405
1936 -2.6 -4.1  6.2  9.8 17.4 21.2 24.6 23.6 19.1 12.2  4.7  1.6 11.1 1411
1937 -2.4 -0.1  3.6  9.9 16.4 20.4 23.3 23.4 18.4 12.0  5.1  0.3 10.9 1413
1938 -0.2  2.1  7.3 11.1 15.5 20.2 23.0 23.1 19.0 13.7  5.3  1.3 11.8 1414
1939  1.0 -0.5  5.5 10.5 16.9 20.5 23.3 22.3 19.6 12.7  5.7  3.0 11.7 1415
1940 -4.6  0.8  4.7 10.1 15.7 20.7 23.1 21.9 18.3 13.3  4.4  2.4 10.9 1414
1941  0.0  0.5  3.8 11.7 16.9 20.1 23.1 22.1 18.3 13.1  6.2  2.4 11.5 1422
1942 -0.9 -0.5  5.2 11.7 15.3 20.0 23.0 21.8 17.6 12.5  5.9 -0.3 10.9 1428
1943 -1.8  1.8  3.3 10.7 15.3 20.7 23.0 22.6 17.4 11.9  5.0  0.3 10.8 1428
1944  0.2  1.2  3.3  9.1 16.7 20.3 22.2 21.9 18.2 12.6  5.5 -0.8 10.9 1435
1945 -1.2  1.4  7.7 10.3 14.2 18.7 22.2 21.8 18.2 12.0  5.4 -1.6 10.8 1481
1946 -0.1  1.2  8.0 11.8 14.7 19.9 22.6 21.0 17.9 12.2  5.8  1.8 11.4 1486
1947 -0.2 -0.8  3.4 10.3 15.2 19.1 22.1 23.1 18.8 14.7  3.9  0.6 10.8 1507
1948 -2.4 -0.4  3.9 11.6 15.4 20.2 22.5 21.7 18.5 11.5  5.9  0.5 10.7 1623
1949 -1.7  0.4  5.2 10.7 16.3 20.6 23.1 22.0 17.4 12.9  7.4  1.2 11.3 1755
1950  0.1  1.4  3.9  9.1 15.3 19.9 21.5 21.0 17.5 14.0  5.0  0.4 10.8 1762
1951 -0.8  1.2  3.7 10.0 16.0 19.3 22.7 21.9 17.8 12.5  4.0  0.2 10.7 1789
1952  0.2  2.1  3.7 10.8 15.6 21.4 23.2 22.2 18.6 11.6  5.5  1.3 11.3 1802
1953  2.0  2.4  6.2  9.6 15.7 21.2 23.1 22.1 18.8 13.5  6.9  1.6 11.9 1816
1954 -0.5  4.4  4.3 12.1 14.7 20.6 23.6 22.2 19.1 12.9  7.0  1.1 11.8 1826
1955 -0.7  0.0  4.8 11.6 16.3 19.0 23.4 23.0 18.8 12.7  3.8 -0.1 11.0 1752
1956 -0.5  0.4  4.6  9.7 16.2 20.9 22.4 21.9 18.0 13.3  5.1  2.5 11.2 1755
1957 -2.0  3.1  5.3 10.9 15.6 20.6 23.1 21.9 18.1 11.3  5.7  2.7 11.4 1764
1958  0.0 -0.1  3.8 10.7 16.6 19.8 22.4 22.5 18.5 12.5  6.6  0.0 11.1 1768
1959 -1.3  0.7  4.9 10.9 16.4 20.9 22.7 22.8 18.4 12.1  4.4  2.4 11.3 1766
1960 -0.6  0.0  1.8 11.4 15.3 20.3 22.6 22.0 18.8 12.7  6.3 -0.3 10.9 1762
1961 -0.9  2.7  6.1  9.2 14.8 20.4 22.5 22.2 18.0 12.3  5.4 -0.2 11.0 1760
1962 -2.0  1.6  3.6 10.8 16.9 19.9 21.9 21.9 17.5 13.3  6.2  0.6 11.0 1800
1963 -3.2  0.6  6.5 11.1 16.0 20.3 22.7 21.8 18.8 15.0  6.8 -1.7 11.2 1849
1964  0.2  0.2  4.1 10.8 16.3 20.1 23.2 21.2 17.8 12.0  6.4  0.3 11.0 1840
1965 -0.3  0.2  2.8 11.0 16.4 19.4 22.2 21.6 17.2 12.5  7.2  2.4 11.0 1834
1966 -2.7  0.1  5.9 10.0 15.2 20.1 23.5 21.4 17.9 11.7  6.6  0.7 10.9 1829
1967  0.9  0.2  6.2 10.9 14.3 20.0 22.1 21.5 17.7 12.2  5.5  1.1 11.1 1821
1968 -1.4  0.1  6.5 10.8 14.7 20.2 22.6 21.7 18.0 12.7  5.6 -0.6 10.9 1820
1969 -1.4  0.6  2.9 11.6 16.3 19.7 23.0 22.4 18.6 11.4  5.6  1.1 11.0 1811
1970 -2.8  1.8  4.1 10.3 16.4 20.4 23.0 22.6 18.4 11.9  5.8  1.2 11.1 1796
1971 -2.0  0.6  3.8 10.0 14.6 20.8 22.0 21.9 18.3 13.4  5.6  1.8 10.9 1692
1972 -1.0  0.3  5.6 10.0 15.9 19.7 22.2 21.9 18.0 11.3  4.5 -0.3 10.7 1688
1973 -1.0  0.8  7.1  9.9 15.0 20.6 22.6 22.3 18.2 13.4  6.3  1.1 11.4 1685
1974 -0.2  1.1  6.5 11.0 15.6 19.8 23.0 21.3 16.9 12.0  6.0  1.2 11.2 1677
1975  0.2  0.5  3.7  8.5 16.2 19.8 22.8 22.0 17.1 12.8  6.3  1.1 10.9 1669
1976 -1.3  3.6  6.0 11.1 14.9 20.0 22.4 21.4 17.8 10.3  3.9 -0.5 10.8 1668
1977 -4.5  2.1  6.5 12.2 16.8 20.8 23.4 22.0 18.8 12.1  6.1  0.4 11.4 1658
1978 -3.0 -2.2  4.8 10.9 15.6 20.4 22.8 22.1 19.0 12.3  5.9 -0.1 10.7 1659
1979 -4.5 -2.8  5.8 10.3 15.4 19.9 22.6 21.7 18.8 12.8  5.6  2.2 10.6 1656
1980 -0.4  0.2  4.3 10.9 16.0 20.1 23.9 22.6 18.9 11.5  6.0  1.0 11.2 1648
1981 -0.1  2.5  6.1 12.7 15.2 21.0 23.0 21.9 18.0 11.3  6.9  0.6 11.6 1622
1982 -3.5  0.2  5.5  9.2 16.4 19.1 22.8 21.8 17.8 12.0  5.7  2.8 10.8 1605
1983  0.3  2.5  5.9  8.9 14.7 19.8 23.3 23.7 18.6 12.7  6.5 -2.9 11.2 1594
1984 -1.6  2.9  4.5 10.1 15.5 20.7 22.6 22.7 17.5 13.0  5.6  2.4 11.3 1593
1985 -2.6 -0.4  6.6 12.3 16.8 19.9 23.1 21.6 17.7 12.8  5.3 -1.2 11.0 1595
1986  1.1  1.9  7.4 11.6 16.5 21.2 23.2 21.7 18.2 12.6  5.6  1.9 11.9 1590
1987  0.0  2.8  6.3 11.8 17.5 21.4 23.2 22.2 18.6 11.4  7.1  2.0 12.0 1589
1988 -1.9  0.8  5.9 11.1 16.6 21.4 23.8 23.3 18.3 11.6  6.7  1.3 11.6 1598
1989  1.4 -0.9  5.9 11.5 16.0 20.4 23.3 22.0 18.1 12.8  6.1 -2.2 11.2 1597
1990  2.9  2.7  7.3 11.7 15.5 21.4 23.1 22.6 19.6 12.7  7.6  0.6 12.3 1572
1991 -0.7  4.3  7.0 12.4 17.8 21.4 23.6 23.1 18.8 13.2  5.1  2.7 12.4 1549
1992  1.4  4.3  7.0 11.6 16.5 20.1 22.3 21.3 18.5 12.7  5.7  0.9 11.9 1536
1993  0.0  0.0  5.5 10.7 16.8 20.4 23.3 22.9 18.0 12.3  5.2  1.8 11.4 1529
1994 -1.6  0.2  7.0 12.2 16.3 22.0 23.4 22.4 18.9 13.0  7.1  3.0 12.0 1519
1995  0.9  2.6  6.9 10.6 15.7 20.6 23.7 24.0 18.6 13.3  5.6  1.0 12.0 1494
1996 -1.0  1.9  4.1 10.8 16.6 21.4 23.1 22.6 18.1 12.6  4.6  1.7 11.4 1464
1997 -0.6  2.9  7.2  9.6 15.4 20.7 23.2 22.2 19.4 12.7  5.5  1.7 11.7 1432
1998  2.1  4.4  5.7 11.4 17.9 20.8 24.2 23.5 20.9 13.4  7.7  2.7 12.9 1429
1999  0.8  4.1  5.9 11.6 16.4 20.8 24.1 23.0 18.4 12.8  9.2  2.6 12.5 1448
2000  0.5  4.3  8.2 11.6 17.7 21.0 23.2 23.3 18.9 13.4  4.2 -2.1 12.0 1431
2001 -0.2  1.3  5.1 12.3 17.4 21.0 23.6 23.6 18.7 12.7  9.2  3.1 12.3 1437
2002  2.1  2.8  4.8 12.4 15.5 21.8 24.5 23.1 19.9 11.7  6.0  2.1 12.2 1421
2003 -0.1  0.5  6.6 11.6 16.4 20.3 24.0 23.8 18.6 13.6  6.7  2.0 12.0 1412
2004 -1.3  1.0  8.2 11.9 17.4 20.5 22.9 21.5 19.3 13.5  7.4  1.8 12.0 1381
2005  0.4  3.2  5.7 11.9 15.7 21.4 24.2 23.4 20.2 13.4  7.6  0.3 12.3 1220
2006  4.2  1.5  6.2 13.4 17.1 21.7 24.8 23.3 17.8 11.8  7.2  3.1 12.7 1205
2007  0.1 -0.2  8.6 10.5 17.3 21.3 23.6 24.0 19.6 14.4  6.7  1.0 12.2 1166
2008 -0.6  1.3  5.5 10.8 15.6 21.2 23.4 22.3 18.7 12.2  6.5  0.2 11.4 1170
AA   -0.7  0.9  5.3 10.8 15.9 20.4 23.0 22.2 18.4 12.5  5.9  0.8 11.3
Ad   -0.7  1.0  5.4 10.9 16.0 20.5 23.1 22.3 18.5 12.6  6.0  1.0 11.4
 
For Country Code 425

I note here, again, that this report is for Country Code 425: The U.S.A.

Run Log of GIStemp STEP0 run to completion

[chiefio@tubularbells STEP0]$ do_comb_step0.sh v2.mean
Clear work_files directory? (Y/N) y
Bringing Antarctic tables closer to input_files/v2.mean format
collecting surface station data
... and autom. weather stn data
... and australian data
replacing '-' by -999.9, blanks are left alone at this stage
adding extra Antarctica station data to input_files/v2.mean
created v2.meanx from v2_antarct.dat and input_files/v2.mean
GHCN data:
 removing data before year 1880.
created v2.meany from v2.meanx
replacing USHCN station data in v2.mean by USHCN_noFIL data (Tobs+maxmin adj+SHAPadj+noFIL)
  reformat USHCN to v2.mean format
extracting FILIN data
getting inventory data for v2-IDs
 USHCN data end in  2009
finding offset caused by adjustments
extracting US data from GHCN set
 removing data before year 1980.
getting USHCN data:
-rw-rw-r--    1 chiefio  chiefio  10255476 Nov  7 10:21 USHCN.v2.mean_noFIL
-rw-rw-r--    1 chiefio  chiefio   9594277 Nov  7 10:21 xxx
doing dump_old.exe
 removing data before year 1880.
-rw-rw-r--    1 chiefio  chiefio   9594277 Nov  7 10:21 yyy
Sorting into USHCN.v2.mean_noFIL
-rw-rw-r--    1 chiefio  chiefio   9594277 Nov  7 10:21 USHCN.v2.mean_noFIL
 done with ushcn
created ushcn-ghcn_offset_noFIL 
Doing cmb2.ushcn.v2.exe
created  v2.meanz
replacing Hohenspeissenberg data in v2.mean by more complete data (priv.comm.)
disregard pre-1880 data:
At Cleanup
created v2.mean_comb
move this file from to_next_step/. to ../STEP1/to_next_step/. 
Copy the file to_next_step/v2.mean_comb to ../STEP1/to_next_step/v2.mean_comb? (Y/N) n
 
and execute in the STEP1 directory the command:
   do_comb_step1.sh v2.mean_comb
[chiefio@tubularbells STEP0]$ 

The New Stations Since USHCN Cut Off in 2007

[chiefio@tubularbells Uv2study]$ more New.Station.inv 
021514  33.2058 -111.6819  434.3 AZ CHANDLER HEIGHTS               025467 ------ ------ +7
026353  31.9356 -109.8378 1325.9 AZ PEARCE SUNSITES                022669 022659 ------ +7
035512  35.5125  -93.8683  253.0 AR OZARK 2                        035508 ------ ------ +6
091500  31.1903  -84.2036   53.3 GA CAMILLA 3SE                    093516 090979 ------ +5
100803  42.3353 -111.3850 1817.8 ID BERN                           480915 ------ ------ +7
101956  47.6789 -116.8017  650.1 ID COEUR D'ALENE                  100667 ------ ------ +8
105685  44.5664 -113.8953 1539.2 ID MAY 2SSE                       101663 ------ ------ +7
106305  43.6039 -116.5753  752.9 ID NAMPA SUGAR FACTORY            101380 ------ ------ +7
116738  39.8058  -90.8236  198.1 IL PERRY 6 NW                     113717 ------ ------ +6
141867  38.6758  -96.5097  402.3 KS COUNCIL GROVE LAKE             142602 ------ ------ +6
147542  39.7772  -98.7783  542.5 KS SMITH CTR                      146374 ------ ------ +6
150381  36.8825  -83.8819  301.8 KY BARBOURVILLE                   155389 ------ ------ +5
153762  37.7558  -87.6456  136.9 KY HENDERSON 8 SSW                156091 ------ ------ +6
170814  45.6603  -69.8120  323.1 ME BRASSUA DAM                    177174 ------ ------ +5
171628  44.9197  -69.2417   90.5 ME CORINNA                        176430 ------ ------ +5
180700  39.0303  -76.9314   44.2 MD BELTSVILLE                     181995 ------ ------ +5
185718  39.2811  -76.6100    6.1 MD MD SCI CTR BALTIMORE           180470 ------ ------ +5
196783  42.5242  -71.1264   27.4 MA READING                        191447 ------ ------ +5
198757  42.1608  -71.2458   50.3 MA WALPOLE 2                      192975 ------ ------ +5
199316  42.1333  -71.4333   64.0 MA WEST MEDWAY                    191561 ------ ------ +5
210252  48.3311  -96.8253  258.2 MN ARGYLE                         213455 ------ ------ +6
213303  47.2436  -93.4975  399.3 MN GRAND RPDS FOREST LAB          216612 ------ ------ +6
215175  47.6308  -93.6522  422.8 MN MARCELL 5NE                    219059 ------ ------ +6
244364  45.9353 -107.1375  944.9 MT HYSHAM 25 SSE                  242112 ------ ------ +7
247318  47.3033 -115.0908  810.8 MT SAINT REGIS 1 NE               243984 ------ ------ +7
248569  47.8800 -105.3686  696.2 MT VIDA 6 NE                      246660 ------ ------ +7
258133  41.4581 -100.5986  911.4 NE STAPLETON 5W                   253540 ------ ------ +6
270706  44.3061  -71.6575  359.7 NH BETHLEHEM 2                    270703 ------ ------ +5
288816  39.9500  -74.2167   30.5 NJ TOMS RIVER                     288899 ------ ------ +5
292608  36.9358 -107.0000 2070.5 NM DULCE                          052432 ------ ------ +7
300023  42.1014  -77.2344  304.5 NY ADDISON                        ------ ------ ------ +5
302060  42.0628  -75.4264  304.8 NY DEPOSIT                        300360 ------ ------ +5
302129  41.0072  -73.8344   61.0 NY DOBBS FERRY ARDSLEY            307497 ------ ------ +5
304996  44.8419  -74.3081  268.2 NY MALONE                         301387 ------ ------ +5
308737  43.1450  -75.3839  216.7 NY UTICA FAA AP                   308733 308739 ------ +5
318694  36.3919  -81.3039  876.3 NC TRANSOU                        310506 ------ ------ +5
321408  46.8769  -97.2328  285.0 ND CASSELTON AGRONOMY FM          325660 ------ ------ +6
332098  41.2778  -84.3853  213.4 OH DEFIANCE                       335664 335669 ------ +5
364896  40.3333  -76.4667  137.2 PA LEBANON 2 W                    363699 ------ ------ +5
367029  41.7394  -75.4464  548.6 PA PLEASANT MT 1 W                363056 ------ ------ +5
416794  33.6744  -95.5586  165.2 TX PARIS                          344384 ------ ------ +6
422726  41.0222 -111.9353 1335.0 UT FARMINGTON 3 NW                427318 ------ ------ +7
425477  38.4500 -112.2292 1801.4 UT MARYSVALE                      420519 ------ ------ +7
426135  39.7122 -111.8319 1563.0 UT NEPHI                          422418 ------ ------ +7
427559  38.9139 -111.4161 2304.3 UT SALINA 24 E                    425148 ------ ------ +7
427729  39.6847 -111.2047 2655.4 UT SCOFIELD-SKYLINE MINE          423896 ------ ------ +7
437607  44.6264  -73.3031   33.5 VT SOUTH HERO                     306659 ------ ------ +5
437612  44.0725  -72.9736  408.7 VT SOUTH LINCOLN                  435733 435740 ------ +5
451630  48.5472 -117.9019  474.6 WA COLVILLE                       451630 451650 ------ +8
451939  47.3706 -123.1600    6.4 WA CUSHMAN POWERHOUSE 2           453284 ------ ------ +8
453222  45.8081 -120.8428  499.9 WA GOLDENDALE                     453226 ------ ------ +8
455224  47.1358 -122.2558  176.5 WA MC MILLIN RSVR                 456803 ------ ------ +8
457267  47.0894 -117.5931  592.8 WA SAINT JOHN                     451586 ------ ------ +8
466989  38.6667  -80.2000  877.8 WV PICKENS 2 N                    466991 ------ ------ +5
467029  37.5744  -81.5356  390.1 WV PINEVILLE                      463353 ------ ------ +5
469610  37.6731  -82.2761  231.6 WV WILLIAMSON                     469605 ------ ------ +5
475808  44.5297  -90.6383  310.9 WI NEILLSVILLE 3 SW               473471 ------ ------ +6
480552  42.6339 -106.3775 1831.8 WY BATES CREEK #2                 487105 ------ ------ +7
481840  44.5219 -109.0633 1549.0 WY CODY                           481175 ------ ------ +7
[chiefio@tubularbells Uv2study]$ 

So this is a list of stations, in the USHCN.v2 inventory format, for which I need to create GHCN v2.inv file entries that look like:

42572786007 COLVILLE 5NE                    48.58 -117.80  914  885R   -9MVxxno-9x-9COOL CONIFER    A1   0

Or as “ragged right”:

42572786007 COLVILLE 5NE 48.58 -117.80 914 885R -9MVxxno-9x-9COOL CONIFER A1 0

There are a few, like this one for COLVILLE, that already have an entry, but most of them do not.
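Stamping out those entries could be scripted. What follows is a hypothetical Python sketch, not the actual tooling: the field widths are eyeballed from the COLVILLE sample above, and `trailer` is a stand-in for the classification fields that would really have to be copied from proper station metadata.

```python
# Hypothetical sketch: compose a GHCN v2.inv-style line from a
# USHCN.v2 inventory record. Field widths are eyeballed from the
# COLVILLE sample above and MUST be checked against the real v2.inv
# before feeding anything to GIStemp (it dies on mismatched entries).
def make_v2_inv_line(station_id, name, lat, lon, elev_m, trailer):
    # 11-char id, 30-char left-justified name, lat/lon to 2 decimals,
    # integer elevation in metres, then the vegetation / urbanization
    # trailer copied from a comparable existing station.
    return "%-11s %-30s %6.2f %7.2f %4d %s" % (
        station_id, name, lat, lon, elev_m, trailer)

print(make_v2_inv_line("42572786007", "COLVILLE 5NE",
                       48.58, -117.80, 914, "885R"))
```

Even a sketch like this only gets you the geometry of the line; the trailing classification fields carry meaning that has to come from real metadata, which is why the per-station work remains.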

Conclusion

I think there is pretty clear evidence for significant warming of the temperature record from this “Selection Bias” or perhaps “Survivor Bias” in the US data. It is not just a “California Thing”. There is a similar deletion process in the thermometer records for other major countries of the world. At present, I do not have alternative data sources for those temperature series.

What is very clear, however, is that this deletion of thermometers from the present reporting base introduces significant errors into the 1/10 C place, and perhaps even up into whole degrees C. For this reason, the GIStemp product is no longer usable for statements about the temperature of the planet, the direction of any trends, and certainly not for any policy decisions. For those, you would be better served to look out the window…


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in AGW GIStemp Specific, Favorites, GISStemp Technical and Source Code.

33 Responses to GIStemp GHCN Selection Bias Measured 0.6 C

  1. wattsupwiththat says:

    Can you provide a key for your data tables so that each column is clearly defined? It is missing from this essay.

    - thanks

    Anthony

    REPLY: “Yeah, it needs to be there. I’d described it in the last dozen postings and was feeling a bit repetitive… but there will always be new folks who have not read the last dozen… -ems”

  2. wattsupwiththat says:

    also, check your email and reply please

  3. Ripper says:

    Amazing stuff E.M.

    Here is a [url=http://members.westnet.com.au/rippersc/gisstemp.jpg]graph[/url]

    REPLY: “Don’t know why, but for some reason WordPress tossed this version into the spam queue. Maybe it doesn’t like the ‘URL’ bit? Who knows… At any rate, thanks for the graph, I’m putting it into the report ‘soon’ -ems”

  4. Ripper says:

    Amazing stuff E.M.

    Here is a graph for you.

    http://members.westnet.com.au/rippersc/gisstemp.jpg

  5. pyromancer76 says:

    I love the smell of GIStemp GHCN goose(sausage) being cooked on E.M. Smith’s blog. Way to go, Chiefio!

    REPLY: “Now you know why WSW is a bit delayed this weekend… -ems”

  6. JP Miller says:

If your analysis holds up, it is breathtaking and makes it more understandable why Gavin Schmidt would have asked that any suggestion he is “connected to” GIStemp be publicly corrected. If he is aware of what appears to be willful distortion in this dataset, he should run the other way.

Next question, how can this analysis be “peer reviewed” and vetted in a way that results in GIStemp being taken out of the science of climatology and Hansen being charged with fraud? There have to be consequences for government scientists being mendacious on this scale. And don’t give me the “he was doing his best with limited budget” crap. Doesn’t wash.

    REPLY: ‘Well, I’ve put it up for the world to see. The Peers I care about most are able to look at it right now. Those peers are you’all. The ones who edit magazines for a living can catch up later ;-) IMHO this is nothing more than a return to how Science was done 100 years ago. You kicked it around with interested parties, sometimes published yourself, and later it might end up in some journal for the record. I see no real need for “gatekeepers on the truth”; though good editing and decent feedback ought to improve the product that reaches the record. Yeah, it’s a bit rougher on the ego, and yeah, it’s more “risky” in that if you FUBAR, you do it live on stage in front of the world instead of in a back room with a “peer”; welcome to the Wild West – I’ve got a few hundred years of ancestors who did not shrink from far greater adversity and risk: I’m not about to stop the tradition now.

    BTW, my “budget” has been about $40 for coffee and $10 for tea, a recycled 20 year old “white box” PC, and my time. It took me about 6 hours programming time (interrupted by kids, cats, spousal units, door to door solicitors,…) over 2 elapsed days to do this. “Budget” is not the issue. The desire to do it is. -emsmith’

  7. E.M.Smith says:

    @Harold Vance

    Perhaps I’ll be able to beg someone to do a validation test using completely different software and hardware 8-}

    @all

    Download USHCN.v2 data and description from:

    http://www1.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/

    The document that describes the files is:

    http://www1.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/readme.txt

    Extract 2006, 2007, and 2008 records.
    Sum temperatures, by months and for whole year.
    Compare to values above.

Values ought to be very close to the ones in the report above (differing only by the records that were not in the station inventory file v2.inv that I deleted).

    This ought to be doable with Excel or any database or any programming language.

    The file is just a flat file of text. Station, type flag, year, 12 months of temperatures…
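As a starting point, here is a hedged Python sketch of that check. The column offsets are illustrative assumptions only; the authoritative layout is in the readme.txt linked above, so `parse_line` would need adjusting to match it before trusting any numbers.

```python
# Assumed (illustrative) layout: cols 0-5 station id, col 6 a flag,
# cols 7-10 the year, then twelve 6-character fields each holding a
# temperature in hundredths of a degree plus a trailing flag char,
# with -9999 meaning "missing". Check readme.txt for the real layout.
def parse_line(line):
    station = line[0:6]
    year = int(line[7:11])
    temps = []
    for m in range(12):
        field = line[11 + 6 * m : 11 + 6 * m + 5]  # value, flag dropped
        v = int(field)
        temps.append(None if v == -9999 else v / 100.0)
    return station, year, temps

def monthly_sums(lines, years=(2006, 2007, 2008)):
    """Sum reported temperatures per (year, month) across stations."""
    sums = {}
    for line in lines:
        station, year, temps = parse_line(line)
        if year not in years:
            continue
        for month, t in enumerate(temps, start=1):
            if t is not None:
                total, count = sums.get((year, month), (0.0, 0))
                sums[(year, month)] = (total + t, count + 1)
    return sums
```

Point it at the downloaded file, and the per-month sums should line up with the report above to within the deleted-record caveat already noted.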

    The comparison of the USHCNv2 file with GHCN is all that is needed for “proof”. Harold has already vetted that GHCN is selected to a reduced set; and, under the “California Beach” thread, we vetted that bit of code and result and showed that USHCN cuts off in 2007.

    http://chiefio.wordpress.com/2009/10/24/ghcn-california-on-the-beach-who-needs-snow/

The only real “open issues” are: did I screw up the selection of records to match the v2.inv station inventory, and is the conversion program buggy in some way? (posted source in the “How to fix it” thread):

    http://chiefio.wordpress.com/2009/11/06/ushcn-v2-gistemp-ghcn-what-will-it-take-to-fix-it/

    I’m working on a second, completely independent code path to what ought to be (almost) the same result. (The only difference ought to be in how the ‘skipped’ records are handled and that ought to be in 1/100 or less precision impact…)

    But it would, as always, be A Really Good Thing to have a completely independent attack / proof of the process and conclusion.

    Isn’t Real Science ™ fun? You get to beg people to attack your work 8-}

  8. Neil Fisher says:

    It might be interesting to go the other way too – if you use just the stations GISS used for 2008 for the entire available record. Just as a comparison, and assuming that they have the required data that far back. Yeah, I know – so much to do, so little time. I can wish, can’t I?

    REPLY: [ It's "on the list"... right after pizza with friends, decompress I need a break, pay bills it's more fun ;-) But seriously, probably about 2 days away. I want to do a bit more "vetting" of the above stuff before running off to new approaches. But if I'm in code that does something near to that, you can bet I'm going to dump out that benchmark too... -ems ]

  9. Level_Head says:

    Thank you, sir, for your excellent work. And thanks for your visit as well.

    A set of charts of global temperatures (or anomalies) by latitude band would be interesting, as it would tend to reduce the impact of the thermometers migrating toward the equator.

    You’ve got sausages to grind; I might take this up when I get past the current crunch, and a bit of eye surgery.

    ===|==============/ Level Head

  10. vjones says:

    While I’m still reeling from your reply http://chiefio.wordpress.com/2009/11/09/gistemp-a-human-view/#comment-1529

    There is more in your data above than the numbers give away at first sight
    Annual means:
    http://i36.tinypic.com/fx5rg4.png
    Records:
    http://i34.tinypic.com/11rdmk1.png

    Not only does the change of records change the annual temperature at the beginning and end of the series, but it changes the overall trend as well.

    What is the basis for fewer thermometers in the early part of the record?

    REPLY: [ I don't know for sure why the drop off happens early. I suspect it is an artifact of the change in station marking for "estimated". NOAA changed what flags mean for "estimated" data, and the early data have a lot of estimates.

    We have two moving parts here.

1) I translate the "new" flag into an "old" flag and may not have made the best selection of translations. It is not a clear "one for one". And type M records get dropped by GIStemp (as that means they were just made up... estimated from near nothing.) My mapping is visible in the posted source code and suggestions for improvement are welcome. I toss a lot of the subtle nuances of "estimated" into the M bin together.

    2) The change of flags and meanings by NOAA was accompanied by a re-mark of the data by NOAA. It is possible that the very early data had the interpretation on "estimated" values changed from "estimated from something" to "estimated from not enough", and this NOAA action could toss more records into the "M bin".

It would take knowing exactly what each flag meant, and looking at a representative sample of the early records, to figure out which of these two moving parts is "the issue". But you can see the rapid convergence of 'kept record counts' in the middle / late series shows the effect is concentrated in the very earliest years of sparse, questionable data.

I most strongly suspect my 'flag mapping', but it has not been high on my list of 'issues to sort out'... -ems ]
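To make moving part (1) concrete, the kind of flag translation being described looks roughly like the table below. This is a hypothetical illustration, not the author's actual mapping (that lives in the posted conversion source, and NOAA's readme defines the real flag meanings).

```python
# Illustrative only: several distinct shades of "estimated" in the new
# USHCN.v2 flags all collapse into the old-style 'M' bin, and GIStemp
# later drops 'M' records. These flag letters are hypothetical.
NEW_TO_OLD_FLAG = {
    "E": "M",  # estimated outright
    "I": "M",  # interpolated / in-filled
    "Q": "M",  # questionable, treated as estimated
    " ": " ",  # observed value passes through unchanged
}

def translate_flag(new_flag):
    # Anything unrecognized is conservatively tossed into the 'M' bin,
    # which is one way early, heavily estimated records could vanish
    # from the kept-record counts.
    return NEW_TO_OLD_FLAG.get(new_flag, "M")
```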

  11. Peter Dunford says:

Being British, I can’t ask these questions of GISS. Someone needs to do an FOI request, or ask their Congressman whether GISTemp has been starved of funding. I would expect lack of funds to be their first reason for not updating the program for USHCN V2, and the drop off in thermometer count. They have had two years to do this. Then there needs to be an FOI request or congressional committee to find out the real reason why they haven’t updated it.

    On this side we will start asking whether HADCrut have done the same as GISTemp.

    REPLY: [ It is clearly not a funding issue. I did this fix in about 6 hours and I was not familiar with the data format changes, so most of the time was spent learning USHCN.v2 data format. This is something a programmer who regularly works on this bit of code and data could have done before lunch on a slow day... They have not done it for the simple reason that they didn't want to do it. The real question is why did they decide to deprecate USHCN. That was a design decision, not a programmer budget issue... -ems]

  12. vjones says:

    E.M.,

    I just found a site with the USHCN sites mapped and the data downloadable

    http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn_map_interface.html

    REPLY: [ Nice! Thanks! This is a great addition to the GISS site. -ems]

  13. clivere says:

    I have been watching your blog with interest since this summer when someone referred to it at CA. I applaud your efforts to get the code running because others have previously stated it is a difficult exercise and I believe an independent review of GISS is very desirable. Given a successful emulation then it becomes a very powerful tool in understanding what GISS has done.

    This post is one of the first where you have done a test run which has an output that allocates an order of magnitude to the issues you have been raising. I am quite prepared to believe that there are issues within GISS and that the issues are significant but I do need to be convinced about the one you are describing here.

    There are 3 arguments that can be made in defence of GISS.

1. The use of zones and anomalies should make the results immune from the impact of changes to the number of sites used unless those additional sites can be demonstrated to be radically different in their properties. You appear to believe anomalies and zones are not the saviour but I don’t understand the processing mechanism that causes the difference you have identified here. You have made vague references to averaging of averaged averages and rounding issues but so far provided no clarity of the actual mechanisms at play.

2. GISS comes up with similar results to Hadley. There is some commonality of input and possibly some shared processing methodology so may have common issues but I wouldn’t necessarily expect similar rounding problems.

    3. GISS programs were “independently” reviewed by Nick Barnes “Clear Climate Code” project and I believe they got as far as implementing a revised version of one step in python which you have commented on. This project apparently reviewed and corrected rounding issues. Not sure why you are still finding them although it looks like the CCC project is unfinished so perhaps they did not get all the way through.

With respect to your emulation, have you been able to verify that it produces the same results as GISS by matching intermediate files and checking for differences? The CCC site does have outputs from the various steps and it would be nice to be reassured you are getting the same results as they get.

  14. E.M.Smith says:

    clivere: I have been watching your blog with interest since this summer when someone referred to it at CA. I applaud your efforts to get the code running because others have previously stated it is a difficult exercise and I believe an independent review of GISS is very desirable.

    Thank you. It was not the fixing that was difficult, though, it was the coming to understand the details of some rather painfully written and under documented code. After that, the actual lines needed to get it to run were not that many. It’s documented here:

    http://chiefio.wordpress.com/2009/07/29/gistemp-a-cleaner-approach/

    Given a successful emulation then it becomes a very powerful tool in understanding what GISS has done.

    Um, minor “nit” to harvest. This is not an emulation. This is GIStemp. The real deal.

    Ported to run on Linux with the minimum changes possible / needed for stability; and none of them very significant to the operation. This was a deliberate act so that any “benchmarks” done can be vetted as clearly GIStemp and not “my code”.

    This post is one of the first where you have done a test run which has an output that allocates an order of magnitude to the issues you have been raising.

Um, perhaps you missed one of my earliest ones, where I found an issue with the compiler-dependent math done in USHCN2v2.f: 1/10 C per record in 10% of records, with an overall 1/100 C warming of the data series?

    http://chiefio.wordpress.com/2009/07/30/gistemp-f-to-c-convert-issues/

    or:

    http://chiefio.wordpress.com/2009/08/12/gistemp-step1-data-change-profile/

    which finds a measured couple of tenths C uplift in the data through step 1

    or the very similar posting about the STEP0 data change profile…

    My pattern is to assess the code and do a qualitative review and then do a code review and then construct and do a benchmark that does measurement. Then I do a “post benchmark evaluation” and sometimes a “fix” such as this fix to put the current USHCN.v2 thermometers back in. (Which then gets followed by a re-benchmark).

    I’m not surprised you haven’t caught up on it all.

    And while not strictly GIStemp, the analysis of the input data change over time and space has had a fair degree of numerical, measurable, testable content:

    http://chiefio.wordpress.com/2009/11/03/ghcn-the-global-analysis/

    though of a more ‘distribution’ sort rather than ‘one value’ sort.

    I am quite prepared to believe that there are issues within GISS and that the issues are significant but I do need to be convinced about the one you are describing here.

    No problem. You have the code and the data. Easy enough to duplicate. For that matter, you can take the data for a sample period of time and count them by hand or even using Excel.

    This posting is about the “selection bias” that comes in the leaving out of the USHCN.v2 data set. Nothing more. So you can take that USHCN.v2 data for any individual month (after the USHCN old version cuts off) and add it up with whatever tool makes you comfortable.

    You do not need GIStemp running for that process. You ought to get values very close to those in the above charts. (In further benchmarks I will be taking this data through the rest of GIStemp. To validate what it does with this +0.6 C selection bias, you will need GIStemp running).

    There are 3 arguments that can be made in defence of GISS.

    I would suggest that there are more than that. Several of the processes that it uses / advocates have some merit. They are just not sufficient to overcome the issues.


    1. The use of zones and anomalies should make the results immune from the impact of changes to the number of sites used unless those additional sites can be demonstrated to be radically different in their properties.

    And there’s your first leap of faith. “Immune”.

    Nope.

    And I have demonstrated that the inputs are “radically different in their properties”. See all the “by latitude” postings and the new series of “by altitude” postings along with the demonstration that the older longer records show no bias but the newer records do show bias. Those are “radically different” inputs.

    No code is perfect and no method is perfect. I would use words more like “mitigate” or “more robust” or “resistant”. And I believe that it does mitigate the impact, but with less than 100% perfection, so the 0.6C bias can, and does, leak through. Mitigated, but not removed. See this (admittedly ‘first cut’ and rough) benchmark that shows “the anomaly changes”:

    http://chiefio.wordpress.com/2009/11/12/gistemp-witness-this-fully-armed-and-operational-anomaly-station/

And since it does change, the anomaly process is not perfect and the product is not “immune”. So now we have to look at exactly “how much” and measure: but it is not “zero”. (That “measuring in painful detail” is what is on the plate for the coming week).

    You appear to believe anomalies and zones are not the saviour

    It was a belief up until I ran the benchmark. Now it is a demonstrated fact. Next it will become a measured quantity. That is the process of vetting code with benchmarks.

No filter has perfect “Q”. GIStemp is purported to be a filter with perfect “Q”, yet through the first steps it is measured to be an “amplifier”, not a “filter”. That means the following section (STEP 3) must be one heck of a filter. Beyond perfect filtering of the data bias, it must also filter out the amplification of the early steps…

but I don’t understand the processing mechanism that causes the difference you have identified here.

    Pretty simple: USHCN “cuts off” in 2007. GHCN drops most of the same stations at about the same time. By using USHCN.v2 (that GISS ought to have done already) I “put them back” into the input data. Then measure a 0.6C impact from the change. There is no opinion in this, it is a measured behaviour.

    The only argument that can be made about it is that the US thermometers ought to be left out and that my putting them back in is somehow wrong. I’m willing to explore that, but it is a weak argument. (Especially given that taking them out biases the STEP0 output by a 0.6C demonstrated warming. You would need to prove a -0.6C corrective behaviour in the following steps of GIStemp to make that an acceptable behaviour.)

    You have made vague references to averaging of averaged averages

    Nothing at all vague about it. It is an accurate statement of what GIStemp and NOAA do. The daily MIN / MAX are averaged. That average is adjusted by NOAA based on other data that are themselves often averages. That corrected average is then averaged for the days of a month to give the monthly Average of MIN/MAX mean that is fed into GIStemp.

All this is before GIStemp does its own first averaging. There are many more steps in GIStemp that do averaging and I will not be repeating them here. It is posted under the GIStemp tab up top and covered in detailed articles that look at each coding step as I finish it. PApars.f in particular has been covered as it does the UHI “adjustment” based on averaging together up to 20 records (IIRC) that are used to move the past temperatures of a given station. And in STEP1 various fractional records are merged by various in-fill and splice processes that use, yes, averages. It’s all there in the code, and has no vagueness about it. And, of course, there is much more averaging done in the creation of the ‘zonal averages’ and the grid / box averages used to produce the anomalies that are measured against the “baseline average”.
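A toy Python illustration of that chain, with made-up readings and rounding to tenths at each stage standing in for the real NOAA / GIStemp pipeline, shows how each intermediate rounding step can nudge the final monthly value away from a single direct average:

```python
# Toy model of the averaging chain: daily mean = (MIN + MAX) / 2,
# rounded to tenths; then a monthly mean of those dailies, rounded
# again. Compare against averaging all raw readings in one pass.
def chained_mean(days):
    dailies = [round((lo + hi) / 2.0, 1) for lo, hi in days]
    return round(sum(dailies) / len(dailies), 1)

def direct_mean(days):
    readings = [v for pair in days for v in pair]
    return sum(readings) / len(readings)

days = [(10.0, 20.2), (11.0, 21.0), (9.8, 20.0)]  # made-up MIN/MAX pairs
print(chained_mean(days), direct_mean(days))  # the two disagree
```

With only three made-up days the gap is a few hundredths; the point is that every extra layer of round-then-average is another place for low-order error to enter, long before GIStemp’s zonal and grid averaging piles on more.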

    and rounding issues but so far provided no clarity of the actual mechanisms at play.

    See the above

    http://chiefio.wordpress.com/2009/07/30/gistemp-f-to-c-convert-issues/

as one example. Every single averaging step needs the same kind of analysis. I’ll get there eventually, but there is only one of me, and discovering things like the violence done to the thermometer record by deleting 93% of the thermometers in the USA has taken priority. It is, however, very clearly the case that when you do arithmetic in a computer you will accumulate rounding errors in the low order bits.

(There are some ‘infinite precision’ math packages, but those are highly specialized and not used in GIStemp. GIStemp uses standard FORTRAN and when using regular INTEGER and REAL data types, you must “vet” every single math operation performed for: rounding, underflow, overflow, truncation, precision, accuracy, and generally be aware that error accumulates unless you have taken extraordinary pains to avoid it. Pains that are not in evidence in GIStemp.)

    FWIW, rounding and similar issues are, IMHO, the smallest issue with GIStemp. I would guess it is more than the 1/100C place (we’ve already found that much) but less than the 1/2 C place. Somewhere in the 1/10 to 3/10 C as a reasonable guess.
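A Python analogue of that F-to-C conversion issue (a stand-in, not the FORTRAN itself, and a cruder mechanism than the compiler-dependent one actually found) shows how sensitive such a conversion is to the low-order rounding choice when temperatures are stored as integer tenths:

```python
# Python stand-in for a USHCN2v2.f-style conversion: temperatures
# arrive as integer tenths of a degree F and leave as integer tenths
# of a degree C. Truncating vs rounding the result differ by one
# tenth C on individual records -- the same low-order-bit territory
# where the FORTRAN step turned out to be compiler dependent.
def to_c_tenths_trunc(tf_tenths):
    return int((tf_tenths / 10.0 - 32.0) * 5.0 / 9.0 * 10.0)  # drops fraction

def to_c_tenths_round(tf_tenths):
    return round((tf_tenths / 10.0 - 32.0) * 5.0 / 9.0 * 10.0)

sample = range(320, 1000)  # 32.0 F .. 99.9 F in tenths
differing = sum(1 for tf in sample
                if to_c_tenths_trunc(tf) != to_c_tenths_round(tf))
print(differing, "of", len(sample), "records shift by 0.1 C")
```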

    2. GISS comes up with similar results to Hadley.

All the more reason to suspect the Hadley code “shares issues”. I will gladly give it the same type of code review if they ever decide to release their code. Until then, Hadley is just a ‘black box’ that lost their data. For all we know, it could be GIStemp. When you have a demonstrated broken clock, and another agrees with it, you suspect both of being broken…

    These folks share papers, share journals, share beliefs. It is very common for folks indulging in that much “group think” to come up with similar “solutions” with similar failings. So Hansen published a peer reviewed paper showing ‘up to 1000 km’ can be used for ‘the reference station method’. I could easily see a researcher at Hadley saying: “Well, it’s peer reviewed, so we better use 1000 km and the reference station method”. One of the problems with the ‘must be peer reviewed’ mantra is that it enforces group think and shared error.

    http://chiefio.wordpress.com/2009/09/14/gistemp-pas-dun-coup/

There is some commonality of input and possibly some shared processing methodology so may have common issues but I wouldn’t necessarily expect similar rounding problems.

    I, too, think that the “commonality of input” is the biggest issue and would suspect that “shared processing methodology” is most of the rest. As this article demonstrated, the “input” selection bias is a measured 0.6C from the single decision to leave out USHCN.v2 in 2007. This is greater than the 1/2 C probable upper bound I would put on math precision / rounding issues.

    To put a very fine point on it: I suspect that the final answer will simply be that folks believed in a “perfect filter” and deleted or changed thermometers based on that belief. But that belief is fundamentally broken. No filter is perfect. So both Hadley and GISS take the GHCN data and assume a perfect filter will fix it and both write imperfect filters. It takes nothing more than that for them to make similar results.

    But I assume no filter is perfect; so I’m doing what they ought to have done in the first place: Measure the actual changes in the data and measure the actual degree of filtering done (the filter “Q” or “Quality Factor”.) And those numbers will tell everyone exactly how good a filter it is, and how much of the 93% thermometer deletions show up in the product. But we already know the quantity is more than zero.


    3. GISS programs were “independently” reviewed by Nick Barnes “Clear Climate Code” project and I believe they got as far as implementing a revised version of one step in python which you have commented on.

I believe their stated goal was to do a “translation”, not a benchmark and code QA. I would expect them to find some issues as part of a port / translation, but the major focus of a port is not to test the design of the original, but to make the port function the same way, bugs and all.

    This project apparently reviewed and corrected rounding issues.

I think you are making a bit of a leap. They found and fixed one or two issues, IIRC. One being in USHCN2v2.f where a FORMAT statement was ‘off by one’ in the data read step. That “fix” was already in the code I downloaded, so they did not find the compiler dependent 1/10 C error from bit shifting that I found above. They did not do an exhaustive math review. The report of it I heard was more of a “found this by accident while doing port” rather than “was doing math benchmarking and found…”. (But in reality, only they can speak to their process).

    Not sure why you are still finding them although it looks like the CCC project is unfinished so perhaps they did not get all the way through.

    As I understand it, they are not done. Also, I’m a bit of a stickler on math issues. But as noted above, I don’t think I’ll find much above the (already demonstrated) 1/10 C place.

    With respect to your emulation

    It is not an emulation. It is the real GIStemp code being run on the NOAA data as directed in the GIStemp documentation. It is GIStemp.

    have you been able to verify that it produces the same results as GISS by matching intermediate files and checking for differences.

    I don’t really need to do that since it IS GIStemp. But spot checks can be done. I am also in communication with a gentleman doing a C port and we have matched results. As part of my “end to end benchmark” next week, I’ll be downloading the current NOAA data and at that time I’ll compare the STEP3 anomaly reports from “my” GIStemp to the published reports.

Once again: This is the GIStemp code. The things it took to make it run were not material. (Almost entirely matching the f77 or g95 compiler to the steps that used f77 or f90 and taking data initializations out of variable declarations into dedicated DATA statements. Not the kind of thing that would affect processing or output. Though I did surface that their code was sensitive to compiler choice in that STEP0 F to C conversion step… and provided a “fix”.)

The CCC site does have outputs from the various steps and it would be nice to be reassured you are getting the same results as they get.

    Anyone wants any output from any step, just tell me what FTP server to put it on.

    This is an “all open, all visible” operation. Anyone who wants to join in is welcome to come to the party. I’ve put a fair amount of output up already. Any one can compare it. Other output available upon request.

    As long as it’s just me, though, it’s going to be what I put at the head of the queue that gets done first. For now, that’s to assume that GIStemp is GIStemp when the source code is from the NASA site; and benchmark what it does to the data.

    AFTER I’ve got the data and process characterized and benchmarked “end to end”, well, then I’ll worry about whether there are a few low-order bits of precision that jitter between their hardware and mine, or their f90 compiler vs the g95 compiler.

    FWIW, the “rough benchmark” of the anomaly step (in the link above) looks like about a 2/10 impact from the 6/10 of uplift measured in this posting. When I have the final vetted benchmark done and run I’ll have a ‘real number’ to post. But as a ‘first look rough cut’ that’s about a 66% reduction (of this “bolus” of warming bias) from the Zone, Grid, Box anomaly step. Not a bad performance. More than I’d expected.

    Unfortunately, that means 1/3 gets through. That would imply about 0.2C of warming bias in the final anomaly step comes from this “selection bias” of leaving out USHCN.v2 after the file format change.

    Since some folks want us all hot and bothered about 1/10 C (and some folks even get lathered about 1/100 C place changes) that is a ‘material issue’…

    I was not going to put that out for public view until I’d done a fuller and more clean benchmark (updated v2.inv file, fresh download, etc.). But since we’re “on the topic”, and with a pile of “early in the benchmark process” caveats, that’s about the figure of Q we’re looking at.

    Just to make it clear: I’m fairly certain that the “Anomaly, box, grid, zone” process does, in fact, reduce the impact of thermometer change. I am just also fairly certain that it is not a perfect filter and some of the input change bias gets through. The first cut rough measurement is 1/3, but the final cut of the benchmark will be definitive.

    Hope this (rather long…) reply is helpful to you.

  15. vjones says:

    E.M.,
    I will await next week’s posts with eager anticipation (more so than usual).

    …rounding issues…. somewhere in the 1/10 to 3/10 C as a reasonable guess.

    1/3 gets through. That would imply about 0.2C of warming bias in the final anomaly step comes from this “selection bias” of leaving out USHCN.v2 after the file format change.

    Now we’re getting to the foundations and have discovered the big layer of clay that has the potential to undermine the stability of the whole product.

    I realise what you have found so far is USHCN/GHCN related. Add that to station changes and poor, even reversed, UHI adjustment, and my belief is that the 0.6-0.7C of unprecedented global warming will reduce to a more normal natural-variation magnitude of 0.3C.

    REPLY: [ Sounds about right to me. This is slow slogging, but that is the only correct way to do it. I could get it done in about 1/3 the time if my funding were something more than zero and my time available was more than "spare time between chores". Oh well. It will eventually all get done. -ems ]

  16. clivere says:

    OK – thanks for the reply, which clarifies some of this for me. Please don’t assume I am hostile to what you are doing. I have reason to believe there are issues with GISS, but you appear to be uncovering something I was not expecting, so I am probing to get a better understanding.

    I don’t want to get hung up on terminology, so I will refer to your version if that is OK with you.

    I am pleased you are taking steps to compare outputs as that will improve confidence that your version is ok.

    I had previously read the 0.6C as being the difference between full runs of all the steps. My mistake. You are suggesting that the actual difference for a full run is likely to be a lot less than 0.6, which for me is more subtle and therefore much more plausible. Particularly as you imply rounding is a significant contributor.

    You have also clarified that the use of anomalies and zones comes after the step in question, and I now understand why they don’t matter for this issue.

    From a processing perspective I still don’t understand the exact nature of the issue. The change in the number of records is a trigger. I recognise that rounding issues can go in different directions depending, for example, on whether individual figures are rounded or totals are rounded, whether the rounding is up or down, whether values are truncated, etc.

    Looking at the records from the early years, it appears that lower numbers of records mean higher yearly averages, and I am intrigued by the actual mechanisms that would have that impact, particularly as the magnitude is still relatively gross.

    It would also be interesting to see what the impact on the period from 1900 to 2008 would be if each year were somehow constrained to roughly the same 900 or so records, from mainly the same stations.

  17. E.M.Smith says:

    clivere: OK – thanks for the reply, which clarifies some of this for me. Please don’t assume I am hostile to what you are doing.

    You’re welcome; and I don’t… It’s just that the field is rather technical so precision in the details matters.

    I don’t want to get hung up on terminology, so I will refer to your version if that is OK with you.

    Unfortunately, the terminology is highly important. An “emulation” is a fake that looks sort of like the original. A new instantiation of the original is most likely within one part in a few thousand of identical (and will most often be identical down to the last bit). As an analogy, a “soyburger” emulates hamburger; while the McDonald’s 1 mile away is expected to be indistinguishable from the one 10 miles away.

    Calling it “my version of GIStemp” is completely accurate. Calling it “my port of GIStemp” is more precise but only the geeks among us will notice.

    I am pleased you are taking steps to compare outputs as that will improve confidence that your version is ok.

    It’s all just a matter of time. I’ve done software QA for a living before (major compiler tool chain, among other products). Had a team of a half-dozen folks then, though… I’m just frustrated at how slow it’s going and how long it takes me sometimes.

    I had previously read the 0.6C as being the difference between full runs of all the steps. My mistake.

    Understandable one. Again, one of those “terminology” things that matters. I specifically was measuring only the “Selection Bias” (or more properly, the “Survivor Bias”) from the decision NOT to do the maintenance programming to keep the USHCN thermometers in during the transition to USHCN.v2 format.

    When measuring a filter, you want a step function in the input, then measure the degree to which it gets to the output. This posting measures that step function in the input from that decision. The bigger it is, the better the filter must be to remove it. 0.6 C is a big rise, so STEP3 will have a great deal of work to do from this step function. It also has about a 0.2 C warming bias from STEP0 and STEP1 (documented in that other link) to remove as well. So we’re getting to the point where we have about a whole degree C of warming bias in the input that STEP3 is expected to remove. That will be very hard to do.
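
    To make the “filter” idea concrete, here is a toy sketch in Python of what the benchmark measures. The `smooth` function is a made-up stand-in for the anomaly/grid/box processing, NOT GIStemp code; the point is the measurement method: inject a known step into a quiet input, run both versions through, and see how much of the step survives to the output.

```python
# Toy sketch of the "step function in, measure what comes out" benchmark.
# smooth() is a made-up stand-in for the anomaly/grid/box processing,
# NOT GIStemp code; the point is the measurement method, not the filter.

def inject_step(series, start, size):
    """Add 'size' degrees to every value from index 'start' onward."""
    return [t + size if i >= start else t for i, t in enumerate(series)]

def smooth(series, window=5):
    """Stand-in filter: a simple trailing moving average."""
    return [sum(series[max(0, i - window + 1):i + 1]) /
            (i + 1 - max(0, i - window + 1)) for i in range(len(series))]

def q_factor(base, stepped, start, settle=4):
    """Average surviving offset in the output, once past the step."""
    tail = range(start + settle, len(base))
    return sum(stepped[i] - base[i] for i in tail) / len(tail)

flat = [10.0] * 40                    # a quiet input series
bumped = inject_step(flat, 20, 0.6)   # the 0.6 C "selection bias" step
q = q_factor(smooth(flat), smooth(bumped), 20)
print(round(q, 2))                    # 0.6 -- this toy filter passes it all
```

    A real anomaly/zone/box process should attenuate the step rather than pass it whole; the end-to-end benchmark measures exactly that attenuation.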

    You are suggesting that the actual difference for a full run is likely to be a lot less than 0.6 which for me is more subtle and therefore much more plausible.

    I’ll go stronger than that: GIStemp will reduce that 0.6C to some significant degree. The first-cut benchmark is that it will end up at 0.2C with about a +/- 0.1C error bar. But since 0.1C is held out as the metric for “Doom in Our Time”, that is a significant deal…

    Particularly as you imply rounding is a significant contributor.

    You seem particularly focused on “rounding error”. It is only one class of error, and often one of the smaller ones. I mention it (probably more than is justified) because it always exists in normal computer math (such as in FORTRAN) and most folks know what a rounding error is. If I say “bit shift error”, folks will glaze over. (But that was, IMHO, the cause of the 1/10 C F to C conversion error: the order in which the math was done sometimes caused a larger “bit shift” in the ( 5 * T ) / 9 calculation, so you got a different enough result to warm 10% of records by 1/10 C depending on the compiler you used. I ‘fixed it’ by doing the 5/9 division only once: multiplying up first shifts everything toward the high-order bits, dividing by 9 brings it back down, and with the order pinned down the same precision falls off the end every time.)
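
    To show what I mean by order sensitivity, here is a small Python sketch. Python’s `//` truncates like FORTRAN integer division for positive operands; the function names are mine, and this illustrates the general hazard, not the actual STEP0 code:

```python
# Order of operations in truncating integer math (a sketch of the hazard,
# not the actual STEP0 code).  Values are integer tenths of a degree F,
# as in the GHCN/USHCN files; Python's // truncates like FORTRAN integer
# division for positive operands.

def f_to_c_divide_last(f_tenths):
    # multiply first, divide last: truncation happens once, at the end
    return (f_tenths - 320) * 5 // 9

def f_to_c_divide_first(f_tenths):
    # divide first: precision falls off the end BEFORE the multiply
    return (f_tenths - 320) // 9 * 5

f = 534                              # 53.4 F; the true value is 11.89 C
print(f_to_c_divide_last(f))         # 118 (11.8 C)
print(f_to_c_divide_first(f))        # 115 (11.5 C -- 0.3 C colder)
```

    Same formula, same data, different answers, purely from where the truncation lands.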

    One of the more obscure in FORTRAN is that division by an integer truncates:

    10 / 3 = 3

    If you want to get 3.3333… you need to write 10.0 / 3.0 (sometimes written 10. / 3.), so just a missing “.” can turn 3.3333333 into 3 with no fractional part.

    An obscure behaviour of FORTRAN, but that’s the kind of thing that needs to be looked at for every single bit of math done in the whole thing. (BTW, there are several more kinds of ‘math error’ like these that can cause significant errors in a program and are often not thought about by non-programmers – i.e. researchers, accountants, etc.)
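
    The same behaviour in runnable form (Python’s `//` is integer division, mirroring what FORTRAN does with two integer operands; just an illustration, not code from GIStemp):

```python
# Python's // mirrors FORTRAN integer division for positive operands:
print(10 // 3)       # 3 -- like FORTRAN 10 / 3
print(10.0 / 3.0)    # 3.3333333333333335 -- like FORTRAN 10. / 3.

# The same slip inside an average: one integer context truncates it all.
temps = [71, 72, 74]
print(sum(temps) // len(temps))   # 72 -- fractional part silently gone
print(sum(temps) / len(temps))    # 72.33333333333333
```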

    You have also clarified that the use of anomalies and zones comes after the step in question, and I now understand why they don’t matter for this issue.

    Now that I’ve made it clear, let me muddy it up just a little :-)

    In STEP1 there is a comparison of fragments of some records to each other and to the averages of their neighbors. This step splices fragments together and fills in missing bits. It does use a “sort of an anomaly” in that it will compute the offset between a station and other stations and treats that offset as the “anomaly” for computing the fill-ins. Technically, it is a kind of anomaly, but not in the sense that most folks think (zonal / grid / box anomalies). The actual math is more a comparison of averages, but they are normalized to a mean, so technically it is an ‘anomaly’-like process. (Though I think it is best described as a comparison of the station data to the average of its neighbors.) I know, painfully detailed technical minutiae… but that is the stuff of computer program audits…
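
    As a toy illustration of that offset-based fill-in idea (my sketch of the concept only; the real STEP1 reference-station logic is considerably more involved):

```python
# Toy sketch of offset-based gap filling (an illustration of the concept,
# not the actual STEP1 reference-station code).  None marks missing data.

def fill_gaps(station, neighbor_mean):
    # the "anomaly-like" offset: mean of (station - neighbors) where both exist
    pairs = [(s, n) for s, n in zip(station, neighbor_mean) if s is not None]
    offset = sum(s - n for s, n in pairs) / len(pairs)
    # fill each gap with the neighbor mean shifted by that offset
    return [n + offset if s is None else s
            for s, n in zip(station, neighbor_mean)]

station       = [12.0, None, 14.0, None, 16.0]   # station runs 1 C warm
neighbor_mean = [11.0, 12.0, 13.0, 14.0, 15.0]
print(fill_gaps(station, neighbor_mean))   # [12.0, 13.0, 14.0, 15.0, 16.0]
```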

    But at the end of the day, it still doesn’t matter. It is the final end to end benchmark that will measure “Q”.

    From a processing perspective I still don’t understand the exact nature of the issue. The change in the number of records is a trigger.

    BINGO!

    The claim is that you can have 100 lbs pressure in the hose, or 10 lbs pressure and the same spray will come out the far end.

    I’m measuring how much the pressure on the input changes (0.6 C from this one ‘decision’ to leave out USHCN.v2) and did a ‘peek ahead’ and saw the spray changes (by what looks like 0.1 C to 0.3 C ‘by eyeball’). Now comes the part where we put on the rain coat and take the tape measure and measure exactly how much the range changed…

    I recognise that rounding issues can go in different directions depending, for example, on whether individual figures are rounded or totals are rounded, whether the rounding is up or down, whether values are truncated, etc.

    Yes, but bit shifts tend to have a bias. You shift one way and drop low order bits, then you shift back. The effect is simply to turn those low order bits into zeros. If used in a division, you get a bias up or down depending on which side of the divide it is on… The bias in that particular calculation will always be in the same direction (as long as we’re talking integers… i.e. not “divide by 0.5″ which is really a multiply by 2 as far as bit shifting…)
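
    A quick Python demonstration of that one-way bias: truncation always drops toward zero, while round-to-nearest errors largely cancel. (A sketch only; I’m just applying a 5/9 scaling to a run of integer readings both ways.)

```python
# Truncation is biased in one direction; round-to-nearest largely cancels.
vals = range(1000, 2000)                 # integer readings, tenths of a degree
true_mean  = sum(v * 5 / 9  for v in vals) / len(vals)
trunc_mean = sum(v * 5 // 9 for v in vals) / len(vals)
round_mean = sum(round(v * 5 / 9) for v in vals) / len(vals)

# Mean truncation loss is about 4/9 of a unit, always the same direction;
# the mean rounding error nearly cancels to zero.
print(round(true_mean - trunc_mean, 3))
print(round(true_mean - round_mean, 3))
```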

    Looking at the records from the early years, it appears that lower numbers of records mean higher yearly averages, and I am intrigued by the actual mechanisms that would have that impact, particularly as the magnitude is still relatively gross.

    The early years are particularly sparse in data. I suspect that it is an artifact of NOAA ‘rescoring’ some of the early data as “estimates” (GIStemp tosses out “fully estimated” records…). This is a ‘selection bias’ issue, rather than a ‘rounding issue’, IMHO.

    It would also be interesting to see what the impact on the period from 1900 to 2008 would be if each year were somehow constrained to roughly the same 900 or so records, from mainly the same stations.

    That was one of my earlier attempts. Unfortunately, GIStemp is written in a very “brittle” way (that is, not resilient – for example, in my “fix” for USHCN2v2.f I put out a “log file” of records “not in the v2.inv file” and keep on going, where the original just dies on you).
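
    The “log it, skip it, and move on” pattern, sketched in Python (hypothetical record layout with the station ID in the first 11 columns, as in GHCN v2 files; this shows the pattern itself, not my actual USHCN2v2.f fix):

```python
# "Log it, skip it, and move on" instead of a hard crash on unknown stations.
# Hypothetical layout: station ID in the first 11 columns, GHCN v2 style.

def split_records(data_lines, known_ids):
    """Keep records whose station is in the inventory; log the rest."""
    kept, logged = [], []
    for line in data_lines:
        if line[:11] in known_ids:
            kept.append(line)        # station is in v2.inv: process it
        else:
            logged.append(line)      # log it, skip it, and move on
    return kept, logged

inventory = {"42572786007", "42500045063"}
records = [
    "42572786007 1987 ...",          # known station
    "42599999999 1987 ...",          # not in the inventory: gets logged
]
kept, logged = split_records(records, inventory)
print(len(kept), len(logged))        # 1 1
```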

    I’ve spent the better part of the day trying to get a consistent USHCN.v2 file, but cut back to end in 2007 as the original USHCN does, to run through STEP1. It runs through STEP0 just fine (as does the full through-2009 version of USHCN.v2); but STEP1 tosses its cookies. I don’t yet know how to fix it.

    So what ought to have been a 1/2 hour job – cut USHCN.v2 to 2007, run steps 1-3, compare to the prior full USHCN.v2 and prior USHCN-2007 runs – has instead been 1.5 days with nothing to show for it. So far… It would have been a great benchmark. (USHCN-2007 vs USHCN.v2-2007 showing the exact impact of the NOAA change from 1/100 F to 1/10 F; while USHCN.v2-2007 vs USHCN.v2-2009 would give an exact “selection bias” reading from the added time span only.)

    Instead I’m trying to debug badly designed, poorly written, brittle code with lousy error handling and no documentation to speak of. And getting nowhere fast. All you get is a hard crash and not much in the way of usable error or activity logs. I suspect somewhere there is a file with some fragment of a site in it that expects to match some input record, and it crashes instead of doing something reasonable (like “log it, skip it, and move on”). This “splicing” step has several hand-made custom tweak files that feed it little bits of mystery…

    So that is why all those lovely clean comparisons, that we’d all like to see, come out so slowly. And why the ‘first benchmarks’ out are often these “mixed things”. It is because that was the only thing that would make it through the code without crashing…

    It seems to be particularly sensitive to data deletions. Exactly the thing you need to do for the most interesting tests.

    Oh well, I’ll figure it out. It just takes time…

  18. drj11 says:

    Currently our ccc-gistemp uses USHCN version 1, and official GISTEMP uses USHCN version 2. So the somewhat ad hoc comparison I did last month can be used to get a rough idea of the difference between using one version or the other. Our blog post, “How close are we to GISTEMP?”, shows the differences (in the global anomaly) to be at most 0.01 or 0.02 K. I suspect that those differences are due to things other than USHCN v1/v2.

  19. Nick Barnes says:

    3. GISS programs were “independently” reviewed by Nick Barnes “Clear Climate Code” project and I believe they got as far as implementing a revised version of one step in python which you have commented on.

    I believe their stated goal was to do a “translation” not a benchmark and code QA. I would expect them to find some issues as part of a port / translation, but the major focus of a port is not testing the design of the original, but rather to make the port function the same way, bugs and all.

    This project apparently reviewed and corrected rounding issues.

    I think you are making a bit of a leap. They found and fixed one or two issues, IIRC. One being in USHCN2v2.f where a FORMAT statement was ‘off by one’ in the data read step. That “fix” was already in the code I downloaded, so they did not find the compiler-dependent 1/10 C error from bit shifting that I found above. They did not do an exhaustive math review. The report of it I heard was more of a “found this by accident while doing the port” rather than “was doing math benchmarking and found…”. (But in reality, only they can speak to their process.)

    I’ll take that as an invitation, although I should point out that the project is not just me. As we have made clear from the beginning, we are not scientists, and are not interested in making a new analysis. We want to take the mishmash of grotty old Fortran, typical science code, especially for older code bases – make no mistake, GISTEMP is not particularly bad in this respect – and write some lovely clear code implementing the same algorithm which anyone can understand. That code can then form the basis for informed discussion of the algorithms (and, quite possibly, for improving those algorithms). For instance, anyone could easily tinker with the STEP1 scribal-record combination.

    Before our first announcement last year we had STEP0, STEP1, and STEP3 in Python. Now we have the whole thing in Python, including STEP4. We’re working on STEP6. I’m rewriting our STEP0 USHCN import to match the new USHCNv2 import that GISTEMP now has. Once that is done, we’re going to make a release from our GoogleCode project and an announcement post over on our blog. In the meantime, anyone can download all the current code from GoogleCode.

    Our results match GISTEMP, of course, quite closely (not exactly, for a small number of reasons including the difficulty of doing mixed-precision floating-point arithmetic in Python, which defaults to double). If they didn’t, that would be a bug. Our code includes programs to compare result files and generate an HTML report on the differences found.

    That’s the first pass.

    The second pass will involve refactoring the code. Our current STEP2 code includes some of GISS’s own STEP2 (which was already in Python), and is in serious need of improvement.

    All the code carefully matches the low-level Fortran behaviour. As a trivial example, STEP3 separates all the temperature records into 6 bands before calculating box means, because when the code was written it was not possible to load all the data into memory at once. This will have some (probably tiny) effect on the results, because it affects the order in which records are combined, which will affect rounding order, for instance.
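
    A trivial illustration of why combination order matters at all (plain floating-point non-associativity, nothing GISTEMP-specific; Python floats are IEEE doubles):

```python
# Floating-point addition is not associative, so combining the same
# records in a different order can change the low-order bits:
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0 -- the 1.0 is absorbed into -1e16 first
```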

    The third pass will involve simplifying a lot of that behaviour: retaining the same high-level algorithm but losing the 80s-style data pipelining. As well as being much clearer, we expect the resulting code to go a lot quicker.

    For more information, come over to the blog and ask. Note that it is not, absolutely not, yet another blog in which people to-and-fro abusively on the politics of climate change, or on random climate news. There are plenty of blogs like that out there, and no such comments will be tolerated at CCC. (NikfromNYC’s comment today is right on the borderline; I passed it because it asked some important questions about the actual project, which we are happy to answer).

    We welcome contributors – people who are willing to either write or review code. This absolutely includes “sceptics”, as a recent blog post made clear. Volunteers who are able to do the work, and have the time and inclination, are thin on the ground, so it can take us a long time to get much done.

  20. E.M.Smith says:

    @drj11:

    Unfortunately, you can’t compare USHCN v1 (which truncates in 2007) with USHCN.v2 and get a “survivor bias” metric, since v2 has a different set of “adjustments” in the data. I don’t have an article ready yet, but v2 is “warmer” than v1.

    It looks to me (rampant speculation) like the adjustments used are cooking v2 so that putting it in adds to the warming profile (just as leaving out v1 did). To demonstrate this takes a v2 vs v1 comparison over only the period in common, and then a benchmark of 2007 to date with v2 in vs out. I know I ought to do it, but I’ve gotten distracted by other things… like the blatant corruption of GHCN via deleting cold thermometers in particular and 90% or so in total in years after the 1980s.

    @ Nick Barnes:

    What you folks are doing “is a beautiful thing” and I just wish I had time to contribute to it. But I see our directions as complementary, so I’d be loath to abandon mine. (And frankly, I’m better at remembering my old FORTRAN class than doing PYTHON. I can read PYTHON OK, but not up to writing it yet.)

    Mine is to characterize what is actually being run and see where it “has issues” (of any size). That, IMHO, will surface areas that both GIStemp proper, and you, can look at to enhance your product. Basically, a “shopping list” for when your port is done and benchmarked to match GIStemp “end to end” and it’s time to go bug hunting. (Though with luck, you will stamp out some ‘issues’ as you find them.)

    And yes, despite my whining about how bad GIStemp is to work on, I’ve actually seen worse… (shudder!)

    BTW, at this point I think that the ‘code issues’ in GIStemp are ‘the small fish’ and that the massive changes of thermometers used in GHCN are ‘the big fish’. If you can, it would be interesting to know if your code will accept “subsets” end to end: then feed it the “surviving 1176 locations” in GHCN (i.e. remove survivor bias) with the (deleted after about 1989) 7k or so bolus in the baseline period removed. If you are uncomfortable publishing such a benchmark, just knowing that subsets of data will run end to end would be helpful to know.

    As noted above, GIStemp in FORTRAN is not happy with subset data in some steps so it will take me a while to work around this. ( I found a way, but it’s unpleasant. Changing the data to “missing data flags” works, but it’s a PITA for benchmarking.)

    And finally, now that you have it working through STEP4, I may just download a copy and give it a whirl. When I first looked, it was missing a step along the way. Certainly for figuring out what a given FORTRAN step is supposed to be doing it will be “a better ride” for most folks. I can also see where running benchmark ideas through it first would be faster and easier, THEN going back to try and nurse them through GIStemp proper.

    (But right now my disk is full… too many copies of historical benchmark variation datasets. I need to either archive some stuff or add a new disk… And I need to do an update to the latest revision of GIStemp… And … Sigh…)

  21. vjones says:

    E.M.,

    As you know we have all the station data in database format now here, again to help investigate ‘issues’. In theory it should be possible to produce partial input files (and the matching partial station.inv files) for you – if this would help.

    Do you think this would be feasible or am I just showing my lack of knowledge of GIStemp? ;-)

  22. drj11 says:

    @E.M.Smith: you say «for when your port is done and benchmarked to match GIStemp “end to end”». ccc-gistemp (our “port”) is now fully rewritten in Python and matches GISTEMP “end to end” to within 0.01 or 0.02 K when comparing the global anomaly. The differences are tiny, see the article I mentioned earlier.

  23. Nick Barnes says:

    What you folks are doing “is a beautiful thing” and I just wish I had time to contribute to it. But I see our directions as complementary, so I’d be loath to abandon mine. (And frankly, I’m better at remembering my old FORTRAN class than doing PYTHON. I can read PYTHON OK, but not up to writing it yet.)

    Thank you, and yes, I think our directions may be complementary, but please do feel free to point anyone else in the GISTEMP-analysis space in our direction.
    And yes, among other things, it is already easier to switch stuff around in the Python (e.g. eliminating sub-steps – such as the Hohenpeissenberg data or the St Kilda adjustment – or removing stations or sets of stations, would be a matter of a line or two of clear code). In my dreams, a future CCC-GISTEMP is the reference to which amateurs turn to answer questions like “what happens if we do the peri-urban adjustment like this”, or “what if we have a different box/sub-box grid”, or “how about if we combine ocean and land data more like the JMA”. Certainly, that sort of question is now pretty easy for me, personally, to answer (mostly the answers are “it makes negligible difference to the anomaly signal”, but maybe I am still asking the wrong questions).

  24. E.M.Smith says:

    Well, one benchmark I’d love to see is what happens to the anomaly map if you put thermometers from Bolivia back in ;-) or the Canadian Yukon and Northwest Territories… but maybe that’s just me ;-)

    Another would be to take, oh, South America and remove all thermometer data from prior to 1990 that does not have a matching station in the 1990-to-date period (stabilize the instrument) and do an A / B anomaly map. That would tend to settle the issue of what impact the GHCN deletions have…

  25. Holy Smokes;

    These guys have the maps thing down cold;
    http://82.42.138.62/

    They’re obviously way ahead on how to put mapped-things on the web.
    I’ll keep looking for some way to make a difference, albeit modest.
    Cheers!
    ACakaRR

    Still no ability to register for their TEKthing, whatever it may be.
    Love those dots and the zoom thing. Tool tip, the works…

  26. vjones says:

    @almostcertainly,
    Apologies for the glitch with registering. I’m sure Kevin (who is UK-based) will get on to it in the morning.

    Glad you like the maps.

  27. juanslayton says:

    Just came across this post; wish I had seen it last November.

    “So this is a list of the stations in the USHCN.v2 inventory format for which I need to create GHCN v2.inv file entries that look like:

    42572786007 COLVILLE 5NE 48.58 -117.80 914 885R -9MVxxno-9x-9COOL CONIFER A1 0″

    Interesting you should use Colville as an example. I have now made two trips up there to try to document that station history. (Pictures are in surfacestation.org gallery.) So far as I can see, COLVILLE 5NE was never in the USHCN.

    MMS identifies COOP 451630 as “COLVILLE”; COLVILLE 5NE is now known as “COLVILLE BASIC”, and has, so far as I can see, always had the COOP number 451654.

    But
    http://cdiac.ornl.gov/ftp/ushcn_v2_monthly/ushcn-stations.txt
    shows this history:
    451630 48.5472 -117.9019 505.0 WA COLVILLE 451630 451650 —— +8

    So the change sequence was from COLVILLE (in town) to the airport (451650) and back to town.

    Unfortunately, by the time I got there, the station had been closed or moved yet again; with the help of the local mail carrier I was able to locate the site and take a picture of the post on which the MMTS had been mounted (with the wires still hanging out). MMS still shows it as current; but they seem to be very slow in updating. (Last year they hadn’t updated the location of Red Lodge, Montana, two years after the station was moved. I had to file an FOI request to get the current location.)

    REPLY: [ Wow, that's quite a story! Yeah, station moves make musical chairs look stable... -E.M.Smith ]

  28. Pingback: American Thinker on CRU, GISS, and Climategate « Watts Up With That?

  29. Pingback: Climategate: CRU Was But the Tip of the Iceberg « Thoughts Of A Conservative Christian

  30. Hello, E M Smith,

    I’ve never been here before, so am writing to say how impressed I am with the work you are doing. I particularly enjoyed your comprehensive reply on 13th Nov to comments/criticisms by Clivere, and I hope that he/she now fully appreciates the huge effort you have put in and indeed continue to put in. If not, I suggest that the route to follow is to repeat at least part of your work, or perhaps to think of another technology for critical examination of extensive databases. This would dispel any doubts about just what is involved.

    I’m also a climate time series analyst of 16 years standing, as a retirement hobby. I use data from many and assorted sources, including the GISS source. However, I’ve never attempted to operate on the whole set, simply one station at a time. I have probably looked in detail at a few thousand such series. You may deem this to be a trivial (or perhaps useless!) exercise, but in fact it can disclose a very interesting feature of this type of series. It is that step changes of considerable magnitude are a very common, almost ubiquitous, occurrence. I’ve not read enough of this blog yet to be sure that you have not mentioned this type of analytical observation. If you have, my apologies. If not, it is perhaps something that you might find interesting.

    You’ve described the multiple averaging processes that are used to derive an annual average from the raw, recorded observations of climate technicians. Averaging, though an essential process, has at least one very unfortunate consequence. It virtually eliminates, and certainly disguises or hides, what might be very valuable information contained in the data. As an ex-industrial scientist I am a strong advocate of avoiding averaging as far as possible, although I have to accept that monthly averages have necessarily suffered several averaging operations before I ever get to see them! Averaging over many sites or regions certainly tends to reduce the sharpness of step changes, since the steps that are evident for individual sites may have time offsets that reduce their impact when agglomerated.

    Nevertheless, I am convinced that step changes are a somewhat neglected aspect of climate (and related series) and I’d welcome any input from others who work on the numerical side of climate information.

    What a great blog you have!

    REPLY: [ Thanks! I think the individual station data series are a critical part of 'the issue'; especially given that, as you pointed out, they are pandemic, and they are not compatible with CO2 as smoothly causal. But this is a 'communal barn raising' we're all doing. We each pick a part and 'do what we can'. Other folks were already looking at individual temperature series when I started, but nobody was looking at the insides of GIStemp. I spent about 6 months saying "Somebody ought to analyse GIStemp" before I decided that "I was somebody"... I had the needed skill with FORTRAN (even if a bit musty for lack of recent use) and UNIX / LINUX. So I chose to work on it. There are many other things I'd rather be working on, but this was where I had my 'highest and best use' and where 'something just needed doing and was not being done'. I ended up looking at GHCN data in bulk as a consequence of doing a benchmark series on GIStemp, not 'by design'. That's when I saw just how horrid the bulk input data was.

    BTW, my "day job" (sorely neglected for about 6 months now...) is that I make some pocket change trading stocks. In stock market indicators it is widely understood that you use averages TO HIDE what you do not want to see. Weekly averages are used to hide daily excursions, for example. When I saw how much averaging was going on in GIStemp (and prior to it at NCDC) my first thought was "My God, they are hiding a lot of SOMETHING in the data with all this averaging." What they are hiding is that it has far too little spatial and temporal coverage; and the changes of the bulk temperature come about from a collection of individual station changes (such as you describe) that are not possibly CO2 related. I can't say if the 'hiding' was deliberate or simply 'believing their own BS' about anomalies; but I can say that is what an average does... Hope you enjoy the rest of the site. -E.M.Smith ]

  31. Jeff Alberts says:

    I know this is an old thread, but.

    Don’t know if you read Climate Audit, E. M., but if you do you’re familiar with the sock puppet “Thefordprefect”. Well, he’s been castigating SM for not tearing apart the GISTEMP stuff. I told him to come over here and post his questions, but I doubt he will.

    If you’re interested and have the time, here are his latest frothings: http://climateaudit.org/2010/12/26/nasa-giss-adjusting-the-adjustments/#comment-250826

    I think he needs to be set straight.

  32. Ruhroh says:

    Hey Chief;

    Someone posted this over at the SteveMc ‘jesting with adjusters’ thread;

    http://climateaudit.org/2010/12/26/nasa-giss-adjusting-the-adjustments/#comment-250862

    “The code for the ‘automated pairwise bias adjustment software’ (Menne and Williams 2009), used in the U.S. HCN version 2 monthly temperature dataset, is available at ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/software”

    At least it isn’t written in COBOL…
    Party Time Now!
    RR

  33. Ruhroh says:

    Apparently ‘Troyca’ has taken a shot at running the Pairwise Homogenizer;

    http://troyca.wordpress.com/2010/12/10/running-the-ushcnv2-software-pairwise-homogeniety-algorithm/

    He posts some comments on what it took to run the thing. I don’t yet see a hint of the forensic mindset, but maybe I’m the one lacking it…
    RR

Comments are closed.