GIStemp – who needs Antarctic data or temps near ice.

I finally got GIStemp source code loaded into the Qemu SPARC emulator, unpacked it, and did a preliminary assessment.

First off, it is way out of date. It looks like it is a 2011 release from just prior to a major revision. So the published source is roughly a third of a decade stale, AND a major revision behind. Not exactly caring about that whole ‘currency’ thing…

Next up, in looking at the things that did change, I noticed a few items. For one, it points to a link for the Antarctic data that no longer works. Did it work in 2011? Who knows. But now there is a nice glitzy interactive web site that lets you download bits and pieces and look at bits and pieces… but not grab the data sets that are used as input to GIStemp. This means that either they have a very different way of getting the Antarctic data, or they just don’t bother to use it any more.

In this download, the same source is used as in the older version described here:

http://chiefio.wordpress.com/2009/02/24/gistemp-start/

Except that the site now has a user-friendly interface rather than a place where you can download the data as a file.

http://www.antarctica.ac.uk/met/READER/surface/stationpt.html

http://www.antarctica.ac.uk/met/READER/temperature.html

http://www.antarctica.ac.uk/met/READER/aws/awspt.html

So look around at the interface on those pages. You can ‘hunt and peck’ for station details, one point in time at a time. Download the data set? Not so much… Then again, notice, on the ‘temperatures’ page linked above, the last update date:

“Last modified : Friday 6 February 2009”

So… just what Antarctic data ARE used by GIStemp? Data, we don’t need no steenking data! (A reference to a movie with the line “Badges? Badges! We don’t need no steenking badges!” given by some Mexican banditos pretending to be Federal agents.) Now this isn’t as bad as it looks (or maybe it is worse…) since most of those stations don’t have any modern data anyway. Click on a few. A couple of decades for many. Often from disjoint moments in time. So again: Just what Antarctic data are used in GIStemp? Say, from this millennium?

A screen capture of the listing of the input_files directory where the Antarctic data is bundled in with GIStemp sources is somewhat revealing.

GIStemp input_files datestamps

Notice that the Antarctic data included in the download are from ‘about’ the same vintage as the source code. 2011 is when I think they last updated. Some of the antarc data files are 2011, some 2010, some 2009… though the 2009 is marked ‘old’. So when did the Antarctic site break the download? Has there been any update since? Does GIStemp even use any Antarctic data anymore? Who knows.

But wait, there’s more…

Since GIStemp smears data via ‘homogenizing’ from places where it does have data to places where it has none, and since those places can be up to 1200 km away (and it does this smear in three different steps, so it really might be ‘smearing a smear’ from up to 3600 km away), it can just fill in the temp data using ‘nearby’ stations. In the final steps GIStemp adds in the Hadley Sea Surface Temperature. So what is wrong with using cold temps from near the ice to fill in over the ice? Well… how about if you decide to just not use temperatures from places where the water is near the ice?
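
To make that ‘smearing’ a bit more concrete, here is a minimal sketch of distance-weighted in-fill from stations inside a cutoff radius. This is NOT the GIStemp code (which works on zonal anomaly series, in Fortran, across several steps); the station list structure and the linear taper to zero weight at 1200 km are my own assumptions, purely to illustrate the general idea of filling a cell from ‘nearby’ data.

from math import radians, sin, cos, asin, sqrt

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def infill(cell_lat, cell_lon, stations, cutoff_km=1200.0):
    """Fill one cell from 'nearby' station anomalies.  'stations' is a list of
    (lat, lon, anomaly) tuples -- a made-up structure for illustration only.
    Weight falls linearly from 1 at the cell to 0 at the cutoff distance."""
    num = den = 0.0
    for lat, lon, anomaly in stations:
        d = km_between(cell_lat, cell_lon, lat, lon)
        if d < cutoff_km:
            w = 1.0 - d / cutoff_km
            num += w * anomaly
            den += w
    return num / den if den > 0 else None   # None: nothing within reach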

Remember that you can click on the images to make them larger and easier to read. This is an interesting bit of code. It is in STEP4 and the whole listing (minus the added bit described in a panel below) is in this posting:

http://chiefio.wordpress.com/2009/03/07/gistemp-step4_convhadr2/

In looking at the date stamps in STEP4_5, I noticed that it had changed. Why? I wondered. It isn’t a USHCN related step, and doesn’t depend on USHCN.V1 vs V2 changes (like the rest of the changed files).

GIStemp 3vs4_5 datestamps

Notice that this program is the only one with a 2011 date stamp. Everything else is quite old. So what changed? First off, to get oriented, let’s look at this part from the top of the code. (I downloaded this copy of the sources today, so this is as ‘fresh’ as it gets.) In particular, we are looking for the meaning of three variables that are used in a very small fragment of code that was inserted into this program in 2011. They are I, J, and M. Look at the comments at about lines 9 and 10. They state that I is longitude and J is latitude. (You can also see where they are dimensioned: I gets 1-360 while J gets 1-180 – so 360 degrees of longitude, and -90 to +90 mapped onto 1 to 180.) M gets the temperature for that location for any given month.

convert1.HadR2_mod4.f top

Why focus on those variables? Because this code was added. It says to mark any cell, at any longitude (1 to 360 degrees around the globe), that is in the latitude range near the pole as ‘bad’, meaning to toss out the data. Just don’t use it. Notice the comment.

GISTemp throwing out data from water near the ice at the pole

Yes, you read that right. The comment says:

“Skip some regions where SST is impacted by nearby ice floats”

That 166 to 180 loop says that anything from 76 N to 90 N is to have the data tossed out.
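
I can’t paste Fortran out of a screen capture, so here is the equivalent logic sketched in Python, just to show what ‘mark the cell bad’ amounts to on a 360 x 180 one-degree grid. The names and the sentinel value are my own stand-ins, not the actual ones in convert1.HadR2_mod4.f.

BAD = 9999.0   # stand-in for whatever sentinel the Fortran uses for "no data"

def mask_polar_sst(sst):
    """sst[i][j]: monthly SST for longitude index i (1..360, stored 0..359) and
    latitude index j (1..180, stored 0..179, running South to North).
    Mark every cell with J = 166..180 -- the band the post reads as 76 N up to
    the pole -- as bad, for ALL longitudes, so it is simply never used."""
    for i in range(360):             # every longitude box around the globe
        for j in range(165, 180):    # J = 166..180 in 1-based Fortran terms
            sst[i][j] = BAD
    return sst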

UPDATE:

From http://data.giss.nasa.gov/pub/gistemp/README.txt

I found a description of which way the arrays run. I’d assumed N to S, but they run S to N, so the above line has been changed from 76 S to 76 N. So it is the N. Pole (where there is more water data anyway) that has the SST data tossed.

The output GRID is rectangular (i:West to East, j:South to North),
where the pole boxes might be of a different latitudinal size than
the other boxes (but North and South pole boxes having the same size).
Examples:
if offI=0, Western edges of i=1 boxes lie on the international date line;
if offI=-.5, centers of i=1 boxes lie on the international date line;
if dlat=180./JM, all boxes have the same latitudinal extent;
if dlat=180./(JM-1), pole boxes are half boxes.

Though notice that there are some selections possible.
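
To relate those I, J indices to actual coordinates, here is one plausible reading of that README for the simple ‘all boxes equal’ case (offI = 0, dlat = 180/JM). Treat it as my interpretation, to be checked against the README and the code, not as a statement of which selections GIStemp actually makes; the ‘half pole box’ option is not handled here.

def box_center(i, j, im=360, jm=180, off_i=0.0):
    """Center longitude / latitude of grid box (i, j), 1-based indices, with i
    running West to East from the date line and j running South to North,
    assuming equal-sized boxes (dlat = 180/JM) and offI = 0 as in the README.
    With IM=360, JM=180: box j=166 spans 75 N..76 N, box j=180 touches the pole."""
    dlon = 360.0 / im
    dlat = 180.0 / jm
    lon = -180.0 + (i - 1 + off_i) * dlon + dlon / 2.0
    lat = -90.0 + (j - 1) * dlat + dlat / 2.0
    return lon, lat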

So any Arctic water temperatures can only come from places closer to the equator, and the Antarctic itself has very few currently reporting stations; and that data may or may not actually be making it into GIStemp.
END UPDATE.

Perhaps that is why, when we have an all-time record ice area in the Antarctic, with record cold temperatures being recorded (by others…), GIStemp thinks it looks like this:

GIStemp polar view, May to October, 2013

This is the May to October period, so when it is cold at the South Pole. BTW, I have a series of sea surface graphs from that period, all showing cold water anomalies. Nowhere to be seen in GIStemp. But that is for another day to sort out. IMHO, this image is showing GISTemp having a Jump The Shark moment.

In Conclusion

I’ve got some more on GIStemp, but that will wait for another posting. The big lumps are just that the source code is out of date, the data looks like it is missing a chunk from ‘down south’, and in 2011 the code was modified to avoid water temps from near ice. One can only wonder what possible rationalization could exist for that change. IMHO, it is totally unwarranted by any means. Might as well just start dropping any thermometers in cold places…


Data Sources – A List

Over the years I’ve had several postings with ‘source of the data’ links for one thing or another. After a while, you get tired of digging them up again and trying to remember what was in each one. So what I’m going to do here is pretty simple: Put up a set of site links (as sometimes the link to the specific data goes stale when they delete it and replace it with a ‘new historical’ set of data-food-product…) along with the current detailed data links, and a statement or two for some of them saying “what is there”. (Adjusted, un-adjusted, adjusted silently via “QA”, etc.)

I don’t expect that I have everything, nor that I’ve got a description for all of it, so I’ll be putting this posting up “medium rare” and adding to it for a few days. If you have other ideas or sources, please post a comment. I’ll collect the sources into the head posting over time.

FWIW, I’ve been a bit of a packrat over the years. I have a GHCN v1 copy (or two or three…) along with a few GHCN v2 copies. Sometimes I’d download just the ‘adjusted monthly average’, sometimes I’d grab more. It depended on what I was doing, how much disk was free, and what I was thinking at the time (such as “gee, NOAA would never delete the old data, I don’t need a copy of V1 Daily”…) So my personal collection is a bit eclectic. It is also scattered over 1/2 dozen computers on both sides of the country and a few dozen CD / DVD backup disk sets…. But, someday, I hope to collect it all into some kind of a valid “history of the changing history”. Once I have a decent format and layout, I’ll be looking for an archive site where I can put up a few gigabytes for anyone else to use.

Why keep old data copies? For postings / discovery like these comparing version 1 with version 3 data:

http://chiefio.wordpress.com/?s=GHCN+v1

As it stands now, I’m pretty sure I’ve got the GHCN semi-raw daily data (if the description can be believed – it looks like QA flagged, but datum still in place) for some large set of stations along with several sets of USHCN. I don’t have much at all from Hadley, but will likely packrat that too. With SD cards at $1 / GB and DVDs for backups at about 20 ¢ / 4 GB or so, it just isn’t all that expensive to “make a set”. (The hard part for me has been keeping them organized and ‘near me’ ;-) So pointers to other datasets in need of protection / archiving would be appreciated, along with any old archival copies an individual might have that they would like to see reach a broader audience.

With all that said, here’s the “Draft Alpha Posting” on temperature data sets. Do remember, I’m actively updating this list over the next few days in dribs and drabs, so don’t expect it to be done; expect it to be a construction project.

NOAA / NCDC

The National Climatic Data Center – supposed to be the great guardians of the data, and they do have a nice archive, but it is a bit slim on “original raw only” and on version control. More than some others though (like GISTemp, which is ‘never the same way twice and no version history kept’). In software development there are dozens of “version control” software packages that let you roll forward and back to any particular revision while storing things efficiently. (From CVS to RCS to Git to…) It would seem that folks in “Climate Science” are unfamiliar with these tools, not even having a source code archive to display changes. Oh Well. It is what it is.

Top Page: http://www.ncdc.noaa.gov/cdo-web/

Lists several products and has a nice interface to the data.

Climate Data Online (CDO) provides free access to NCDC’s archive of historical weather and climate data in addition to station history information. These data include quality controlled daily, monthly, seasonal, and yearly measurements of temperature, precipitation, wind, and degree days as well as radar data and 30-year Climate Normals. Customers can also order most of these data as certified hard copies for legal use.

I note in passing the lack of a statement that the original source data ‘unadorned’ (i.e. raw) is also included. It looks as though the QA status is a flag on the daily items and that the reading is still there; but I’ve not proven that via actually looking at the data archive. (It’s a bit large ;-) though you can download individual years if desired and they are smaller.) IFF that is true, this is a very good starting point. It would show the actual reading and the QA assessment, and you can decide. Then, since it is Daily Min / Max, you can calculate your own trends through either, or various daily, weekly, monthly, ‘whatever’, averages. It is what I think needs more exploration sooner. I’ve downloaded a copy of the daily data (many GB and many hours) but not yet unpacked it as I’d already filled up my disk with ‘other stuff’… So it will be a while before I can give it a look and assure it’s what I think it is (and described just above – yes, ‘trust but verify’ applies to my own work too, and especially to my speculations).
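
As a starting point for that kind of ‘roll your own trend’ exercise, here is a minimal sketch that pulls daily TMAX or TMIN out of one of the per-year files (the by_year directory shown further down) and averages it by station and month. The field layout assumed here – station ID, date, element, value, then measurement / quality / source flags and observation time, with values in tenths of a degree C – is my reading of the readme in that directory; verify it against readme.txt before trusting any numbers out of this.

import csv, gzip
from collections import defaultdict

def monthly_means(path, element="TMAX", keep_qc_flagged=False):
    """Average one element (TMAX or TMIN) by (station, year, month) from a
    GHCN-Daily by_year file such as 2013.csv.gz.  Values are assumed to be
    tenths of a degree C; rows carrying a quality flag are skipped by default
    (the reading is still in the file, just flagged)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    with gzip.open(path, "rt") as f:
        for row in csv.reader(f):
            station, yyyymmdd, elem, value = row[0], row[1], row[2], row[3]
            qflag = row[5] if len(row) > 5 else ""
            if elem != element:
                continue
            if qflag.strip() and not keep_qc_flagged:
                continue                              # QA flagged this reading
            key = (station, yyyymmdd[0:4], yyyymmdd[4:6])   # (id, YYYY, MM)
            sums[key] += int(value) / 10.0                  # tenths of C -> C
            counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Example: means = monthly_means("2013.csv.gz", element="TMIN")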

Click on ‘Datasets’ to get to the data; that takes you here:

http://www.ncdc.noaa.gov/cdo-web/datasets

Climate Data Online

Annual Summaries
Daily Summaries
Monthly Summaries
Nexrad Level II
Nexrad Level III
Normals Annual/Seasonal
Normals Daily
Normals Hourly
Normals Monthly
Precipitation 15 Minute
Precipitation Hourly

Legacy Applications

COOP Daily / Summary of Day
Climate Indices
Extremes Monthly
Global Climate Station Summaries
Global Hourly Data
Global Marine Data
Global Summary of the Day
National Solar Radiation Database
Quality Controlled Local Climatological Data
Regional Snowfall Index
Snow Monitoring Daily
Snow Monitoring Monthly

I found the “Daily Summaries” most useful:

ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/

File:COOPDaily_announcement_042011.doc 	34 KB 	4/20/2011 	12:00:00 AM
File:COOPDaily_announcement_042011.pdf 	123 KB 	4/20/2011 	12:00:00 AM
File:COOPDaily_announcement_042011.rtf 	67 KB 	4/20/2011 	12:00:00 AM
all 		7/20/2014 	7:28:00 AM
by_year 		7/19/2014 	7:05:00 PM
figures 		2/6/2013 	12:00:00 AM
File:ghcnd-countries.txt 	3 KB 	9/20/2013 	12:00:00 AM
File:ghcnd-inventory.txt 	23798 KB 	7/15/2014 	8:40:00 AM
File:ghcnd-states.txt 	2 KB 	5/16/2011 	12:00:00 AM
File:ghcnd-stations.txt 	7709 KB 	7/15/2014 	8:40:00 AM
File:ghcnd-version.txt 	1 KB 	7/20/2014 	8:27:00 AM
File:ghcnd_all.tar.gz 	2521828 KB 	7/20/2014 	8:27:00 AM
File:ghcnd_gsn.tar.gz 	100441 KB 	7/20/2014 	8:27:00 AM
File:ghcnd_hcn.tar.gz 	278461 KB 	7/20/2014 	8:27:00 AM
grid 		7/19/2014 	8:34:00 PM
gsn 		7/20/2014 	5:34:00 AM
hcn 		7/20/2014 	5:35:00 AM
papers 		10/2/2012 	12:00:00 AM
File:readme.txt 	23 KB 	3/18/2014 	5:02:00 PM
File:status.txt 	28 KB 	1/10/2014 	12:00:00 AM

The ghcnd_all.tar.gz is a ‘tar’ tape-archive in gzip compressed format. It helps to know / use Linux or Unix to unpack and untar it. As you can see, it is large at 2.5 GB, and larger once uncompressed. (That is why I’ve not unpacked and inspected it yet…) Under the by_year directory you can get any individual year as a smaller and easier to swallow chunk. Just looking at a listing of that directory lets you see how little data there is in the early years compared to recent years. Long term trends are really strongly biased by the very few early readings. And we can’t really get quality long term trends out of just the recent 50 years of data either, since there are 60-ish year cycles in weather (climate-change).
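
For anyone stuck on a box without tar and gzip handy (say, plain Windows), Python’s standard library will unpack that big archive just as well; a minimal sketch, before we get to the year-by-year listing:

import tarfile

# Unpack the full daily archive.  It expands to many GB, so check disk space
# first; works the same on Windows, Linux, or a Mac.
with tarfile.open("ghcnd_all.tar.gz", "r:gz") as archive:
    archive.extractall("ghcnd_all")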

File:1763.csv.gz 	4 KB 	7/19/2014 	7:02:00 PM
File:1764.csv.gz 	4 KB 	7/19/2014 	7:02:00 PM
File:1765.csv.gz 	4 KB 	7/19/2014 	7:04:00 PM
File:1766.csv.gz 	4 KB 	7/19/2014 	7:03:00 PM
File:1767.csv.gz 	4 KB 	7/19/2014 	7:00:00 PM
File:1768.csv.gz 	4 KB 	7/19/2014 	7:02:00 PM
File:1769.csv.gz 	4 KB 	7/19/2014 	7:03:00 PM
File:1770.csv.gz 	4 KB 	7/19/2014 	7:02:00 PM
File:1771.csv.gz 	4 KB 	7/19/2014 	7:00:00 PM
...
File:1910.csv.gz 	40216 KB 	7/19/2014 	7:05:00 PM
File:1911.csv.gz 	41844 KB 	7/19/2014 	7:02:00 PM
File:1912.csv.gz 	43604 KB 	7/19/2014 	7:02:00 PM
File:1913.csv.gz 	44876 KB 	7/19/2014 	7:02:00 PM
File:1914.csv.gz 	46453 KB 	7/19/2014 	7:00:00 PM
File:1915.csv.gz 	47889 KB 	7/19/2014 	7:00:00 PM
File:1916.csv.gz 	49533 KB 	7/19/2014 	7:05:00 PM
File:1917.csv.gz 	49795 KB 	7/19/2014 	7:01:00 PM
...
File:1942.csv.gz 	77718 KB 	7/19/2014 	7:01:00 PM
File:1943.csv.gz 	78514 KB 	7/19/2014 	7:03:00 PM
File:1944.csv.gz 	80328 KB 	7/19/2014 	7:03:00 PM
File:1945.csv.gz 	82367 KB 	7/19/2014 	7:03:00 PM
File:1946.csv.gz 	82934 KB 	7/19/2014 	7:02:00 PM
File:1947.csv.gz 	84043 KB 	7/19/2014 	7:05:00 PM
File:1948.csv.gz 	100697 KB 	7/19/2014 	7:03:00 PM
File:1949.csv.gz 	114757 KB 	7/19/2014 	7:01:00 PM
File:1950.csv.gz 	117901 KB 	7/19/2014 	7:01:00 PM
...
File:2008.csv.gz 	179740 KB 	7/19/2014 	7:03:00 PM
File:2009.csv.gz 	184554 KB 	7/19/2014 	7:03:00 PM
File:2010.csv.gz 	186549 KB 	7/19/2014 	7:02:00 PM
File:2011.csv.gz 	173750 KB 	7/19/2014 	7:04:00 PM
File:2012.csv.gz 	169154 KB 	7/19/2014 	7:00:00 PM
File:2013.csv.gz 	166776 KB 	7/19/2014 	7:03:00 PM
File:2014.csv.gz 	83161 KB 	7/19/2014 	7:04:00 PM

The degree of ‘instrument change’ over time is just incredible. Trying to do global calorimetry with that is really rather silly. But that is what ‘climate scientists’ do…

The ReadMe file has the interesting notes:

GHCN-D is a dataset that contains daily observations over global land areas. 
Like its monthly counterpart, GHCN-Daily is a composite of climate records from 
numerous sources that were merged together and subjected to a common suite of quality 
assurance reviews.

It is unclear without more digging just what ‘merging’ and ‘quality assurance’ have done to the readings, but a surface reading of the text implies flagging, not replacement or deletion. A “Dig Here!” is to dig into the referenced links and papers to confirm or deny that assessment.

This by_year directory contains an alternate form of the GHCN Daily dataset.  In this
directory, the period of record station files are parsed into  
yearly files that contain all available GHCN Daily station data for that year 
plus a time of observation field (where available--primarily for U.S. Cooperative 
Observers).  The obsertation times for U.S. Cooperative Observer data 
come from the station histories archived in NCDC's Multinetwork Metadata System (MMS).  
The by_year files are updated daily to be in sync with updates to the GHCN Daily dataset. 

Just why 1770 needs daily updating is unclear… but the listing quoted above does show daily update changes…

There are also pointers to other links and other data archives that need some kind of cross check to figure out if they are different or the same, change over time or not, and just what IS the real historical data…

Further documentation details are provided in the text file ghcn-daily_format.rtf in this 
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/ directory.

Users may find data files located on our ftp server at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/all/. 
NOTE: 
There is no observation time contained in period of record station files. 

GHCN Daily data are currently available to ALL users at no charge. 
All users will continue to have access to directories for ftp/ghcn/ and ftp3/3200 & 3210/ data at no charge.

For detailed information on this dataset visit the GHCN Daily web page at http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/

You would think there would be ONE historical set of original RAW data, but no… I’ve not found it yet… there look to be many copies, all with slightly different processing, mixes, histories, updates, “quality assurance”, etc…

Down in the footer of the top page is a comment that they have a ‘legacy’ site with data sets not yet migrated to the new site. Probably worth some archival time…

Data Here: http://www7.ncdc.noaa.gov/CDO/cdo

There is a ‘data set’ option in a drop down on the right. It has some satellite as well as ground data. One could likely spend a month on that just sorting it out and archiving the bits that are valuable. I’ve not explored it yet, but the implication is that this ‘older’ site / version is going to go away once the new site is brought up to date. No idea if that means “copy over intact” to the new site, or “convert and expunge the original past”.

They also have a USHCN set of data, but they describe it as a subset of the GHCN. More details here:

http://www.ncdc.noaa.gov/oa/climate/research/ushcn/

It comes in versions too… they have Version 2.5 up now.

ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/

Has the various versions listed (v1, v2, v2.5) as directories. Also has a ‘daily’ directory.

File:ushcn_01.tar 	130275 KB 	11/7/2002 	12:00:00 AM
File:ushcn_98.tar 	122824 KB 	3/23/2000 	12:00:00 AM

At a few hundred MB, it is not that large to snag a copy. Though one wonders what has happened since 2002, the newest date stamp.

V1 has only a ‘metadata’ directory and v2 has only ‘monthly’. No idea where to get the old v1 and v2 data now.

Hadley CRU Climate Research Unit

The English archival site. Infamous for having said that they “lost the original data”, but that post-processed “improved” versions were available, and besides, it was mostly the same as the GHCN held at NCDC anyway. (And would that be GHCN version 1, 2, or 3? Hmmm??? As the data Langoliers are busy rewriting it…)

At any rate, they may have something of value; but I’d likely use the NCDC daily data instead. (Oh, NCDC also has a long list of ‘data food products’ that they have post-processed in various ways. As, IMHO, those are NOT data but processing products, I’ve not listed them here. You can get to their processed, adjusted, homogenized, etc. stuff from the same top link; I’ve left it out of this posting.) For Hadley, since they lost the actual data, all you get is their post-processed data-food-products.

Top Link: http://hadobs.metoffice.com/

It claims that data are available.

Data Here:

CET: http://hadobs.metoffice.com/hadcet/

Interesting graph of it with a Very Nice plunge at the end in the last decade or so…

Hadley graph of Central England Temperature

Gridded Monthly combined land / sea data-food-product: http://hadobs.metoffice.com/hadcrut4/

Has a data download link on the page, but mostly just looks like you can get the post-processed data-food-product as numbers instead of as scary pictures. Don’t see the point, really.

They may have something else interesting there, but I’ve not put the time in to find it. There is a link to “CRUTEM4” that claims to be actual data (for some degree of ‘data’):

http://hadobs.metoffice.com/crutem4/data/CRUTEM.4.2.0.0_release_notes.html

This page describes updates in CRUTEM4 version CRUTEM.4.2.0.0. Previous versions of CRUTEM4 can be found here. Data for CRUTEM.4.2.0.0 can be found here.
Additions to the CRUTEM4 archive in version CRUTEM.4.2.0.0

The changes listed below refer mainly to additions of mostly national collections of digitized and/or homogenized monthly station series. Several national meteorological agencies now produce/maintain significant subsets of climate series that are homogenized for the purposes of climate studies. In addition, data-rescue types of activities continue and this frequently involves the digitization of paper records which then become publicly available.

The principal subsets of station series processed and merged with CRUTEM (chronological order) are:

Norwegian – homogenized series
Australian (ACORN) – homogenized subset
Brazilian – non-homogenized
Australian remote islands – homogenized
Antarctic (greater) – some QC and infilling
St. Helena – some homogenization adjustment
Bolivian subset – non-homogenized
Southeast Asian Climate Assessment (SACA) – infilling /some new additions
German/Polish – a number of German and a few Polish series – non-homogenized
Ugandan – non-homogenized
USA (USHCNv2.5) – homogenized
Canada – homogenized

In addition, there have been some corrections of errors. These are mostly of a random nature and the corrections have generally been done by manual edits. For a listing of new source codes in use, see below (end).

Largely homogenized and fermented to make data-food-product… something vaguely cheesy, but not real. (In the USA various artificial cheese like products must be labeled ‘cheese food product’ so you will not confuse them with real cheese. I’ve adopted the phrase ‘data food product’ to similarly identify things that are vaguely data like, but not really source data…)

At any rate, it is unclear to me just why I want such a data food product from Hadley.

They claim older versions are here: http://hadobs.metoffice.com/crutem4/data/versions.html

They are all version 4. It is unclear to me what has happened to versions 1, 2, and 3 and where they might be found, though a quick search turned up this link for 3: http://www.metoffice.gov.uk/hadobs/crutem3/data/download.html

Similar searches on CRUTEM2 give:

http://climateaudit.org/2005/08/17/downloading-cru-data/

So Steve McIntyre has likely got a set saved somewhere. The article has links in it that currently give 404 Not Found errors, so the official Version 2 data sets look to have hit the bit bucket. (Maybe I can talk Steve into sending me a set of V2 to archive… or a link to an archive. It would be amusing to do 2 vs 3 vs 4 compares someday…)

Interesting to note that CRU have the v2 links still up. One hopes it is the actual data, but it is what it is. (I’ve downloaded the data, whatever it might be)

http://www.cru.uea.ac.uk/cru/data/tem2/#datdow

Data for Downloading
ERRATUM: before 21st May 2003, the NetCDF versions erroneously used a time dimension with units "months since (startyear)-1-1" that started from 1. It should (and now does) start from 0.

Dataset     gzipped ASCII            zipped ASCII          NetCDF               gzipped NetCDF         Last updated
CRUTEM2     crutem2.dat.gz   3.2mb   crutem2.zip    3.2mb  crutem2.nc   19.2mb  crutem2.nc.gz   1.9mb  2006-01-18
CRUTEM2v    crutem2v.dat.gz  2.5mb   crutem2v.zip   2.5mb  crutem2v.nc   9.6mb  crutem2v.nc.gz  1.8mb  2006-01-18
HadCRUT2    hadcrut2.dat.gz  4.5mb   hadcrut2.zip   4.5mb  hadcrut2.nc   9.3mb  hadcrut2.nc.gz  3.5mb  2006-01-18
HadCRUT2v   hadcrut2v.dat.gz 4.2mb   hadcrut2v.zip  4.2mb  hadcrut2v.nc  8.4mb  hadcrut2v.nc.gz 3.3mb  2006-01-18
Absolute    absolute.dat.gz   47kb   absolute.zip    47kb  absolute.nc    63kb  absolute.nc.gz   40kb  1999-07-13

The CRU crew might have other stuff of interest, but frankly, I'm not very comfortable that it is accurate:

http://www.cru.uea.ac.uk/cru/data/temperature/

The one link for surface data I did follow did a circular run to Hadley / MetOffice / and on…

Someone else can explore all that further, if needed.

NASA GISS GISTemp

IMHO, not really a source of “data”. GISS (the Goddard Institute for Space Studies) just takes in the NCDC data, munges it around a little, and calls it data. It isn’t. It takes in an already ‘adjusted’ data-food-product and further manipulates it according to a fixed algorithmic process that is a bit dodgy. They fill in missing bits from other stations up to 1200 km away (doing this three times in successive sections), so any given ‘data item’ might be a complete fabrication partially based on data up to 3600 km away. They also do a somewhat backward Urban Heat Island “correction” that doesn’t correct for urban heat. In the end, they are the data outlier in most results; but for some reason many folks like to look at their stuff.

Data Input From: GHCN, USHCN, and Antarctica. (Links to be added a bit later after I unpack the latest GIStemp code to see if it has changed). It merges and homogenizes this batch and then makes up missing data and prints the results. Not really useful for anything as far as I can see.

Top link: http://data.giss.nasa.gov/gistemp/

Includes links to the data-food-products that it produces.

Data Output Here:

http://data.giss.nasa.gov/gistemp/station_data/

Has a link to GHCN version 2 data on that page, and lets you see individual station data after GIStemp is done with it.

CDIAC

The Carbon Dioxide Information Analysis Center. Guess where their bias lies…

Little referenced, they have their own set of data archives. I’ll be wandering through them to see what I can find. Sometimes there is interesting stuff there.

Data Here:

USHCN intro page: http://cdiac.ornl.gov/epubs/ndp/ushcn/daily_doc.html

Home page: http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html

I squirreled away a copy of the daily-by-state data some time ago, but who knows where it is now.

USHCN data:

http://cdiac.ornl.gov/ftp/ushcn_daily/ (by State)

http://cdiac.ornl.gov/ftp/ushcn_v2_monthly/ (in one wad for all).

There are likely other sources for USHCN daily, but I’ve not spent the time to track them down. If you know of any, put a link in comments and I’ll add it.

B.E.S.T. Berkeley

Berkeley Earth Surface Temperature.

Claims to be the best, but isn’t. Not particularly the worst either. One of the developers claims to have used the method that skeptics wanted; but it isn’t the method I’ve seen asked for much. It has a slice-and-dice data splicer at the core of it. Since data splicing is considered a sin in many technical disciplines, how doing more of it, more finely, is a feature; well, that’s beyond me. They also cooked up their own way to store data, with their own date format (reasonable, since they needed to bring divergent data together), but in a way that is sort of painful to use and a bit of work just to understand. (For example, a date, instead of being 30 July 2012 or 300712 or any other common form, is a floating point number X.YYY…, where the granularity of the part after the decimal resolves it to a particular day. I’ll get an accurate description and put it here.) Just realize that you can’t take the B.E.S.T. copy of GHCN data and do a straight difference against the other GHCN copy, as it has been, um, ‘converted’ in format.
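
Until I get that accurate description, here is the obvious guess sketched out: that the number is a plain fractional year, i.e. days elapsed divided by days in the year. If Berkeley’s documented convention differs (half-day offsets, fixed 365-day years, etc.), dates out of this could easily be off by a day, so treat it as a placeholder, not a decoder.

from datetime import date, timedelta

def fractional_year_to_date(t):
    """Convert a fractional-year stamp like 2012.58 to a calendar date,
    ASSUMING the fraction is (days elapsed since Jan 1) / (days in that year).
    Berkeley Earth's actual date convention must be checked against their docs."""
    year = int(t)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=int(round((t - year) * days_in_year)))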

So B.E.S.T. takes in much of the same data as the others, chops, dices, and splices it up a lot. Does more homogenizing and infilling things, then claims it is “data”. Yet another data-food-product, IMHO.

They do have an online archive of their sources (that are largely the same as the above: GHCN / USHCN /… )

http://berkeleyearth.org/about-data-set?/dataset/

The Berkeley Earth Surface Temperature Study has created a preliminary merged data set by combining 1.6 billion temperature reports from 16 preexisting data archives. Whenever possible, we have used raw data rather than previously homogenized or edited data. After eliminating duplicate records, the current archive contains over 39,000 unique stations. This is roughly five times the 7,280 stations found in the Global Historical Climatology Network Monthly data set (GHCN-M) that has served as the focus of many climate studies. The GHCN-M is limited by strict requirements for record length, completeness, and the need for nearly complete reference intervals used to define baselines. We have developed new algorithms that reduce the need to impose these requirements (see methodology), and as such we have intentionally created a more expansive data set.

We performed a series of tests to identify dubious data and merge identical data coming from multiple archives. In general, our process was to flag dubious data rather than simply eliminating it. Flagged values were generally excluded from further analysis, but their content is preserved for future consideration.

So far so good. Start from raw (though some is stated as slightly cooked…) and then combine and clean. It’s then the splice and dice homogenize that gets them, IMHO. “Methodology”.

Data Here: http://berkeleyearth.org/data

Breakpoint Adjusted Monthly Station data

During the Berkeley Earth averaging process we compare each station to other stations in its local neighborhood, which allows us to identify discontinuities and other heterogeneities in the time series from individual weather stations. The averaging process is then designed to automatically compensate for various biases that appear to be present. After the average field is constructed, it is possible to create a set of estimated bias corrections that suggest what the weather station might have reported had apparent biasing events not occurred. This breakpoint-adjusted data set provides a collection of adjusted, homogeneous station data that is recommended for users who want to avoid heterogeneities in station temperature data.

What part of “thou shalt not splice data and have no error” is unclear to them? Sigh.

So it’s just a much more homogenized and much more sliced, diced, and spliced data-food-product.
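
For flavor, here is the kind of neighbor-difference ‘breakpoint’ test this whole family of methods rests on, boiled down to a few lines. It is emphatically NOT Berkeley’s actual algorithm (their methodology papers describe the real thing); it just shows how a station-minus-neighbors series gets scanned for a step change that then becomes an ‘adjustment’.

def biggest_step(station, neighbors, min_len=24):
    """Scan the difference between a station's monthly anomalies and the mean
    of its neighbors for the single largest step change.  'station' and
    'neighbors' are equal-length lists of monthly anomaly values (made-up
    structures for illustration).  Returns (index, size_of_step)."""
    diff = [s - n for s, n in zip(station, neighbors)]
    best_i, best_step = None, 0.0
    for i in range(min_len, len(diff) - min_len):
        before = sum(diff[:i]) / i
        after = sum(diff[i:]) / (len(diff) - i)
        if abs(after - before) > abs(best_step):
            best_i, best_step = i, after - before
    return best_i, best_step

# A homogenizer would then subtract best_step from one side of the series if
# the step looks "significant" -- and that is exactly where the judgment calls
# (and the cooling of the past or warming of the present) creep in.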

But the good thing is that they put their source data on line, so you can back up ahead of their processing and start over:

http://berkeleyearth.org/source-files

There are many individual links there that I’ve not fully explored. Some of them are already above. Some are not (like the Colonial Era data and the Coop stations). I’d like to archive the lot of them, but time does not allow that at the moment. Perhaps folks could split the job up and each grab a chunk? Assemble again later?

Wood For Trees

I need to put in a good description of these folks. They have nice graphing facilities, and have the data sets behind them. I’ve not yet found out if you can download the whole set of data directly from them.

Top Link: http://woodfortrees.org/

Lists their data sources on the side bar. Does include the UAH satellite data. (At some point I’ll add links to the satellite data, but since they don’t seem to have ‘revisions’ to their history quite so much I’ve not seen it as urgent).

Interactive graph here: http://woodfortrees.org/plot/

Wolfram Alpha

More a calculation site than an archive, yet they clearly have some kind of temperature data archive to be able to compute graphs for folks. Such as this example:

http://www.wolframalpha.com/input/?i=average+temperature+history+new+york

Looking at trends through it a couple of years back, it was clear they did no ‘clean up’ for things like large gaps and odd outlier data; so likely it is (or was) raw, not QA-checked, data. (The data does need some kind of cleaning to be usable, but IMHO Hadley and NCDC go way too far.)

Misc and Smaller Sites

There are a lot of ‘bits’ all over. I’ll be expanding this section “for a while”. From individual nations, to specific archives at schools and others. I don’t know much about them. It would be interesting to audit a few of these and compare them with the data-food-products above. If the little guys say “we recorded this data” and the above say “this is it!” and they are different, well… Some examples:

http://academic.udayton.edu/kissock/http/Weather/default.htm

States:

This site contains files of daily average temperatures for 157 U.S. and 167 international cities. The files are updated on a regular basis and contain data from January 1, 1995 to present.

Source data for this site are from the National Climatic Data Center. The data is available for research and non-commercial purposes only.

So it might be possible, via some such sites, to find older copies held by some other ‘packrat’.

Environment Canada

Has a nice top page that says you can select various kinds of data:

http://climate.weather.gc.ca/index_e.html

If folks post enough links to various national sites, I’ll make a “by nation” section to collect them.

For now, I’m slowly working down the list of sites found by web searches like this one:

https://duckduckgo.com/?q=Temperature+Data+archives

which also found:

http://weather-warehouse.com/

DOWNLOAD a .csv spreadsheet file compatible with programs such as Excel, Access or the Free Open Office Calc, or view in your browser
Exclusive Station Finder Tool helps you find the best data for your needs
Complete National Weather Service archive – over 10,000 stations (far more than most resources on the web)
Unmatched data accuracy
Archived stations – some daily data as far back as 1902, most hourly data back to 1972
Meteorologists on hand to assist
Instant access – Get the information you need immediately via your web browser with backup links sent via email

It looks to want to charge you for larger amounts, but might be free for individual station ‘samples’. They claim to have a lot of sources:

Weather Source meteorologists have created perhaps the world’s most comprehensive weather database by unifying multiple governmental and other weather databases together and applying advanced data quality control and correction methods. The resulting “super database” of weather information contains over 4 Billion rows of high quality weather observations. The Weather Warehouse provides users with direct and immediate access to this database. On the Weather Warehouse users have access to the following weather information:

I’ve not dug into it to figure out “what ultimate data source and what processing”, but it looks like all the Excel users out there can get a nice comma-separated-value CSV spreadsheet for easier processing for some stations.

In Conclusion

OK, that’s it for the moment. I think the GHCN daily is likely the most unmolested of the lot. Coop data and CET are likely pretty useable too. USHCN daily is also likely clean of distortions. Any of the monthly average ‘data’ is not really data. It has been QA checked, filtered, selected, processed; potentially homogenized, adjusted and more. I’d rather start with daily data and work up from there.

If you know of any good archives of old musty data, please add a link!


A rather useful archive of Individual Station Modification Graphs

It would seem that NCDC have made a nice set of graphs that show the adjustments done on each and every station in the GHCN (Global Historical Climatology Network) temperature history.

A brief look seems to indicate more cooling of the past and warming of the present, via adjustment, than from any asserted “Global Warming” in the actual data. It would take a lot more work, though, to demonstrate that via looking at every station and the net impact on the final ‘warming’.

But for now, there’s quite a set of useful images here:

ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/products/stnplots/

For each station. So you can just wander around and find things of interest.

I’m going to upload a couple here, for purposes of illustration. But really, this is one giant “Dig Here!” that would benefit from many hands (and eyes) looking at many graphs.

The graphs are in folders with a single digit number. That number is the first digit of the station ID (so also the continent / cluster).

Here’s some info from other locations at that site:

The “ReadMe” file:

Last Updated: 09/29/2010

The following directory:

ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/products/stnplots/

is comprised of sub-directories (that are named by the first digit of a station
ID) that contain individual station plot files (in “gif” format).

The plot files contain 9 individual graphs, arranged in a 3×3 matrix. The first
column of graphs, contain 2-D colored symbol graphs of the actual monthly data
for the entire period of record for A) the (Q)uality (C)ontrolled (U)nadjusted
(QCU) data, B) the (Q)uality (C)ontrolled (A)djusted (QCA) data, and C) the
differences between QCA and QCU monthly data. The second column of graphs
contain histograms of the monthly data for QCU, QCA, and (QCA-QCU) respectively.
Finally, the third column of graphs depict annual anomalies and their associated
trend line for QCU and QCA, and the differences in the annual anomalies for QCA
and QCU. Detailed axis titles and units are displayed in the title of each
graph.

So you can see that there’s lots of good info here on unadjusted vs adjusted. I find the trend line and the difference graphs the most interesting.
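
If you would rather compute those differences yourself than squint at gif files, the v3 qcu (unadjusted) and qca (adjusted) .dat files elsewhere under that ftp tree can be parsed with a few lines. The fixed-width layout below (11-character ID, year, element, then twelve value-plus-flags groups; values in hundredths of a degree C, -9999 for missing) is my reading of the v3 README – check the column positions there before relying on it, and note that the file names and station ID in the example are placeholders.

def read_v3_station(path, station_id, element="TAVG"):
    """Return {year: [12 monthly values in deg C or None]} for one station from
    a GHCN-M v3 .dat file (qcu or qca).  Layout assumed from the v3 README:
    ID cols 1-11, year 12-15, element 16-19, then 12 x (5-char value + 3 flags)."""
    out = {}
    with open(path) as f:
        for line in f:
            if line[0:11] != station_id or line[15:19] != element:
                continue
            vals = [int(line[19 + 8 * m: 24 + 8 * m]) for m in range(12)]
            out[int(line[11:15])] = [None if v == -9999 else v / 100.0 for v in vals]
    return out

def annual_mean(months):
    good = [v for v in months if v is not None]
    return sum(good) / len(good) if good else None

# Adjusted minus unadjusted annual means for one (hypothetical) station ID,
# with placeholder file names -- use whatever the v3 directory actually holds:
# qcu = read_v3_station("ghcnm.tavg.qcu.dat", "40371234567")
# qca = read_v3_station("ghcnm.tavg.qca.dat", "40371234567")
# delta = {y: annual_mean(qca[y]) - annual_mean(qcu[y])
#          for y in qca if y in qcu
#          if annual_mean(qca[y]) is not None and annual_mean(qcu[y]) is not None}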

Here’s an example from Tatlayoko Lake, BC:

Adjustment Graph for Tatlayoko BC

On the right, notice that the original dropping trend line has been turned into a generally flat one. The graph at the bottom right shows that the past was cooled, and the present warmed. Clearly and obviously.

Now, to me, it isn’t so much the warming present and cooling past, as that pretty much every graph has more change from adjustments than it does from actual trend. Those that are not changed generally are so short of data that there isn’t much point. (Though there are graphs that are unchanged).

What’s the net-net of it? Hard to say, but I’d say mostly a “Global Warming” signal that comes out of the adjustments, not out of the data.

They have a paper describing their latest changes here:

ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/techreports/Technical%20Report%20NCDC%20No12-02-3.2.0-29Aug12.pdf

It has some interesting bits buried in it, like their new method finding more step change points to prune out, and inducing even more change than the prior version. The “homogenizing” looks to be the magic sauce. It looks similar to the B.E.S.T. splice and dice method of taking slow changes (like aging paint) and keeping that warming in, while taking out the step function when the site is repainted to the proper white. Version 3.2.0 finds 1.07 C / Century while Version 3.1.0 had 0.94 C / Century. So we get 0.13 C of added warming from this one update to the code. Now, do that 5 times and you have all of Global Warming. How many updates have there been? Well, since this was from 3.1 to 3.2, I’d wonder about 1.x to 2.x to 3.x… Looks like about a dozen or three to me…

Yes, just a first approximation. But I’d like to know just how many salami slices of warming have been added just this way.

Here is an example station that gets no change:

Syowa GHCN Adjustments

So why is it left alone, while others are changed? Who knows…

Again, if it is so important to change the data, dramatically, for other stations, then why is it not just as important for THIS station? Which is the error? Changing the other one, or not changing this one? They can NOT both be error-free decisions…

While Faraday gets its rather high trend cooled down:

Faraday GHCN Adjustments

Mawson station in the same major number cluster gets a bit of warming:

Mawson GHCN Adjustments

In a general ‘look over’ it looks to me like the added warming makes up all of the “AGW” signal. It needs a full-on analysis / proof to show that. But what gets me more is that there is no rhyme or reason to it. Some stations up, some down, some flat. Is the whole thing just an artifact of an algorithmic adjustment gone mad? The average warming signal being the leftovers in the error band of all those seemingly senseless adjustments?

Looking at the raw data for many locations does not show much “warming” at all. This one for example:

Fosston GHCN Adjustments

So why do they end up getting a warming trend? And why is the trend from those adjustments so much more than any trend in the actual data?

IMHO, the folks doing the adjusting are in love with their intellectual creations and have not bothered to actually look at what it does to the data. (The alternative requiring malice… and “never attribute to malice that which is adequately explained by stupidity”…)

In Conclusion

This takes a whole lot more eyes looking at a whole lot more of these graphs. Sorting them by type of adjustment. Assessing each one for sanity. Calling “BS” on the ones that are just not justified by the known facts. Calling “BS” on the ones with no known facts to justify them. Calling “BS” on the ones where natural cycles and processes have been ironed out in the name of ‘homogeneity”.

But at least the graphs are now produced, and sitting there for everyone to have a look.

Station data and other info is available too. They have a FAQ “Frequently Asked Questions” file here:

ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/GHCNM-v3.2.0-FAQ.pdf

and it claims to link to other documents, such as this bit on global temperature trends:

…global temperature trends?
NCDC Technical Report No. GHCNM‐12‐02 provides a detailed summary of each software modification and the resulting impacts to global temperatures. This report is available at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/techreports/Technical Report NCDC No12‐02‐Distribution.pdf

With software available for inspection:

Is it possible to obtain the computer software code that NCDC uses for making homogeneity
corrections?
Yes. The Pairwise Homogeneity Adjustment algorithm software is available online at
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/software/ .

So plenty to keep a lot of folks busy, if they have the time to dig in and help.

What is very clear is that there is an awful lot of room for fudge in those adjustments, and a lot of room for error that does not show up as error bars, but ought to.

If it is at all like what they do to the USHCN – about 1/2 F – it accounts for roughly all of the “Global Warming” signal, with nothing left over for nature:

http://www.ncdc.noaa.gov/oa/climate/research/ushcn/ushcn.html

The cumulative effect of all adjustments is approximately a one-half degree Fahrenheit warming in the annual time series over a 50-year period from the 1940’s until the last decade of the century.

USHCN Adjustments


Broken Feature Hell

Sigh.

I’m stuck once again in Broken Feature Hell.

For decades I’ve been working in I.T. doing computer stuff. The whole time has been marked by features that don’t work. Either as ‘bugs’, or as ‘Real Soon Now’ broken promises, or sometimes as “let the customer do QA” (in the Micro$oft model). Sometimes as Sales Guy Hype Not Shipping Yet…

Once, at Apple, I was looking to buy a load of network gear. One Sales Guy had a great product spec. I was ready to buy several and “Make His Day”. Then I said “OK, just bring in a sample and set it up. Show me that it works.” He was hesitant. Yes, often that is simply because it is a PITA to do the releases and drag the samples out and get the tech guy assigned, and all that time he is not selling. But I was insistent: there would be no sale without a Demo. A working Demo.

So a couple of weeks later they show up with a Demo Unit. We put it on the table. He hands me the manual. I look over the chassis. Nice lights with all the right labels. “OK, plug it in. Let’s fire it up.” says I. He does a sales pitch. “Um, plug it in. Power. Now please.” says I. He looks a bit pained and pale. “Does it work or not? I want to see it run, right now.” says I….

Well, turns out that this prototype unit had worked maybe a few days ago then failed and they were still trying to make it go right back in the lab, but they brought over the nice empty case for me to look at….

A very dramatic case of Broken Feature Hell, in that the whole product didn’t work, and in spectacular fashion, but such is the world of I.T. and new products.

Another time I spent close to a week trying to track down why a client mail server would only deliver mail from outside the company every 20-ish minutes. It worked fine. Just that the mail coming in and going out only happened to move once every 20 minutes or so. For 20 minutes, mail would go out. Often instantly. But no inbound. Then for about 20 minutes, inbound would flow (often instantly) but outbound would not. Eventually I worked out that they had 2 “default gateways” set in the configuration. One inside, one outside.

Now the “default gateway” is also known as the router of LAST resort. That is, there is exactly ONE “LAST” resort. I argued with the client for the better part of a few hours that this was a Very Bad Idea. Eventually, on the Microsoft “support” web site I found a description of this “feature”. Seems that a Windoz box will rotate between multiple “routers of last resort” on about a 20 minute rotation. They stated about this bug “This behavior is by design”…

So with that in hand, I could demonstrate WHY the mail did what it did, and then was allowed to set up the mail server as it ought to have been. A static route to the inside interface for inside corporate mail, and a default gateway that pointed outbound for ‘everything else’. Mail flowed instantly both ways…

All due to a broken “feature”.

Much of my life has been spent in Broken Feature Hell, and I’ve gotten pretty good at navigating the turf. I can see a broken feature being hyped in a sales call faster than most anyone else; and I can smell the fear when you ask about it…

The Present Land Of Broken Features

So I had this “Bright Idea”. Make a Qemu Bigendian system on a chip so that I could make the bigendian parts of GIStemp work on any old PC. Qemu is a free system emulator. It has SPARC support in it (and I’m pretty sure either a SPARC or SGI MIPS was the base system on which GIStemp was run). Easy Peasy…

The first cut of research showed all the right features present. The docs all said it worked on Windows as well as *nix machines (Linux, Unix, POSIX, etc.). The features stated it had several bigendian chips emulated (SPARC, MIPS, PowerPC, ARM with big endian set…). OK, looks reasonable. Probably a few Broken Features, but ought to be livable.

I’ve also gotten very good at navigating Broken Feature Minefields and charting the course through them that works. As long as there are “enough” features that work, you can usually find a set that lines up. Even if it takes some exploration.

But sometimes… sometimes not so much. Lots of potential paths, but then after a day or two down that road comes the precipice or the road block. Sometimes it is a small one and you can build a bridge over or around it. Install a package or find the ‘just so’ flag settings that let it work enough. Sometimes it is a hard stop and you back up a few days and try again.

In this case, the number of features is large, and the number that don’t work is large too. LOTS of paths to explore, almost all of them ending badly. It is Broken Feature Hell. All the work, none of the progress.

Whats What

Do realize that in software development projects a fair amount of Grim Determination is required. An unwillingness to give up. So this Complaint by me does not mean I’m giving up. Not until every path is explored, and marked as a dead end, can you say that Broken Feature Hell has ended in death of the project. For now it is just a bleat from a hot cliff overlooking a dead valley Yet Again.

OK, first up, the Windows support. It’s slim. Yes, I have it running on my laptop. Yes, it works. Yes I have a SPARC emulation running. But several “features” don’t work. First off, the ‘prebuilt’ system images (Debian on a SPARC 32) limit you to Debian Etch. Debian stopped supporting SPARC 32 back in 4.x release land. Now they are at 7.x land. So old code, not updated, no new features. I’m OK with that, I guess. The current Debian wants to support SPARC 64 chips, but doesn’t have it working yet. Yes, I know, free software and free labor making it go, if you want to you can contribute time to it or pay for it. Still, it means that SPARC is not really working all that well.

Then, there’s that small matter of flags to Qemu. The two prebuilts come in windowing and command line versions. OK, the command line one is fast enough and all I need for GIStemp anyway. It does work. The windowing one works too, after a fashion. As nothing is using any of the graphics hardware, it is God Damn Slow. Painfully so. OK, I could ignore it I suppose. Except…

At the GISTemp source code download page, the tarball is not in FTP land, but in HTML land. No Worries, as the wget command gets HTTP files too… but attempting to use it from the command line, while working fine on most any other site, gives 404 and 403 and other errors on the GIStemp page (depending on what options are set).
http://data.giss.nasa.gov/gistemp/sources/

So is that a fault of the GIStemp web page? Or of wget? Or of THIS particular Etch version of Debian wget? Or… So a ‘quick’ 20 minute launch and set up of the windowing Qemu SPARC and the browser lets me download the GIStemp code (that I showed in the last posting). OK, I’ve got the code. But…

It is in the windowing prebuilt image, not in the command line image.

No Problem, thinks I, there are launch options to not do windowing from the windowing version… a few launch attempts later and I realize those “features” don’t work. I can launch it, and it may be running without ANY visible console, but it did not launch a command line version. Sigh. Is there a work-around past this? Maybe; hike down those four roads a day or two each and report back…

So I can download the code, or I can have a working command line system that is livable, but not both at the same time…

Yes, I can do things like put the source code on a CD, and find out how to mount the CD inside the command line Qemu (IFF that feature works in the Windows port…). There are plenty more paths to explore.

I’m not giving up yet.

But….

Welcome to Broken Feature Hell, where you may spend days wandering in the woods only to find yourself back at the last camp. Again.

So far I’ve found a half dozen or so launch flags that look like they don’t work (including the -k en-us flag to let me make the keyboard a US keyboard instead of the GB one it has as shipped; at present I can’t find the ‘pipe’ symbol vertical bar that is essential to *nix command line use…). I also had a ‘failure to configure’ on one instance (apt-get upgrade) but it worked on another. BOTH running from an SD chip. What was different? Not much… and nothing that ought to matter… so a sometimes randomly broken feature…

I now have a FORTRAN compiler installed, but no code yet. And another image has the code, but can’t install the compiler. And a third has the data… and features that ought to let me boot one the same way as the other don’t seem to work, and the interface to peripherals may or may not work and may or may not let me get the code moved; but it is painful to set up in any case (all via command line options at boot time, which is where a lot of the broken features already are).

And that is how you know you are in Broken Feature Hell.

There are just enough options left to try that you keep on searching for The One Path. But so many are broken that the odds of that path existing are dancing with zero.

At that point, the Grim Determination starts to be a non-feature, as you spend way too much time looking for The One True Path and not enough time asking “Is this sane? Is there a better way?”

Often that is asked just before you find the One True Path… so hope springs eternal.

Yet…

In Conclusion

I’ve not listed all the Broken Features I’ve run into. Things like looking at the MIPS and PowerPC emulations and finding them not all that complete either. This was just a sample for the flavor of it.

I’m not sure exactly what path I’m going to take out of this. Likely continue with the emulator for a while. I’d been wanting to set up a Raspberry Pi general purpose server (but being little-endian it can’t run the last part of GIStemp directly). Networking seems robust on the Qemu SPARC. My current “This Time For Sure!” is an RPi file server with the sources, and a command line Qemu SPARC to unpack and run the code. All the parts look like they work. It just ought to be a matter of doing it. I’d guess about 2 days of work (if done straight through).

We’ll see.

Or maybe I’ll just take $100 and buy an old PowerMac and turn it into a Debian box… Forget all the roundabout stuff and go for straight hardware. The only question being just exactly which of the hundreds of Mac configs actually works well with Debian ;-)

Well, time to get on with the day. Wish me luck as I explore more “features”…


Down the Rabbit Hole with Qemu, GIStemp and more

So sometimes something gets my attention and I’m “down the rabbit hole” for a few days (weeks… months… ye…)

In this case, it was an old problem, back again.

I never got the last steps of GIStemp to run, as all the machines I had at hand were ‘little-endian’ and GIStemp was ‘big-endian’ in the last steps. FORTRAN is a bit unforgiving about endian-ness in data structures. Data written out in unformatted (binary) form picks up the byte order of the processor that wrote it. Endian is an oblique reference to Gulliver’s Travels, and the wiki mentions it:

“In the discipline of computer architecture, the terms big-endian and little-endian are used to describe two possible ways of laying out bytes in memory. The terms derive from one of the satirical conflicts in the book, in which two religious sects of Lilliputians are divided between those who crack open their soft-boiled eggs from the little end, and those who use the big end.”

Basically, if you write 1234 into a computer, is it stored as 1234 or as 4321? (There are also mixed ‘middle-endian’ byte orderings, so it could also be 2143 or 3412 in some odd cases…)
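
For the curious, here is the whole issue in a few lines of Python: the same 32-bit number laid out both ways. This is exactly the kind of mismatch that bites when Fortran unformatted files written on a big-endian SPARC get read on a little-endian PC.

import struct

n = 0x01020304                       # one and the same 32-bit number...
print(struct.pack(">I", n).hex())    # big-endian byte layout:    01020304
print(struct.pack("<I", n).hex())    # little-endian byte layout: 04030201

# Reading big-endian bytes on a little-endian box without converting gives
# garbage: struct.unpack("<I", struct.pack(">I", n))[0] == 0x04030201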

Most of the time for almost everyone this endian issue is completely hidden.

Except…

For programmer geeks like me who keep the world sorted out for people who don’t care, or know, that endian issues are all over the place.

So GIStemp keeps endian issues out of the way until the very end, in the STEP4_5 code.

So I had most of everything that mattered by Step_3, and didn’t get the last bit running, as the machines I had at home were either little-endian PCs (Intel chips) or were Macintosh boxes with Motorola or PowerPC chips (big endian) but running Mac O/S and not something I was willing to blow away to install a Linux port.

So the GIStemp port was left with an unfinished bit: I never did get the Big Endian steps to run.
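(For what it's worth, if the compiler in play happens to be gfortran, there is a way to cheat on a little-endian box: it can be told to treat unformatted files as big-endian, either at compile time or at run time via an environment variable. A sketch only, and the program name is made up:)

  # compile-time: tell gfortran the unformatted I/O is big-endian
  $ gfortran -fconvert=big-endian -o some_step4_program some_step4_program.f

  # or at run time, for a binary already built with gfortran
  $ export GFORTRAN_CONVERT_UNIT='big_endian'
  $ ./some_step4_program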

Different Time Perception

I sense time differently than other folks.

I know this, but can’t change it. I remember being 3 or so years old and running down a dirt track with two tire tracks and grass in the middle and falling down and realizing that falling hurts skinned knees in just the same way that I remember being a ’20 something’ and dealing with a romantic rejection in just the same way that I remember missing a meeting at work a week ago; and in just the same way that I see what the world will be like a year from now, but can’t stop it from happening. To other people those are very different experiences. To me, it is all the same perception. All of it is “now”.

So I'll set something aside to get back to it 'later', and then a 1/2 decade later pick it up again at just that spot. Only 'lately' have I realized that other folks don't do that. That a decade ago is faded and lost to them. That it doesn't just 'pick up again'. So I'll say things like "I need to do a posting on that", and it is the same to me if it is tomorrow or a decade later. Other folks, not so much. They see it as not delivering, since for them time ends shortly after the statement… Oh Well…

So, some ‘long time ago’ I looked at GIStemp and noted that it wasn’t using USHCN.V2:

http://chiefio.wordpress.com/2009/11/06/ushcn-v2-gistemp-ghcn-what-will-it-take-to-fix-it/

That clearly shows it was 2009. It is now 2014. I make that 1/2 decade. To me it was just yesterday. Oh Well…

So why dwell on this? Because sometimes exact dates matter. Note that 2009 well.

So around 2009 to 2010 there was a change to USHCN, and some GIStemp code changed. This is after I did the GIStemp port. This posting is about what has changed in GIStemp, so that date matters.

So let’s look at some GIStemp date stamps, and along the way look at new ways to run very old code.

Does anybody know what time it is?

I finally, and apparently about a 1/2 decade later, found a solution to my Big Endian machine need: a bit of emulator software that lets you make emulated Big Endian machines on Little Endian Intel chip machines. That is Qemu, the "Quick EMUlator". http://wiki.qemu.org/Main_Page is the home page.

Qemu is an open source software bit that lets you emulate other hardware. Now, on my laptop, I have a Sun SPARC big-endian emulator running. Basically, I have a SPARC based SPARCstation 5 or 10 running. Now, to make this particularly ironic, I have a real SPARCstation 5 and 10 in my garage. I bought them for about $5 each when some company in Silicon Valley was going out of business. Yet it is quicker and easier to make the emulator run on my laptop. (Not to mention that by now the lithium batteries are likely dead and the machines have lost their identity as their battery-backed NVRAM evaporates…)

So where to get Qemu? Well… that depends. I got the "for Windows" version, which is a bit scarce. The "for Linux" version is available for just about any Linux. But… my laptop and my machine at work are Windows Intel machines. So is there a way to get a SPARCstation running on a WinTel box? Yes.

http://lassauge.free.fr/qemu/

Has a couple of Qemu for Windows releases. I installed the 1.5.3 one. It worked fine. (Details in a future posting, though how many decades is an open perceptual difference ;-)

I now have a SPARC running Debian Linux on my laptop. (Technically, on an SD Chip in a slot in the side of my laptop… but…)
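For the curious, getting the guest going boils down to a couple of commands on the host side. This is only a sketch; the exact flags depend on the Qemu build and on which Debian SPARC image you feed it, and the file names here are placeholders:

  # make a disk image for the guest
  $ qemu-img create -f qcow2 debian-sparc.qcow2 8G

  # boot the emulated SPARCstation 10 from a Debian install CD image
  $ qemu-system-sparc -M SS-10 -m 256 \
      -hda debian-sparc.qcow2 -cdrom debian-sparc-netinst.iso -boot d

  # once installed, drop the -cdrom / -boot bits; -nographic gives the
  # text-only (and much faster) variant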

The GIStemp Download

So first I tried using “wget” to get the GIStemp sources (as I was running a non-windowing version of Qemu). No joy. So I went whole hog and started a full on X-Windows based Debian On SPARC emulation. It is slow. Very slow. But livable. Barely. And I got the “current” GIStemp source code downloaded via HTTP / Web Browser to my Virtual Machine.

Some day I'll write up the details of how to make this go. For now, the important bit is that I did get the download into a Virtual Machine on my laptop. I did get the compressed archive unpacked. It is ready to configure, compile, and make go. But what surprised me was the time stamps. To me, they are not very new. Most of GIStemp is unchanged. Yes, I need to do a 'diff' of the sources and figure out exactly what changed. But at a cursory level, it looks like 'not much'.
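The "look at the date stamps" part is nothing fancy, just the usual unpack and list. The archive and directory names below are stand-ins for whatever the GISS site actually hands you:

  # unpack the downloaded archive on the SPARC guest
  $ tar xzf gistemp_sources.tar.gz

  # list each STEP directory, newest files first, dates visible
  $ ls -lt gistemp/STEP0
  $ ls -lt gistemp/STEP1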

Does this mean that they “double dip” and do the GISS adjustments on top of the GHCN / NOAA / NCDC adjustments? I don’t know yet. That depends on the exact differences in the code. That will come in the future. For now, the date stamps do not show much difference.

So what does it look like? Here are some screen shots.

Images Of GIStemp / Qemu Now

This is the top level picture of Qemu running in a window on my WinTel PC.

Qemu with Linux on emulated SPARC

A screenshot of a SPARC instance of Virtual Machine Linux on a WinTel Laptop

So here is a screen shot of a Big Endian SPARC (emulated) processor running Linux. Note the tar ball of GIStemp sources and the unpacked directory of them.

Note the Date Stamps on the GIStemp sources as unpacked. Not much change in the last 1/2 decade or so…

GIStemp listing with date stamps

GIStemp sources 9 July 2014 listing with date stamp

Step0 is largely unchanged. Step1, the Python step, is also not much changed.

To me, at a first glance, it looks like the “pick up the data” processes in Step0 have changed a little, but the actual “apply processing” in Step1 has not changed much.

Has GIStemp “double dipped” by having NCDC apply adjustments and GIStemp do “the same old same old” on top of them? I don’t know. But the date stamp pattern does not look like much change. Yes, I’ll go through the code “line by line” and see what did change. It just isn’t really looking like they ripped out a lot of stuff on a first glance…

How about Steps 2 and 3?

GIStemp Step 2 and Step 3 date stamps on ls listing

Again, it doesn’t look like a lot of change.

For completeness, here is the last bit, Steps4_5:

listing of GIStemp Steps 4 and 5 source code

The only notable change looks to involve the HadCrut R2 release.

OK, I’m definitely “down the rat hole” as I’m going to do a character by character compare of old and new. But it will take some time. My old copies are on a Mac from about 20 years ago (though it is with me). So it will take a while to get it booted and figure out how to get the old stuff from it to the new Virtual Machine SparcStation 10 emulator. And time is what I don’t have a lot of right now.
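When the old tree does finally get off the Mac and sit next to the new one, the compare itself is the easy part. Roughly (directory and file names are placeholders):

  # which files differ at all, across the whole tree
  $ diff -rq gistemp_2009/ gistemp_2014/

  # then line by line on whatever shows up
  $ diff -u gistemp_2009/STEP0/somefile.f gistemp_2014/STEP0/somefile.f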

My sense of it is that GIStemp will not have really changed much from 'last look' and that they double dip the NCDC homogenization / adjustments. But close examination will answer it, or end it.

Conclusion

Qemu is your friend. Running full on X-Windows is slow, but livable if needed. Running it as a text only 'small' shell Linux is quite fast. The emulated SPARC gets no hardware video processing, so a single emulated core ends up doing the graphics too; given that, it is surprisingly effective, though the graphics are the slow part.

(Someday I hope to make a multi-processor Linux Damn Fast machine, but that is a ways off…)

What has been demonstrated? That a BigEndian solution is in hand, if slow, and that GIStemp has not changed much, per the date stamps. More work needed on that to show exactly what changed, and what it does.

I will be spending the next few weeks making a GIStemp On A Chip, with GIStemp and data loaded onto a BigEndian system image under Qemu; all on a modest SD Card. If this works out well, then anyone can run GIStemp via a small emulator download.

So that’s what I’m doing now. Yet Another GISTemp Port.

Details as they are available.

I’ll be making a post or two about how to set up a Qemu SPARC 10, how to make it portable (on a chip / thumbdrive) and how to have a portable GIStemp on a chip that runs on most any Windows machine you have laying around.

