GIStemp STEP0 Input Files

Where GIStemp Gets Its Data

OK, so we have looked at where to get the data and code, a high-level overview, and the first bit of input file sorting code. Here we will take a peek at the data format of the input files to STEP0. I will present the sample data in fixed format, so some of the actual data will run off the right-hand edge of the screen. These are just samples, so truncation on the right is OK. Just realize that when the text says there are 12 months of data and only 9 fill the screen, the other 3 are ‘off the right edge’. Trying to read wrapped lines did not seem worth it for this ‘peek’ at the data.

The antarc[1-3].txt files are the actual temperature data from the Antarctic. After a descriptive header, each year is presented with 12 monthly average temperature fields:

Get data index at http://www.antarctica.ac.uk/met/READER/surface/stationpt.html
Title: *NONE*
Get page http://www.antarctica.ac.uk/met/READER/surface/Adelaide.All.temperature.txt
Adelaide temperature
1961       -    -2.4       -       -       -       -       -   -16.5   -14.8   -11.0    -5.5    -2.7
1962    -2.0    -4.4       -       -    -3.2    -8.8   -10.3    -9.3    -9.3    -3.6    -3.2    -0.4
1963     0.0     0.3    -3.6    -7.4    -5.6    -7.8   -10.3   -19.0   -10.5    -8.8    -2.5     0.8
1964     1.0     0.0    -2.4    -7.7    -9.4   -11.7    -6.0   -20.1    -9.7    -3.2    -1.1     0.0
1965     0.9     0.7     1.1    -3.3    -3.9    -7.4   -13.0   -12.2    -5.7    -8.9    -2.1    -0.8
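As a minimal sketch of how one of these year lines could be read, the Python below splits a line on whitespace and converts the 12 monthly fields. It assumes that a lone "-" marks a missing month (the sample suggests this, but the post does not state it), and the function name parse_antarc_year is hypothetical, not taken from the GIStemp code.

```python
from typing import List, Optional, Tuple


def parse_antarc_year(line: str) -> Tuple[int, List[Optional[float]]]:
    """Split one year line into (year, 12 monthly values)."""
    fields = line.split()
    year = int(fields[0])
    # ASSUMPTION: a lone "-" means "no value for this month".
    # Negative readings such as "-2.4" are ordinary numbers.
    months = [None if f == "-" else float(f) for f in fields[1:13]]
    if len(months) != 12:
        raise ValueError(f"expected 12 monthly fields, got {len(months)}")
    return year, months


sample = ("1961       -    -2.4       -       -       -       -       -"
          "   -16.5   -14.8   -11.0    -5.5    -2.7")
year, months = parse_antarc_year(sample)
present = [m for m in months if m is not None]
print(year, len(present), "months present")
```

Keeping missing months as None, rather than as a numeric sentinel, avoids accidentally averaging a placeholder value into a temperature.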

The antarc*.list files, meanwhile, contain station information (station ID, name, latitude, longitude, flag, setname).

50194998000 MACQUARIE ISLAND               -54.50  158.93 -999 antarc2
50793945000 CAMPBELL ISLAND                -52.55  169.15 -999 antarc2

Parts of the code in this step glue these together to create data files with a station ID and the data, rather than the station name in a header block of text.
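To make the "glue" idea concrete, here is a hedged Python sketch that parses the two sample antarc*.list lines and builds a name-to-ID lookup of the kind such a join would need. The field names follow the description above; the Station type, parse_list_line, and the idea of matching on the station name are illustrative assumptions, not the actual STEP0 logic.

```python
from typing import NamedTuple


class Station(NamedTuple):
    station_id: str
    name: str
    lat: float
    lon: float
    flag: int
    setname: str


def parse_list_line(line: str) -> Station:
    """Parse one antarc*.list line; the station name may contain spaces."""
    station_id, rest = line.split(None, 1)
    # The last four whitespace-separated tokens are lat, lon, flag, setname;
    # whatever remains in front of them is the station name.
    name, lat, lon, flag, setname = rest.rsplit(None, 4)
    return Station(station_id, name.strip(), float(lat), float(lon),
                   int(flag), setname)


lines = [
    "50194998000 MACQUARIE ISLAND               -54.50  158.93 -999 antarc2",
    "50793945000 CAMPBELL ISLAND                -52.55  169.15 -999 antarc2",
]
stations = [parse_list_line(l) for l in lines]

# Hypothetical lookup: station name -> station ID.
ids_by_name = {s.name: s.station_id for s in stations}
print(ids_by_name["CAMPBELL ISLAND"])  # prints 50793945000
```

Splitting from the right works here because the trailing fields never contain spaces, while the name in the middle can.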

The file vw.inv is a similar station information file with several added fields that look like the data described for GHCN earlier. The actual records are fixed format like:

10160355000 SKIKDA                          36.93    6.95    7   18U  107HIxxCO 1x-9WARM DECIDUOUS  C   49
10160360000 ANNABA                          36.83    7.82    4   33U  256FLxxCO 1A 7WARM CROPS      C   12
10160390000 DAR-EL-BEIDA                    36.72    3.25   25   34U 1365FLxxCO10A 6WARM CROPS      C   25

The next example has the spaces compressed and will wrap, so you can see more of what the data actually look like:

10160355000 SKIKDA 36.93 6.95 7 18U 107HIxxCO 1x-9WARM DECIDUOUS C 49
10160360000 ANNABA 36.83 7.82 4 33U 256FLxxCO 1A 7WARM CROPS C 12
10160390000 DAR-EL-BEIDA 36.72 3.25 25 34U 1365FLxxCO10A 6WARM CROPS C 25
10160395001 FT. NATIONAL 36.52 4.18 942 805R -9MVDEno-9x-9WARM CROPS A 0
10160400001 CAP CARBON 36.80 5.10 230 28R -9HIxxCO 1x-9WATER A 13
10160402000 BEJAIA 36.72 5.07 2 121U 90HIxxCO 1A 3WATER B 17
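The trailing fields of a vw.inv record run together and contain embedded spaces ("WARM CROPS", "WARM DECIDUOUS"), so splitting on whitespace is not reliable for the whole line. As a sketch, the Python below extracts only the leading fields that are unambiguous in the samples (ID, name, and two decimal numbers assumed to be latitude and longitude) and leaves the rest of the line unparsed. The regular expression and the function name parse_inv_leading are illustrative assumptions, not GIStemp code.

```python
import re

# 11-digit ID, a name that may contain spaces, dots or hyphens,
# then two decimal numbers, then everything else untouched.
LEADING = re.compile(
    r"^(\d{11})\s+(.+?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(.*)$"
)


def parse_inv_leading(line: str):
    """Return (id, name, lat, lon, rest), or None if the line does not match."""
    m = LEADING.match(line)
    if m is None:
        return None
    station_id, name, lat, lon, rest = m.groups()
    # ASSUMPTION: the two decimals are latitude and longitude, by analogy
    # with the antarc*.list layout described earlier.
    return station_id, name, float(lat), float(lon), rest


sample = ("10160395001 FT. NATIONAL 36.52 4.18 942 805R "
          "-9MVDEno-9x-9WARM CROPS A 0")
parsed = parse_inv_leading(sample)
print(parsed[:4])
```

The same pattern matches both the fixed-format and the space-compressed versions of a record, since it only requires at least one space between the leading fields.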

The file t_hohenpeissenberg_200306.txt_as_received_July17_2003 is a set of temperature data from a single location that was manually obtained by GISS and is in a unique format. It is used to replace data for that location that has gaps in it. The first 3 lines are:

YEAR    JAN     FEB     MAR     APR     MAY     JUN     JUL     AUG     SEP     OCT     NOV     DEC     D-J-F   M-A-M     J-J-A   S-O-N   ANN
1781    -1.6    -0.9    2.4     8.7     12.2    14.2    15.2    16.7    12.7    4.7     1.9     1.5     -1.02   7.77      15.37   6.43    7.14
1782    -0.8    -5.5    0       3.8     9.3     15.5    17      14.4    10.9    3.9     -2.5    -2      -1.60   4.37      15.63   4.10    5.62
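A quick arithmetic check on the two sample rows suggests how the trailing columns relate to the monthly ones: each seasonal value matches the mean of its three months, with D-J-F borrowing December from the previous year, and ANN matches the mean of the four seasonal values rather than the mean of January through December. The Python sketch below reproduces that check for 1782. This is inferred from these two rows only, not from any GISS documentation, and the helper names are hypothetical.

```python
def parse_row(line):
    """Return (year, 12 monthly values, 4 seasonal values, annual value)."""
    fields = line.split()
    values = [float(f) for f in fields[1:]]
    return int(fields[0]), values[:12], values[12:16], values[16]


def mean(xs):
    return sum(xs) / len(xs)


def seasonal_means(months, prev_december):
    """D-J-F, M-A-M, J-J-A, S-O-N; D-J-F uses the prior year's December."""
    djf = mean([prev_december, months[0], months[1]])
    mam = mean(months[2:5])
    jja = mean(months[5:8])
    son = mean(months[8:11])
    return [djf, mam, jja, son]


row_1781 = ("1781 -1.6 -0.9 2.4 8.7 12.2 14.2 15.2 16.7 12.7 4.7 1.9 1.5 "
            "-1.02 7.77 15.37 6.43 7.14")
row_1782 = ("1782 -0.8 -5.5 0 3.8 9.3 15.5 17 14.4 10.9 3.9 -2.5 -2 "
            "-1.60 4.37 15.63 4.10 5.62")

_, months_1781, _, _ = parse_row(row_1781)
year, months, listed_seasons, listed_ann = parse_row(row_1782)

computed = seasonal_means(months, prev_december=months_1781[11])
computed_ann = mean(computed)

for label, got, want in zip(["D-J-F", "M-A-M", "J-J-A", "S-O-N"],
                            computed, listed_seasons):
    print(f"{label}: computed {got:.2f}, listed {want:.2f}")
print(f"ANN: computed {computed_ann:.3f}, listed {listed_ann:.2f}")
```

If the relationship holds for the whole file, the annual value is effectively a December-to-November mean, which is worth keeping in mind when comparing it against calendar-year averages.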

The *.tbl files sumofday.tbl and ushcn.tbl are very similar in format and seem to be a list of an odd station ID and a regular station ID followed by a single-digit flag. Active flag? The file mcdw.tbl, by contrast, has what looks like two regular station IDs and a flag byte.

head sumofday.tbl 
      70701 14661967000  1
      43323 21047696001  0
      43324 21047764001  0

head ushcn.tbl 
      84570 42572201000  0
      87020 42572202002  0
      82850 42572202004  0

head mcdw.tbl 
  103060355 10160355000  2
  103060360 10160360000  3
  103060390 10160390000  5
  103060402 10160402000  3
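All three tables share the same three-column shape, so one small reader covers them. The Python sketch below parses a row and, following the reading reported by Peter O'Neill in the comments at the end of this post (the last digit is an offset appended to the 11-digit station ID), builds the resulting 12-digit ID. The function names are hypothetical, and the offset interpretation comes from that comment rather than from anything shown in the files themselves.

```python
def parse_tbl_line(line):
    """Split a *.tbl row into (source ID, station ID, trailing digit)."""
    source_id, station_id, digit = line.split()
    return source_id, station_id, int(digit)


def record_id(station_id, offset):
    """Append the one-digit offset to the 11-digit station ID.

    ASSUMPTION: this mirrors the mapping described in the comments,
    e.g. 10160355000 with offset 2 becomes 101603550002.
    """
    return f"{station_id}{offset}"


mcdw_rows = [
    "  103060355 10160355000  2",
    "  103060360 10160360000  3",
    "  103060390 10160390000  5",
    "  103060402 10160402000  3",
]
for row in mcdw_rows:
    source_id, station_id, offset = parse_tbl_line(row)
    print(source_id, "->", record_id(station_id, offset))
```

The IDs are kept as strings throughout so that leading digits and fixed widths survive unchanged.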

Ts.discont.RS.alter.IN looks like “station ID, flag, year, data” and contains a single line:

425911650000  1 1950 0.84

while Ts.strange.RSU.list.IN has lines of the form:

122637720000 LAMU                      lat,lon   -2.3   40.8 omit: 1914/07
148628400000 MALAKAL                   lat,lon    9.6   31.7 omit: 1990/08
207425150003 CHERRAPUNJI               lat,lon   25.3   91.7 omit: 1991-1993
115624640010 HURGHADA                  lat,lon   27.3   33.8 omit: 0-9999

These look like station ID, name, several data fields, and gaps. The data fields appear to be a data description, then latitude and longitude with negative signs to designate hemisphere (south of the equator and west of Greenwich), and a directive to omit some data. It is unclear whether these lines say the data are missing in the gap or that the gap ought to be created because the data ‘look strange’ (is omit a verb or a statement of fact?). I’m hopeful the code will clarify this as I proceed through it. The omit dates appear to be in two formats: yr/mo and yr-yr.
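The two omit formats seen in the samples can be told apart mechanically. The Python sketch below parses a Ts.strange.RSU.list.IN line and normalizes the omit field to a (start year, end year, month) triple, where month is None for a year range. It handles only the yr/mo and yr-yr forms visible above; whether other forms occur in the full file is unknown, and all names here are hypothetical.

```python
import re


def parse_omit(spec):
    """Return (start_year, end_year, month); month is None for a year range."""
    spec = spec.strip()
    m = re.fullmatch(r"(\d+)/(\d{1,2})", spec)      # e.g. 1914/07
    if m:
        year = int(m.group(1))
        return year, year, int(m.group(2))
    m = re.fullmatch(r"(\d+)-(\d+)", spec)          # e.g. 1991-1993, 0-9999
    if m:
        return int(m.group(1)), int(m.group(2)), None
    raise ValueError(f"unrecognized omit spec: {spec!r}")


def parse_strange_line(line):
    """Split a line into (station ID, name, lat, lon, omit triple)."""
    head, _, spec = line.partition("omit:")
    before, _, coords = head.partition("lat,lon")
    station_id, name = before.split(None, 1)
    lat, lon = (float(c) for c in coords.split())
    return station_id, name.strip(), lat, lon, parse_omit(spec)


samples = [
    "122637720000 LAMU                      lat,lon   -2.3   40.8 omit: 1914/07",
    "207425150003 CHERRAPUNJI               lat,lon   25.3   91.7 omit: 1991-1993",
    "115624640010 HURGHADA                  lat,lon   27.3   33.8 omit: 0-9999",
]
for s in samples:
    print(parse_strange_line(s))
```

Read this way, the 0-9999 range on the HURGHADA line would cover every possible year, which is at least consistent with dropping that record entirely.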

The directory _old is supposed to contain the unsorted versions of the Antarctic files, but the copies in it are slightly different in size from the post-processed versions, so I presume some other transformations happen as well.


ls -l _old

  196085 Sep 10  2007 antarc1.txt
  139004 Feb 28  2008 antarc3.txt

ls -l antarc*txt

  212171 Aug 11 05:52 antarc1.txt
  214373 Jun  9  2008 antarc2.txt
  142484 Aug 11 05:52 antarc3.txt

The code indicates several other file names that are expected as input to some steps or output from others. I will document these as I figure out what they are in the individual STEPx postings.

One peek ahead:

In looking at STEP4_5 (chasing down the claim that GIStemp uses “the satellite data”) we see two data sets used. These are from Hadley and NOAA.

http://www.hadobs.org                     HadISST1:   1870-present
ftp.emc.ncep.noaa.gov cmb/sst/oimonth_v2  Reynolds 11/1981-present

The file oimonth_v2 came as a gzip file, with no directions in the GIStemp download to gunzip it. So when you get to that step, gunzip the .gz version. I also found that an ASCII version of the file is available from NOAA, so I was able to take a peek inside and see something other than binary hash. It’s a single matrix of anomaly data points: latitude, longitude, and anomaly by month for 12 months.

Some folks have asserted this means GIStemp uses “the satellite data”. I would assert it means GIStemp uses a satellite and surface station derived anomaly map. At no time have we seen the actual satellite data.

The ASCII version of oimonth_v2 looks like:

 2009    1    1 2009    1   31   311115100345
 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18
 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18
 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18 -18
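One thing to watch in the header line: its last token, 311115100345, looks like two fixed-width integers that have run together. The Python sketch below slices the header by column instead of splitting on whitespace. It assumes a layout of seven integers of width 5 followed by one of width 10, which the sample line is consistent with but which is not confirmed anywhere in this post; the function name is hypothetical.

```python
# ASSUMPTION: seven width-5 integers followed by one width-10 integer.
WIDTHS = [5] * 7 + [10]


def parse_oi_header(line):
    """Slice a header line into integers using fixed column widths."""
    fields, pos = [], 0
    for width in WIDTHS:
        fields.append(int(line[pos:pos + width]))
        pos += width
    return fields


# Rebuild the sample header from its assumed fields so the column
# positions are exact, then parse it both ways.
header = "".join(f"{v:5d}" for v in (2009, 1, 1, 2009, 1, 31, 31)) + "1115100345"

print(header.split())           # last two fields are fused into one token
print(parse_oi_header(header))  # fixed-width slicing separates them
```

Under this reading the first six numbers are plausibly a start and end date (2009-01-01 to 2009-01-31) and the seventh a day count, but that interpretation is a guess from the values alone.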

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...

3 Responses to GIStemp STEP0 Input Files

  1. Peter O'Neill says:

    “The *.tbl files: sumofday.tbl and ushcn.tbl are very similar in format and seem to be a list of an odd station ID and regular station ID followed by a single digit flag. Active flag? While mcdw.tbl has what looks like two regular station IDs and a flag byte.”

    In STEP0 only ushcn.tbl is used, to translate those “odd” station IDs to “regular” station IDs, and that “single digit flag” is not used.

    In STEP1 the three *.tbl files are read in get_sources(), found in v2_to_bdb.py, and that “single digit flag” turns out to be an offset which is combined with the station ID to select the appropriate raw data set from among multiple data sets for that station.

    i.e. mcdw.tbl
    103060355 10160355000 2 -> 101603550002
    103060360 10160360000 3 -> 101603600003
    103060390 10160390000 5 -> 101603900005
    103060402 10160402000 3 -> 101604020003

    Note that these offsets are all zero for ushcn.tbl

    See the comment I will now add to the gistemp-step1_v2_to_bdb page for a further comment on this in the context of the code shown there.

  2. E.M.Smith says:

    Peter,

    Thank you so much for your comment.

    I was beginning to think that I was the only person who was looking at any of this and that maybe I was on a fool’s errand, wasting my time with it. You have given me reason to continue!

    Thanks!

  3. We worked through quite a bit of this at Climate Audit in 2007 (which is how Hansen got forced to put the code online.) See older entries in http://www.climateaudit.org/?cat=54&paged=2. Regards, Steve McIntyre
