GIStemp a basic intro.

Welcome to E.M.Smith’s Commentary at

The first thing we’ll be looking at is GIStemp, and in particular, what passes for the ‘README’ file: gistemp.txt

So how to get it?

To download the GIStemp source code go to:

and click on the download link.

Unpack the archive, and read gistemp.txt

It ought to look like this:

GISS Temperature Analysis

GHCN = Global Historical Climate Network (NOAA)
USHCN = US Historical Climate Network (NOAA)
SCAR = Scientific Committee on Arctic Research

Basic data set: GHCN –
v2.mean.Z (data file)
v2.temperature.inv.Z (station information file)


For Antarctica: SCAR –

For Hohenpeissenberg –
complete record for this rural station
(thanks to Hans Erren who reported it to GISS on July 16, 2003)

USHCN stations are part of GHCN; but the data are adjusted for various recording
and protocol errors and discontinuities; this set is particularly relevant if
studies of US temperatures are made, whereas the corrections have little impact
on the GLOBAL temperature trend, the US covering less than 2% of the globe.

Step 0 : Merging of sources (
GHCN contains reports from several sources, so there often are multiple records
for the same location. Occasionally, a single record was divided up by NOAA
into several pieces, e.g. if suspicious discontinuities were discovered.

USHCN and SCAR contain single source reports but in different formats/units
and with different or no identification numbers. For USHCN, the table
“ushcn.tbl” gives a translation key, for SCAR we extended the WMO number if it
existed or created a new ID if it did not (2 cases). SCAR stations are treated
as new sources.

Adding SCAR data to GHCN:
The tables were reformatted and the data rescaled to fit the GHCN format;
the new stations were added to the inventory file. The site temperature.html
has not been updated for several years; we found and corrected a few typos
in that file. (Any SCAR data marked “preliminary” are skipped)

Replacing USHCN-unmodified by USHCN-corrected data:
The reports were converted from F to C and reformatted; data marked as being
filled in using interpolation methods were removed. USHCN-IDs were replaced
by the corresponding GHCN-ID. The latest common 10 years for each station
were used to compare corrected and uncorrected data. The offset obtained in
this way was subtracted from the corrected USHCN reports to match any new incoming
GHCN reports for that station (GHCN reports are updated monthly; in the past,
USHCN data used to lag by 1-5 years).
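
(An aside that is not part of gistemp.txt.) The offset step above can be read as roughly the following. This is only a minimal Python sketch, not the GIStemp code: it assumes one value per year per station held in plain dicts, and the function names are made up for illustration. The real step may well work on monthly values.

```python
def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def ushcn_offset(ushcn_c, ghcn_c, n_years=10):
    """Mean (USHCN - GHCN) difference over the latest common years.

    Both arguments are hypothetical {year: temperature_in_C} dicts.
    """
    # Years present in both records, keeping only the most recent n_years
    common = sorted(set(ushcn_c) & set(ghcn_c))[-n_years:]
    if not common:
        raise ValueError("no overlapping years to compare")
    return sum(ushcn_c[y] - ghcn_c[y] for y in common) / len(common)


def match_to_ghcn(ushcn_c, ghcn_c, n_years=10):
    """Subtract the offset so the corrected record lines up with GHCN."""
    offset = ushcn_offset(ushcn_c, ghcn_c, n_years)
    return {year: temp - offset for year, temp in ushcn_c.items()}
```

If the corrected record runs a constant 0.5C above the GHCN record in the overlap, every corrected value comes out 0.5C lower after `match_to_ghcn`.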

Filling in missing data for Hohenpeissenberg:
This is a version of a GHCN report with missing data filled in, so it is used
to fill the gaps of the corresponding GHCN series.

Result: v2.mean_comb

Step 1 : Simplifications, elimination of dubious records, 2 adjustments (
The various sources at a single location are combined into one record, if
possible, using a version of the reference station method. The adjustments
are determined in this case using series of estimated annual means.

Non-overlapping records are viewed as a single record, unless this would
result in introducing a discontinuity; in the documented case of St.Helena
the discontinuity is eliminated by adding 1C to the early part.

After noticing an unusual warming trend in Hawaii, closer investigation
showed its origin to be in the Lihue record; it had a discontinuity around
1950 not present in any neighboring station. Based on those data, we added
0.8C to the part before the discontinuity.

Some unphysical looking segments were eliminated after manual inspection of
unusual looking annual mean graphs and comparing them to the corresponding
graphs of all neighboring stations.

Result: Ts.txt

Step 2 : Splitting into zonal sections and homogeneization (
To speed up processing, Ts.txt is converted to a binary file and split
into 6 files, each covering a latitudinal zone of a width of 30 degrees.
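
(An aside that is not part of gistemp.txt.) Splitting into six 30-degree bands amounts to something like the sketch below. The numbering direction (band 1 as the northernmost) and the handling of latitudes that fall exactly on a boundary are assumptions made here; the README does not say how GIStemp does either.

```python
def zone_index(lat_deg):
    """Return 1..6 for the 30-degree latitude band containing lat_deg.

    Assumption: band 1 is 60N..90N and band 6 is 90S..60S.
    """
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError("latitude out of range: %r" % lat_deg)
    # A latitude exactly on a boundary goes to the more southerly band;
    # the south pole itself is clamped into band 6.
    return min(int((90.0 - lat_deg) // 30.0) + 1, 6)


def split_by_zone(stations):
    """Group (station_id, latitude) pairs into a {band: [ids]} dict."""
    zones = {band: [] for band in range(1, 7)}
    for station_id, lat_deg in stations:
        zones[zone_index(lat_deg)].append(station_id)
    return zones
```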

The goal of the homogeneization effort is to avoid any impact (warming
or cooling) of the changing environment that some stations experienced
by changing the long term trend of any non-rural station to match the
long term trend of their rural neighbors, while retaining the short term
monthly and annual variations. If no such neighbors exist, the station is
completely dropped, if the rural records are shorter, part of the
non-rural record is dropped.

Result: Ts.GHCN.CL.1-6 – before peri-urban adjustment
Ts.GHCN.CL.PA.1-6 – after peri-urban adjustment

Step 3 : Gridding and computation of zonal means (
A grid of 8000 grid boxes of equal area is used. Time series are changed
to series of anomalies. For each grid box, the stations within that grid
box and also any station within 1200km of the center of that box are
combined using the reference station method.
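
(An aside that is not part of gistemp.txt.) The "within 1200km of the center" test can be sketched with a great-circle distance. This is a guess at the shape of the calculation, not the GIStemp code: the haversine formula, the 6371 km mean Earth radius and the function names are all choices made here, and the sketch covers only the radius test, not the "stations within that grid box" part or the combining itself.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def stations_near(center_lat, center_lon, stations, radius_km=1200.0):
    """Return ids of (id, lat, lon) stations within radius_km of the center."""
    return [
        station_id
        for station_id, lat, lon in stations
        if great_circle_km(center_lat, center_lon, lat, lon) <= radius_km
    ]
```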

A similar method is also used to find a series of anomalies for 80 regions
consisting of 100 boxes from the series for those boxes, and again to find
the series for 6 latitudinal zones from those regional series, and finally
to find the hemispheric and global series from the zonal series.

WARNING: It should be noted that the base period for any of these anomalies
is not necessarily the same for each grid box, region, zone. This is
irrelevant when computing trend maps; however, when used to compute
anomalies, we always have to subtract the base period data from the
series of the selected time period to get a consistent anomaly map.

Result: SBBX1880.Ts.GHCN.CL.PA.1200 and tables (GLB.Ts.GHCN.CL.PA.txt,…)

Step 4 : Reformat sea surface temperature anomalies
Sources: HadISST1: 1870-present
         cmb/sst/oimonth_v2 Reynolds 11/1981-present

For both sources, we compute the anomalies with respect to 1982-1992, use
the Hadley data for the period 1880-11/1981 and Reynolds data for 12/1981-present.
Since these data sets are complete, creating 1982-92 climatologies is simple.
These data are replicated on the 8000-box equal-area grid and stored in the same way
as the surface data to be able to use the same utilities for surface and ocean data.
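
(An aside that is not part of gistemp.txt.) For a single grid cell, the "anomalies with respect to 1982-1992, Hadley up to 11/1981, Reynolds from 12/1981" recipe might look like the sketch below. It assumes each source is a plain {(year, month): temperature} dict; the function names and that layout are invented here, and the real files are gridded binary data.

```python
def climatology(series, first=1982, last=1992):
    """Mean for each calendar month over the years first..last inclusive."""
    clim = {}
    for month in range(1, 13):
        vals = [series[(year, month)]
                for year in range(first, last + 1)
                if (year, month) in series]
        if vals:
            clim[month] = sum(vals) / len(vals)
    return clim


def anomalies(series, clim):
    """Subtract the matching calendar-month climatology from each value."""
    return {(year, month): value - clim[month]
            for (year, month), value in series.items()
            if month in clim}


def splice(hadley, reynolds, switch=(1981, 12)):
    """Hadley anomalies before the switch month, Reynolds from it onward."""
    had_anom = anomalies(hadley, climatology(hadley))
    rey_anom = anomalies(reynolds, climatology(reynolds))
    merged = {key: val for key, val in had_anom.items() if key < switch}
    merged.update({key: val for key, val in rey_anom.items() if key >= switch})
    return merged
```

Each source is turned into anomalies against its own 1982-1992 climatology before the splice, so a constant difference between the two sources does not show up as a step at the join.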

Areas covered occasionally by sea ice are masked using a time-independent mask.
The Reynolds climatology is included, since it also may be used to find that
mask. Programs are included to show how to regrid these anomaly maps:
adds a single or several successive months for the same year
to an existing ocean file SBBX.HadR2; a program to add several years is also
included.
Result: update of SBBX.HadR2

Step 5 : Computation of LOTI zonal means
The same method as in step3 is used, except that for a particular grid box
the anomaly or trend is computed twice, first based on surface data, then
based on ocean data. Depending on the location of the grid box, one or
the other is used with priority given to the surface data, if available.

Result: tables (GLB.Tsho2.GHCN.CL.PA.txt,…)

Final Notes
A program that can read the two basic files SBBX1880.Ts.GHCN.CL.PA.1200 and
SBBX.HadR2 in order to compute anomaly and trend maps etc was available on our
web site for many years and still is.

For a better overview of the structure, the programs and files for the various
steps are put into separate directories with their own input_files,
work_files, to_next_step directories. If used in this way, files created by
step0 and put into the to_next_step directory will have to be manually moved
to the to_next_step directory of the step1. To avoid that, you could
consolidate all sources in a single directory and merge all input_files
directories into a single subdirectory.
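
(An aside that is not part of gistemp.txt.) The manual hand-off between steps could be scripted along these lines. The step directory names passed in are assumptions (use whatever your unpacked archive actually contains), `hand_off` is a made-up helper, and the destination subdirectory is left as a parameter because the note above names to_next_step as the target.

```python
import shutil
from pathlib import Path


def hand_off(root, from_step, to_step, dest_subdir="to_next_step"):
    """Move every file from one step's to_next_step into the next step.

    Returns the sorted list of file names that were moved.
    """
    src = Path(root) / from_step / "to_next_step"
    dst = Path(root) / to_step / dest_subdir
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src.iterdir()):
        if path.is_file():  # leave any subdirectories where they are
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved
```

Something like `hand_off(".", "STEP0", "STEP1")` would then stand in for the copy-by-hand step, assuming those are the directory names in your copy.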

The reason to call the first step “Step 0” is historical: For our 1999 paper
“GISS analysis of surface temperature change”, we started with “Step 1”, i.e.
we used GHCN’s v2.mean as our only source for station temperature data. The
USHCN data were used for the 2001 paper “A closer look at United States and
global surface temperature change”, the other parts of “Step 0” were added later.


OK, that’s the basic documentation they give you. It’s a reasonable place to start, though a bit thin. In the GHCN data download there is a file that details the actual layout of the data. You will not find a decent data description of the fields anywhere else. Certainly not in the GIStemp code. So, download the GHCN data set and take a look at the file:

to get your file / data layouts and descriptors.

I also found this site that describes the USHCN data in some detail:

So, take a while to read over that README, to look at the data descriptors in GHCN’s data reading FORTRAN program, and download the data sets. That gets you started.

Please note that STEP4_5 has 2 more data sets that are sucked in. I hesitate to call the “Reynolds” set from NOAA a data set, since it is really just an anomaly map (the raw data having been left behind some time ago) but it is there. So when you get to STEP4, be aware that you are back at a “data download” step before you begin it. The copy of GIStemp I downloaded had this set in it, but the files were compressed with gzip.
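
If you would rather not unpack gzip files by hand, Python's standard `gzip` module will do it. This is just a minimal sketch; `gunzip` is a helper name made up here, and it writes the uncompressed copy next to the original.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path):
    """Decompress a .gz file alongside itself and return the new path."""
    src = Path(path)
    if src.suffix != ".gz":
        raise ValueError("expected a .gz file, got %s" % src.name)
    dst = src.with_suffix("")  # drop the trailing .gz
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Note that this only handles gzip. The GHCN `.Z` files use the older Unix `compress` format, which the `gzip` module does not read.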

After that things get ‘hairy’. My overall opinion of the GIStemp code is that it is rather poorly written. There are some bits that are much better than others, but overall it is poorly designed. Examples? Each FORTRAN program is compiled inline when used, then deleted. The FORTRAN programs frequently use ‘scratch’ files that are written to and deleted from the same place the source code lives. So much for a ‘source archive’…

So take a while and soak this up. I’ll be adding individual summaries for each “STEP” in the code, along with some idea what each step does, what data files it takes in, creates, changes, spits out, and what I think it is intending to do.

In a few months, after I’m done “Deconstructing GIStemp”, I hope to give some idea what a more readable version of the code would do. For now, “Start Here”.


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...

8 Responses to GIStemp a basic intro.

  1. Jeff Alberts says:

    I don’t plan on deconstructing the code, but will enjoy following your progress.

  2. E.M.Smith says:

    Welcome Jeff!

    You have the distinction of being my first visitor and first commenter! Glad to have you make that small bit of history.

    Frankly, if all this post does is let folks get a better idea of just what GISS is doing (or not doing) then I’ll be happy.

    Part of my intent is to make it so that other folks don’t have to shove their brains through the GIStemp code sieve just to find the bits that mean something. Hopefully only one person, me, will have to do that so that others are spared the process…

  3. Bob D says:

    Fortran, eh? Nasty stuff, I haven’t touched that in years (a decade actually). We tend to forget how lucky we are nowadays, with .NET etc.

    Good work, though, you’re pretty brave. I’ll also be watching with interest.

  4. E.M.Smith says:

    @Bob D:

    Every computer language has its pimples and warts. FORTRAN is no different, really. A bit limited, some ‘oddities’ from its math heritage, helps if you think in terms of tape drives (rewind a file? yup!). But I’m OK with it. Yeah, the newer stuff is better for some kinds of work (heck, even FOCUS or RAMIS II from the 1980s as a database report writer on IBM mainframes would work better for most of the early stages of GIStemp). It’s mostly just “get all the data to a similar format and consistent layout” that could be done in 1/4 the code with a database load instead.

    Strangely, in STEP1, the Python does do a database create phase (to make a bunch of math and searching more efficient) then dumps the data out again as a flat character file. Isn’t legacy code fun!

  5. Bob D says:

    Yes, we used to use FORTRAN quite a bit with a non-linear finite element package called ABAQUS. We wrote copious amounts of user subroutines for interpolating boundary conditions such as convection outputs from CFD runs, and also for inserting our own non-linear material properties. Great fun, but then we switched to C when ABAQUS allowed that later on.

    It might be interesting to find out how many developers were involved in producing the GIStemp source code. The oddities might have a lot to do with individual styles over time, being lumped together to create a chaotic system.

  6. Dan Hughes says:

    Hello E. M.,

    I can’t find an e-mail address for you on the site. Will you kindly drop me a line?



  7. E.M.Smith says:

    OK. I can be reached at:

    pub4all ATSIGN aol DOT com

  8. E.M.Smith says:


    I tried to leave a posting on your site, which would have given you my email address, but it didn’t let me. It gave me the error box that “Error – you did not enter a Captcha phrase.”…

    The problem is that no Captcha phrase is presented anywhere to reenter as far as I can tell…
