GIStemp – Send In The Zones

Clown Picture

Send in the Clowns, There Must Be Clowns

Orginal Image

Send in The Zones, There Must be Zones

Sometimes It’s The Little Things

There are zones, and then there are zones… In GIStemp, it divides the world into zones in STEP2. Later, they display a zone anomaly map on their web site. The two use different zone sizes. This, as they say, is odd…

What does it mean? Is it an “issue”? Is it hiding some kind of odd “cherry pick” or masking an “unusual” choice somewhere? That I do not yet know. For now, all I can say is that it is an “odd thing” and is a big “dig here”.

What is happening is that one zone size is used for STEP2 and another is used in the “anomaly zones” used for the published page. They come from STEP3 that uses different zone sizes for what it does.

The Motivation

On another thread the issue came up of exactly WHERE did GIStemp make a zone cutoff. Was it equal 30 degree bands of the globe or something else? This sent me off to look again at the code that does the zonal breakout.

That discussion was in:

I’d seen the files the GIStemp code produced when it divided the thermometers into zones. 6 of them:

[chiefio@tubularbells work_files]$ ls Ts*
Ts.bin1  Ts.bin2  Ts.bin3  Ts.bin4  Ts.bin5  Ts.bin6
[chiefio@tubularbells work_files]$ 

And I remembered the bit of code that did the divvying up as just doing a straight “math thing” without plug numbers like the 24 and 44 degrees in this zone anomaly chart:

This chart has an odd layout. It gives a column for the whole globe. Then by hemispheres. Then divides the world into 3 zones (90 to 24, 24 to -24, -24 to -90). Finally it has a ‘fine grain’ look of 8 zones. It would lead you to think that they at least had an 8 zone granularity in the underlaying data. But I remembered a maximum of 6 zones.

So I was just a bit perplexed. Into the code I went.

STEP2 – text goes to Binary

Until STEP2 of GIStemp, things tend to be kept as text files. Easier for people to read, but a bit less efficient for the computer. STEP2 changes that. I’m going to put the text of the top level controlling script here; the “Do Script” is what I like to call them, since they all start with “do_”{something}.

Don’t worry, you don’t need to make a lot of sense of it. Just note that it does, in fact, say that it is going to convert the text file (Ts.txt) into a binary (Ts.bin) and cut it into 6 zonal files; and the program that will do this cutting is “split_binary.exe”.

Sidebar on names: I’ve cleaned up the original GIStemp just a tiny bit, so in the original this will not be “one directory up” (meaning that the “../bin/” part will not be there) and they were a bit sloppy about what program was named with a “.exe” or not, so I standardized all of them with that bit for the binary executable. But rest assured, the guts of the program are the same.

The part of the “Do Script” that is of interest is this bit:

echo "converting text to binary file"
  ln -s input_files/v2.inv  .
  rm -f Ts.txt ; ln -s $infile Ts.txt

../bin/text_to_binary.exe $last_year

echo "breaking up Ts.bin into 6 zonal files"


echo "trimming Ts.bin1-6"


for i in 1 2 3 4 5 6
do  mv Ts.bin${i}.trim Ts.${label}.$i

rm -f Ts.bin Ts.txt

And we see that it expects all the input to end up in a set of files named Ts.bin1 through Ts.bin6.

So lets go look at what split_binary does…

Split the Data into Zones in Binary Files: split_binary.f

This is a very short little bit of code, and I’m going to include it here in it’s entirety just so that programmer types don’t have to go looking elsewhere. If you are a non-programmer type, you don’t really need to read it, it is here as documentation and for other eyeballs to tell me if I’m missing some great cosmic subtlety.

This code tries to be a bit “slick” and “tricky”. Not a bad thing. Good programmers will do that kind of thing to make code smaller, more efficient, easier to read sometimes, or harder to read but impressive to their friends other times. At it’s core, it just opens one file and writes records out to “6″ others.

We start with data declarations. We declare some of the buckets we use to hold data (called variables), but not all of them. FORTRAN will automatically declare some by just using them. It would be “best practice” to declare them all, but most folks don’t bother. It opens the file Ts.bin (made by the prior step and deleted after this step per the script fragment above). Then it gets a bit “cute”.

Rather than opening Ts.bin1-6 explicitly, it creates the names in a bucket named fileo inside a “do loop” that goes from 51 to 56. OK, I can live with that. FORTRAN uses the name in the “open” but a number in the “write” statement. So files 51 to 56 are Ts.bin1 to Ts.bin6.

We next have a very short “Go To Loop”. It starts with the label “10″ and ends with the “Go To 10″. Once we enter, we can’t get out until the “read” statement runs out of stuff to read. That is the “end=100″ which will send us to line 100, where we stop. You can ignore the two lines that start with “!!”, those are lines of code deactivated by making them into comments with the “!”. Diagnostic bits of leftovers. But what does this loop do?

It reads in some data from one file, messes about with something named “mu”, then writes it out again into a file with “mu” as the number. By default, “mu” will be an integer. The intent is to have it range from 51 to 56, our Ts.bin1 to Ts.bin6.

So “mu” is based on latitude (lat) that is an integer in 1/10 degrees. We add 899 to it so those southern hemisphere negative -900 to -zero (90S to the Equator) become positive numbers, then we check for negatives and make them zero ( because otherwise a thermometer at the exact south pole could be a -1) and proceed to a bit of math that sorts them into 51 to 56. This math takes the degrees of latitude (as adjusted by the 899) and divides by 300 (to divide it into 30 degree bands).

I built a spread sheet with this math in it to see exactly where the degrees cut off. And discovered an odd thing. Data with a Latitude equal to or greater than 75.1 degrees went into bucket “50″… Then I remembered that FORTRAN does a “truncate” on integer divide while Excel does not. Putting “trunc()” around the division fixed that. The spreadsheet now shows what “zone” a record goes into by latitude. The results are, by 1/10 degrees:

51:  90.0-60.1
52:  60.0 to 30.1
53:  30.0 to 0.1
54:  0 to -29.9 
55:  -30.0 to -59.9
56:  -60.0 to -90.0

So we see that there are six zonal bands of 30 degrees each. OK. (But wait, there’s more… further on.)

[chiefio@tubularbells src]$ cat split_binary.f 
      integer,allocatable,dimension(:) :: idata
      integer info(9)
      character title*80,name*36,fileo*7

      read(1,end=100) info,title

      do n=51,56
      write(fileo,'(a6,i1)') 'Ts.bin',n-50
      write(n) info,title
      end do
      allocate (idata(info(4)))
   10 read(1,end=100) idata,Lat,Lon,ID,iht,name,m1,m2
!!    write(99,*) Lat,Lon,ID
!!    if(ID.eq.603550002) write(98,'(12I6)') idata
      if( mu=0
      write(mu) idata,Lat,Lon,ID,iht,name,m1,m2
      go to 10

  100 stop 0
[chiefio@tubularbells src]$ 

Which just leaves the question of where do those 24 and 44 degree zone limits come from… They come from STEP3.

If you want to see more detail inside STEP2, look in these two pages:

STEP3 – toSBBXgrid.f

This is a snippet of the program run in STEP3. This is where the other zones come from. It does have a commented out line of data declaration (“REAL BANDS”) which is why I remembered those values as being “commented out”. The replacement code is the “REAL SNNEDG”. This looks like an “upgrade” from degrees to radians for the “EDGES” of a zone. Someone more gifted with radians can figure out exactly what that makes the zone cutoffs. I’m willing to assume for now they match the “BANDS” value. I can check it later when I have time.

C**** Sergej's equal area grid
C**** Grid constants for latitude zones
CCCCC REAL BANDS(9)/90.,64.2,44.4,23.6,0.,-23.6,-44.4,-64.2,-90./
C**** We need only the SINEs of the (northern) band edges
      REAL SNNEDG(9)
      DATA SNNEDG /1.,.9,.7,.4,0.,-.4,-.7,-.9,-1./
C     Input data sets - 1: 90N-60N  2: 60N-30N ... 6: 60S-90S 7:short
      DATA INFRST /1,1,2,2,3,4,5,5/
      DATA NUMJ /4,8,12,16,16,12,8,4/
      DATA INLAST /2,2,3,4,5,5,6,6/
C**** Check grid dimensions
         WRITE(6,'(6I9)') NRM1,NRM, ICM*JCM,NCM1, INM1,INM
      END IF
C**** Don't skip any short files if ISHORT=1
      DO 10 NR=1,NRM
      DO 10 IN=1,INM-ISHORT
   10 SKIP(IN,NR)=.TRUE.
C**** Find DDLAT to extend each box R km in north and south direction
C**** Loop over all BOXES (large regions)
      DO 50 J=1,8
C**** Find the sin of the latitudes of the centers in band J
      DO 20 JC=1,JCM

etc. etc. Full listing is here:

At this point I am not sure what the impact will be of swapping data from 6 zones into 8 zones. It will take me a while to sort out what the 6 zones are used for and what actually gets turned into 8 zones of “anomalies”. I suspect that what is happening is that data gets smeared over a zone of size 6, then gets divided back into size 8, so data from further away than you would expect gets sucked into the 1/8 bands. Smear and stretch seems to be the modus operandi of GIStemp, why not do it in zones too…

If you want to see a bit more detail of what STEP3 does, the scripts and code are here:

About these ads

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons' are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in GISStemp Technical and Source Code and tagged , . Bookmark the permalink.

5 Responses to GIStemp – Send In The Zones

  1. Peter Dunford says:

    Are the lines
    DATA INFRST 1,1,2,2,3,4,5,5
    DATA INLAST 2,2,3,4,5,5,6,6
    1 = 1,2

    REPLY: I don’t think any of this code is “new” in the sense that I think it was written a long time ago. This particular program is f77 style, while the newer bits are f90 style, so I think this bit was originally done circa the ’80s. I do think you are correct that it is setting up limits of which “new band” gets fed from what “old band”. I have not gone through every line of this code well enough to say for certain, but it looks like it on a first pass bases. New zone one can pick up data from old zones 1 and 2; N.Z. 2 from O.Z. 1,2; NZ 3 – OZ 2,3; NZ 4 – OZ 2,3,4 etc. The unanswered bit is just how far a station can “reach” given all the various “data smearing” techniques used in the different bits of code. This routine sets up limits to the “grid boxes”, but a box can reach beyond the data averaged into it… -ems

  2. Peter Dunford says:

    Oops. Ok, you can’t use the “tab” key in wordpress…

    First and second zones of the 8 combines zones 1 and 2 of the 5.
    Third gets 2 and 3,
    Fourth gets 2 and 4.
    Fifth gets 3 and 5.
    Sixth gets 4 and 5.
    Seventh and eigth get 5 and 6.

    So 1/3/4/6 have the same weighting in the new grid, 2/5 have double their original weighting.

    If a bias is introduced, it depends on the average temperature of each of the bands. If 1/6 is 0, 2/5 is 10 and 3/4 is 20 then the average still works out, no warming is introduced. If 2/5 are 6 and the others stay the same, then there would be a warming bias introduced.

    Am I reading this right?

    REPLY: The numbers are limits, not enumerations, so New Zone 4 gets data from Old Zone 2, 3, and 4. The code then roots around inside that space and weights individual stations in a complex way. From comments further up the code:

    C**** The spatial averaging is done as follows:
    C**** Stations within RCRIT km of the grid point P contribute
    C**** to the mean at P with weight 1.- d/1200, (d = distance
    C**** between station and grid point in km). To remove the station
    C**** bias, station data are shifted before combining them with the
    C**** current mean. The shift is such that the means over the time
    C**** period they have in common remains unchanged (individually
    C**** for each month). If that common period is less than 20(NCRIT)
    C**** years, the station is disregarded. To decrease that chance,
    C**** stations are combined successively in order of the length of
    C**** their time record. A final shift then reverses the mean shift
    C**** OR (to get anomalies) causes the 1951-1980 mean to become
    C**** zero for each month.

    I was skipping that level of “in the weeds” stuff for this “first look”, but I guess it needs to be mentioned here… So it looks like the code roots around in specific old bands and takes individual stations and then molests them via a strange weighting method. Oh, and RCRIT is 1200 km. So it looks to me like a station can be 1200 km away from the edge of the “New Zone” in an “Old Zone” and still influence it, with a weighting. The question is: How perfect is it at not introducing bias in the process? I have no answer, yet. -ems

  3. Peter Dunford says:

    That should be 11, not 6.

  4. E.M.Smith says:


    The circumference of the earth is about 40,000 km meridional, so 1/2 that would be 20,000 km. Divide that into 8 bands, you get 2500 km each. The “reach” is 1200 km. Almost exactly 1/2 a “band” for the 8 zones section ( I suppose setting it to 1250 would have been too blatant). In both the 6 and 8 zone sections, the equator ends up right on the mid point (not a lot of “benefit” from “reaching” at the equator).

    At the mid point of a hemisphere, the center point of a type 6 band (bands 2 and 5) is exactly aligned with the 2/3 and 6/7 transitions in the type 8 bands. So you get to “reach” 1200 km into the 1666 km that is a “half a type 6 band” from that mid point. This means that, for example, Type 8 Band 2, just outside the polar band, can reach to within 466 km of the equatorial Type 6 band 3. That is, it can reach to within 466 km of the most equator ward edge of Type 6 band 2, within 466 km of the Type 6 band 3 next to the equator.

    Empty boxes reach to full boxes for values. The more polar you go, the fewer thermometers, the more boxes filled in from more central stations. Also, polar and near polar cold band deletions become more important. And since there is always more surface area per distance going equatorward, you will get more warm thermometers in an “equal distance” (but not latitude proportional) “reach”. Geography itself will build in a warming bias.

    (At the limit case, a box or grid exactly at the pole can only reach 1200 km toward warming, in any and all directions. At the equator, a 1200 km “reach” is only toward “cooler”; though at the equator the effect is diminished because the sun oscillates from one side of the equator to the other seasonally and not much changes. At the pole, it’s “double or nothing”. Basically, the equator does not change as much with distance and seasons as do the poles. Barrow Alaska to Anchorage, big deal; Ecuador to Panama, not so much… )

    Hot zones will be spread to cover the empty poleward zones.

    This just stinks.

  5. Ellie in Belfast says:

    A commenter on WUWT highlighted “Points of View” on BBC Radio4 this evening by Clive James. It is basically about scepticism and concensus, but it is actually a brilliant allegory – the golf ball potato crisp (=chip), not only about global warming, but also about GIStemp. This aspect will be lost on many people, but perhaps not for readers of this site.

    “Apparently a golf ball yields precisely 18 slices. All 18 slices of the golf ball, along with the thousands of slices of potato, go into the cooking process and emerge at the other end as something hard to distinguish, visually, from crisps.”

    For those who can’t get it on BBC iPlayer (outside UK) it is reproduced here in full:

Comments are closed.