GIStemp – a cleaner approach

I have a running version of GIStemp, at least up to STEP3. (STEP4_5 is awaiting testing until I get the big-endian data files converted to little-endian.)

In this posting, I’m going to provide a general overview of some of the technical changes / clean up that I’ve done.

First is the file structure where the source code and executables live. Regular GIStemp has lines strewn through the scripts that do a “Compile foo, run foo, remove foo” sequence. That is at best annoying. At worst it could produce non-reproducible output, or even wipe out files if a program were accidentally given a name that the scripts overwrite or remove. Standard practice has source files in a directory named “src” with executables in a directory named “bin”, so that’s what I’ve done. Here is an “ls” listing of file names showing where I moved things:

[chiefio@tubularbells GIStemp]$ ls
analysis  Archive  bin  doc  src  STEP0  STEP1  STEP2  STEP3  STEP4_5
[chiefio@tubularbells GIStemp]$ ls src

annzon.f                   dump_old.f    PApars.f          toSBBXgrid.f
antarc_comb.f              flags.f       SBBXotoBX.f       tr2.f
cmb2.ushcn.v2.f            ftp           sorts.f           trim_binary.f
cmb.hohenp.v2.f            hohp_to_v2.f  split_binary.f    trimSBBX.f
convert1.HadR2_mod4.f      invnt.f       t2fit.f           USHCN2v2.f
convert.HadR2_mod4.upto15full_yrs.f  Makefile  text_to_binary.f  x.ftp
dif.ushcn.ghcn.f           padjust.f     toANNanom.f       zonav.f

[chiefio@tubularbells GIStemp]$ ls bin

alter_discont.py             dif.ushcn.ghcn.exe  sorts.exe
annzon.exe                   drop_strange.py     split_binary.exe
antarc_comb.exe              dump_old.exe        text_to_binary.exe
antarc_comb.sh               flags.exe           toANNanom.exe
antarc_to_v2.sh              get_USHCN.sh        toANNanom.sh
bdb_to_text.py               hohp_to_v2.exe      toSBBXgrid.exe
checkinput                   invnt.exe           trim_binary.exe
cmb2.ushcn.v2.exe            listStats.py        trimSBBX.exe
cmb.hohenp.v2.exe            listStats.pyc       trimSBBX.sh
comb_pieces.py               padjust.exe         USHCN2v2.exe
comb_records.py              padjust.sh          v2_to_bdb.py
comb_records.pyc             PApars.exe          zonav.exe
convert1.HadR2_mod4.exe      PApars.sh           zonav.sh
convert.HadR2_mod4.upto15full_yrs.exe  SBBXotoBX.exe

You will also notice that I’ve regularized the names a bit. ALL shell scripts end with “.sh” and all compiled FORTRAN executables end with “.exe”, while Python scripts are either .py or .pyc. The original structure of STEP1 is left intact, though. Its EXTENSIONS directory contains some C libraries that are installed. I left this “as is” simply to avoid the risk of breaking a complicated bit that seems to work and is only done once at install time.

With this change comes the need to have a more formal “build” process. This is done with a fairly simple Makefile:


[chiefio@tubularbells src]$ pwd
/gnuit/GIStemp/src
[chiefio@tubularbells src]$ cat Makefile 

FC=/usr/local/bin/g95
FC95=/usr/local/bin/g95
FC77=/usr/bin/f77
SRCDIR=.
BINDIR=../bin

all:            inputsort step0 step1 step2 step3 step4_5

step0 :         antarc cmb2 cmbho dumpold hotov2 u2v2 \
                    dif.ushcn.ghcn

step1:
                #echo "Still a custom makefile inside STEP1"

step2:          flags padjust toANNanom PApars split_binary \
                    text_to_binary trim_binary invnt

step3:          trimSBBX toSBBXgrid zonav annzon

step4_5:        convert1.HadR2_mod4 \
                     convert.HadR2_mod4.upto15full_yrs SBBXotoBX

inputsort :     sorts.f
	$(FC95) -o $(BINDIR)/sorts.exe $(SRCDIR)/sorts.f

antarc :        antarc_comb.f
	$(FC77) -o $(BINDIR)/antarc_comb.exe $(SRCDIR)/antarc_comb.f

cmb2:           cmb2.ushcn.v2.f
	$(FC) -o $(BINDIR)/cmb2.ushcn.v2.exe $(SRCDIR)/cmb2.ushcn.v2.f

cmbho:          cmb.hohenp.v2.f
	$(FC) -o $(BINDIR)/cmb.hohenp.v2.exe $(SRCDIR)/cmb.hohenp.v2.f

dumpold:        dump_old.f
	$(FC) -o $(BINDIR)/dump_old.exe $(SRCDIR)/dump_old.f

hotov2:         hohp_to_v2.f
	$(FC77) -o $(BINDIR)/hohp_to_v2.exe $(SRCDIR)/hohp_to_v2.f

u2v2:           USHCN2v2.f
	$(FC77) -o $(BINDIR)/USHCN2v2.exe $(SRCDIR)/USHCN2v2.f
# USHCN2v2.f is compiler sensitive at run time.  1 digit rounding variation

dif.ushcn.ghcn: dif.ushcn.ghcn.f
	$(FC95) -o $(BINDIR)/dif.ushcn.ghcn.exe $(SRCDIR)/dif.ushcn.ghcn.f

flags :         flags.f
	$(FC) -o $(BINDIR)/flags.exe $(SRCDIR)/flags.f

padjust :       padjust.f
	$(FC) -o $(BINDIR)/padjust.exe $(SRCDIR)/padjust.f

text_to_binary : text_to_binary.f
	$(FC95) -o $(BINDIR)/text_to_binary.exe $(SRCDIR)/text_to_binary.f

toANNanom :     toANNanom.f
	$(FC77) -o $(BINDIR)/toANNanom.exe $(SRCDIR)/toANNanom.f

trim_binary :   trim_binary.f
	$(FC95) -o $(BINDIR)/trim_binary.exe $(SRCDIR)/trim_binary.f

PApars :        PApars.f
	$(FC) -o $(BINDIR)/PApars.exe $(SRCDIR)/PApars.f $(SRCDIR)/tr2.f \
	    $(SRCDIR)/t2fit.f

invnt :         invnt.f
	$(FC77) -o $(BINDIR)/invnt.exe $(SRCDIR)/invnt.f

split_binary : split_binary.f
	$(FC95) -o $(BINDIR)/split_binary.exe $(SRCDIR)/split_binary.f

trimSBBX :      trimSBBX.f
	$(FC) -o $(BINDIR)/trimSBBX.exe $(SRCDIR)/trimSBBX.f

annzon :        annzon.f
	$(FC95) -o $(BINDIR)/annzon.exe $(SRCDIR)/annzon.f

toSBBXgrid :    toSBBXgrid.f
	$(FC95) -o $(BINDIR)/toSBBXgrid.exe $(SRCDIR)/toSBBXgrid.f

zonav :         zonav.f
	$(FC95) -o $(BINDIR)/zonav.exe $(SRCDIR)/zonav.f

SBBXotoBX :     SBBXotoBX.f
	$(FC95) -ftrace=full -o $(BINDIR)/SBBXotoBX.exe $(SRCDIR)/SBBXotoBX.f

convert1.HadR2_mod4 : convert1.HadR2_mod4.f
	$(FC77) -o $(BINDIR)/convert1.HadR2_mod4.exe $(SRCDIR)/convert1.HadR2_mod4.f

convert.HadR2_mod4.upto15full_yrs : convert.HadR2_mod4.upto15full_yrs.f
	$(FC77) -o $(BINDIR)/convert.HadR2_mod4.upto15full_yrs.exe \
	    $(SRCDIR)/convert.HadR2_mod4.upto15full_yrs.f

clean :         cleanin clean1 clean2 clean3 clean4_5

cleanin :
                rm $(BINDIR)/sorts.exe
clean1 :
                rm  $(BINDIR)/antarc_comb.exe $(BINDIR)/cmb2.ushcn.v2.exe 
                rm  $(BINDIR)/cmb.hohenp.v2.exe 
                rm  $(BINDIR)/dump_old.exe    $(BINDIR)/hohp_to_v2.exe 
                rm   $(BINDIR)/USHCN2v2.exe  $(BINDIR)/dif.ushcn.ghcn.exe 
clean2 :
                rm $(BINDIR)/flags.exe        $(BINDIR)/padjust.exe
                rm $(BINDIR)/text_to_binary.exe   $(BINDIR)/toANNanom.exe 
                rm $(BINDIR)/trim_binary.exe  $(BINDIR)/PApars.exe
                rm $(BINDIR)/invnt.exe          $(BINDIR)/split_binary.exe 
clean3 :
                rm $(BINDIR)/annzon.exe       $(BINDIR)/toSBBXgrid.exe
                rm $(BINDIR)/trimSBBX.exe       $(BINDIR)/zonav.exe
clean4_5:
                rm $(BINDIR)/SBBXotoBX.exe    
                rm $(BINDIR)/convert1.HadR2_mod4.exe
                rm $(BINDIR)/convert.HadR2_mod4.upto15full_yrs.exe
getdata:
#"Get the data from the sources using ../ftp{scripts} or by hand"
#"antarc1.txt was downloaded from http://www.antarctica.ac.uk/met/READER/surface/stationpt.html"
#"antarc2.txt was downloaded from http://www.antarctica.ac.uk/met/READER/temperature.html"
#"antarc3.txt was downloaded from http://www.antarctica.ac.uk/met/READER/aws/awspt.html"
#"some typos in antarc2.txt were manually corrected, whatever that means"
#"Station information files were manually created combining information from"
#"the above files and GHCN's v2.temperature.inv"
#"You can get SBBX.HadR1 from: ftp://ftp.giss.nasa.gov/pub/gistemp/SBBX.HadR2"
#"The oiv2.monthly files are at: ftp://ftp.emc.ncep.noaa.gov/cmb/sst/oimonth_v2"

That’s the bulk of it. There are minor changes made in some of the scripts to find executables in “../bin” and to remove compilation. It’s also nice to have only one copy of the FORTRAN source for the programs shared between STEP3 and STEP4_5.

I’m using the “g95” compiler for some code, and the f77 from the Red Hat Linux 7.2 release for other code. If a program requires f77 to compile, f77 is used. If a program requires g95, that is used. If both work, then I use the default “FC” parameter in the Makefile (presently set to g95). The one exception is USHCN2v2.f, which compiles with both, but the choice of compiler changes the temperature in about 2% of the records by one digit in the 1/10 C place. It looks like the default type conversion behaviour might differ between the two compilers, and the F to C code is not explicit about when to do the INT to REAL conversion (a bit of sloppiness that I’ll fix a bit later…). At any rate, I’ve chosen to leave it on f77 until I characterize the behaviour better.
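To make the type conversion point concrete, here is a minimal sketch in Python. It is NOT the code in USHCN2v2.f; the function names and the assumption that the input is an integer count of tenths of a degree F are invented for illustration. It only shows that rounding after a promotion to REAL and plain INTEGER division can disagree in the last digit of a tenths-of-a-degree C result.

```python
# Hypothetical illustration only; this is not the conversion code in USHCN2v2.f.
# Assumption: temperatures arrive as integer tenths of a degree F.

def nint_div(n, d):
    """n / d rounded to nearest, halves away from zero (like Fortran NINT). d > 0."""
    sign = -1 if n < 0 else 1
    return sign * ((2 * abs(n) + d) // (2 * d))

def trunc_div(n, d):
    """n / d truncated toward zero (like Fortran INTEGER division). d > 0."""
    sign = -1 if n < 0 else 1
    return sign * (abs(n) // d)

def tenths_c_real(tenths_f):
    """Tenths of C as if the value were promoted to REAL, then rounded."""
    return nint_div(5 * (tenths_f - 320), 9)

def tenths_c_integer(tenths_f):
    """Tenths of C as if the whole expression stayed in INTEGER arithmetic."""
    return trunc_div(5 * (tenths_f - 320), 9)

def count_differences(values):
    """Number of inputs for which the two conversions disagree."""
    return sum(1 for v in values if tenths_c_real(v) != tenths_c_integer(v))

if __name__ == "__main__":
    for tf in (320, 725, 726, 0):
        print(tf, tenths_c_real(tf), tenths_c_integer(tf))
```

This toy case disagrees far more often than the roughly 2% seen in the real output, so the actual cause is evidently subtler. The point is only that the two orderings never differ by more than one digit, which matches the symptom.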

If there is any interest in what changes I made to the particular programs and scripts to make them run, I’ll put up “diffs” of anything folks want. (Or I’ll put a runnable tarball on any public FTP server you have, just give me access.)

Finally, some minor points. I gathered a bunch of the misc. “readme” files in the code and put them together in a “doc” directory:

[chiefio@tubularbells GIStemp]$ ls doc
gistemp.txt  preliminary_manual_steps.txt  README.STEP0.input_files
index.html   PYTHON_README.txt             step0_README.txt

All that is left in the STEP directories tends to be the “do_” script, the input, output, and work directories, and any scratch files the code still leaves lying about:

[chiefio@tubularbells GIStemp]$ pwd
/gnuit/GIStemp
[chiefio@tubularbells GIStemp]$ ls STEP*
STEP0:
do_comb_step0.sh  input_files  to_next_step  work_files

STEP1:
do_comb_step1.sh  input_files   work_files
EXTENSIONS        to_next_step  work_files.old

STEP2:
do_comb_step2.sh  input_files  to_next_step  work_files

STEP3:
do_comb_step3.sh  results  to_next_step  work_files

STEP4_5:
annzon.Ts.ho2.GHCN.CL.PA.log  input_files
BX.Ts.ho2.GHCN.CL.PA.1200     results
do_comb_step4.sh              SBBXotoBX.log
do_comb_step5.sh              work_files
do.mult_year.TocnHR2.upd      zonav.Ts.ho2.GHCN.CL.PA.log

This is much safer, since a rogue program can’t wipe out your source code without really trying to get into the ../src directory.

In theory, one could combine the STEP directories into a single directory. I may do that at a future point, but for right now it’s convenient to have each step write files in its own space (where I can see what’s happening, compare different versions, and test the effect of code changes a bit more easily).

The top level “analysis” directory is where I’m building some minor tools that let me see the data and monitor just what GIStemp does to it. The “Archive” is self explanatory. Anything I change gets archived there for easy “roll back” and / or comparison. I could have used a source code control product, but this is in some ways simpler and more intuitive for non-UNIX folks to follow.
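For anyone who wants to copy the “Archive” habit, a minimal sketch in Python follows. The function name archive_copy and the NAME.STAMP naming convention are assumptions made up for this example, not a description of how the Archive directory here is actually laid out.

```python
import os
import shutil
import tempfile
import time

def archive_copy(path, archive_dir, stamp=None):
    """Copy `path` into `archive_dir` as NAME.STAMP and return the new path.

    The NAME.STAMP convention is hypothetical; any scheme that keeps old
    copies distinct would do.
    """
    if stamp is None:
        stamp = time.strftime("%Y%m%d-%H%M%S")
    os.makedirs(archive_dir, exist_ok=True)
    dest = os.path.join(archive_dir, os.path.basename(path) + "." + stamp)
    if os.path.exists(dest):
        # Never overwrite an earlier archived copy.
        raise FileExistsError(dest)
    shutil.copy2(path, dest)  # copy2 also preserves the modification time
    return dest

if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as top:
        src = os.path.join(top, "zonav.f")
        with open(src, "w") as fh:
            fh.write("C     placeholder source\n")
        saved = archive_copy(src, os.path.join(top, "Archive"),
                             stamp="20090701-120000")
        print(os.path.basename(saved))
```

Run the copy before editing a file, and a roll back is just a copy in the other direction.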

Also notice the src/ftp directory. That is where I’m building a tool to go and fetch all the needed data files from the various FTP sites that have them scattered about. Right now it’s a “work in progress”, with some of it ftp commands and some of it wget commands (and it’s not clear that I’ve got commands for all of the files … yet). When I’m pretty sure it’s complete, I’ll do a “scratch all the data, reload it” test. Then I’ll publish the script(s) here.

Another week or two of clean up (and building a big-endian / little-endian file converter, since the g95 compiler claims to support the convert flag but doesn’t, at least not in the release I have up) and I’ll be more or less ready to call it “cleaned up” and ready for characterization, testing, and validation.
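For what it’s worth, a converter of that sort can be sketched in a few lines of Python. This is only an illustration under a big assumption: that the file is made up entirely of 4-byte items (INTEGER*4 / REAL*4 values plus 4-byte record-length markers). Any character data, such as a text title in a file header, would be scrambled by a blind swap and would have to be skipped. The names swap_words and convert_file are made up for this example.

```python
import struct

def swap_words(data, word_size=4):
    """Return `data` with the byte order of every word_size-byte word reversed."""
    if len(data) % word_size != 0:
        raise ValueError("length is not a multiple of the word size")
    out = bytearray(len(data))
    for i in range(word_size):
        # Byte i of each output word comes from byte (word_size - 1 - i)
        # of the matching input word.
        out[i::word_size] = data[word_size - 1 - i::word_size]
    return bytes(out)

def convert_file(src_path, dst_path, word_size=4):
    """Read src_path, swap every word, and write the result to dst_path.

    Assumption: the whole file is 4-byte numeric data. Files that mix in
    text would need a format-aware converter instead.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(swap_words(data, word_size))

if __name__ == "__main__":
    big = struct.pack(">i", 1200)            # 1200 stored big-endian
    little = swap_words(big)                 # same bytes, reversed order
    print(struct.unpack("<i", little)[0])    # reads back as 1200
```

Swapping twice gives back the original bytes, which makes for an easy sanity check before trusting a converted file.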

Oh, and any of my code is available for anyone to use as long as they put an attribution in it of the form:

#Written by E.M.Smith, July, 2009
#Available under copyleft or GNU public license

Or similar language that leaves commercial rights with me, but anyone can do non-commercial things with it as long as my authorship is acknowledged.

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present “hot buttons” are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in GISStemp Technical and Source Code.

10 Responses to GIStemp – a cleaner approach

  1. pyromancer76 says:

    I enjoy reading the code; wish I understood more. I have been hoping for a post of your comment on WUWT re GIStemp reporting/making/working-with-in-their-computer-model “very warm” winters. What do you think that is about? The post out of UC Davis also touted warm winters with woe to California fruits and nuts. Thanks for your correction re the Army Corps of Engineers.

  2. E.M.Smith says:

    I duplicated the WUWT posting under the top level GIStemp tab up at the top as an “update”.

    One of the next things on my “to do” list is to better characterize what happens to the data as it flows through the steps (i.e. the winter warming, the “gistemp raises the temps in the data in aggregate”, etc.). In fact, I’d have had that up today, but for discovering that USHCN2v2.f has compiler dependent behaviour.

    That sent me down the path of compiling, one at a time, the programs that compile on both f77 and g95, running GIStemp, and comparing the output. Until I finally found which program was the issue and that it was probably their sloppy conversion from degrees F to C (that has an implicit INTEGER to REAL conversion rather than doing it in a controlled specified way).

    All that will feed into the “Data_GO_Round” posting as we walk through GIStemp, seeing where the data go and what gets done to it along the way.

    It’s just slow to build all the tools to do this (and slower to have cleaned up GIStemp to where I can run it and conduct the testing / evaluation…)

    So I’m now in a position to do that stuff I promised so long ago: A “people oriented” walk through GIStemp. (But knowing that I’ve got the details right from slogging through the technical level…)

    BTW, I didn’t think it was a correction per The Corps, so much as an amplification. You called it not being up to “code”, and that’s correct to the extent you want a CAT 5 rated system and are not getting one. I just put specific numbers on the “spec” (Cat 3 vs 5) and firmly pointed the blame where it belongs, with the folks who controlled the money and the decisions.

    Finally, the warm winters thing:

    I REALLY hope folks pick up on that. It is lethal to the CO2 causes global warming runaway feedback thesis. On two fronts:

    1) How can it be CO2 if the CO2 warming takes a break each summer? It just is not possible.

    2) How can there be runaway feedback if we have an 8C rise in winters, with no runaway, and with an apparent ‘hard lid’ at about 20C in summers that just never rises (even over 200 years+)? Again, it isn’t possible.

    I was so stunned by the implications that I “rushed to print” in a comment on WUWT just to get more eyeballs on the point. On my “to do list” now is to fully vet my data (i.e. get those ftp scripts done so I know I have the most recent data) and then redo the runs and actually plot the results, not just give a couple of sample data records. I also need to do a full “code review” on my analysis programs that produced those records just to make sure I’ve not written buggy code.

    I’d love to have it all done today (and I’d love to have the Market Update I promised you done today too…) but it’s all taking more time than I have (and my kid is home from college and the car is in the shop and one family member has a 101F fever and… life happens.)

    So I’m working this in around the edges as “life” permits.

    And for now, that was to put a “Hey, look at this!” comment up and then start the QA run on the code that led me to that “winter only warming” conclusion (and THAT sent me down the rat-hole of USHCN2v2.f changing behaviour with what compiler is used on it – that sucked down today from 5 AM to noon…)

    So much to do, so little time. Sigh. I resent the fact that Hansen et al. suck down this time from my life that I’d rather spend on finance, but in some ways it’s much more important. They have a claim that it’s CO2 based on this code, yet this code CLEARLY shows the warming is almost all in winter, nearly none in summer; and THAT can’t be from CO2. If I can hang them with their own rope, it will be worth it, I guess. At least I can see a bit of light at the end of this tunnel now 8-)

  3. Ellie in Belfast says:

    I’m afraid code goes straight over my head, but I am interested that you found the warming is almost all in the winter.

    I’ve been periodically working on a very specific raw data set and also found much more warming in the winter.

  4. E.M.Smith says:

    Ellie,

    Nobody gets ‘coding’ all at once. It isn’t hard, but it takes time.

    You start with something like the BASIC language where it’s pretty obvious what things do, like:

    PRINT "Hello Ellie!"
    LET x=10
    LET y=x+1
    PRINT "Y is:"; y

    Which will print out

    Hello Ellie!
    Y is: 11

    Then work your way up. One syntax bit at a time.

    Jumping right into oddly written FORTRAN is not the best way to decide your abilities as a programmer!

    So before deciding that “code goes straight over my head” please consider this:

    One of the earliest programming languages ever made was for knitting. You can go buy knitting patterns that will tell you exactly what to do to make a sweater (in a language I can’t program / read well anymore!) with the proper pattern.

    Knit one, purl two, knit 3, purl two; repeat that 10 times, cast off.

    It includes “subroutines” and “do loops” and many other structures of a “programming language”. Yes, the syntax and grammar are a bit different, but the structure is the same…

    So what am I saying? I’m saying that long before computers, women who knit had invented programming. If you can knit, you have all the skill needed to understand code and both read and write computer programs. The rest is just persistence.

    Each computer language has its own personality (and its own abilities, limitations, style, and advocates!). One of the things that happens is that someone learns a language and it shapes their style. Then they look at another language and don’t like it because it is “wrong” given their (new) style biases. Oh Well!

    Often, folks will try to learn some language or other based on the biases of a programmer friend, only to learn that the programmer friend loves a complicated language with a painful degree of “geekyness” and the person is “not so geeky”… IMHO, “C” is such a language. Terse, elegant, powerful, and terribly non-intuitive to the typical person. I just spent a whole day trying to remember how to properly open a file and do a simple bit of processing on it. (It’s hard to transition from FORTRAN back to “C” … to some extent the styles are mutually exclusive… I’ve written a lot of C before, but the FORTRAN kind of muddied my style memory…)

    COBOL is terribly wordy (and beloved by folks who like programs that look more like English, but hated by folks who don’t like typing 10 lines of text when 1 line of symbols would do the same thing). FORTRAN is very “math like” with a lot of emphasis on math concepts (like integers vs floating point). FORTH has a syntax more like Yoda speaks: Backward it is, first thing you say, then processing you get! “C” is for folks who hate to type. Many of the key “words” are special characters. Why say “THEN DO” if you can do “{” instead? So it often looks like a lot of {,( and [ ] )|} stuff that makes no sense at all until you study what they mean in THAT language! But boy is it easy to write a small program that does a lot!

    There are dozens of common computer languages. So before you decide that you “can’t get it”, remember that you need to start with just ONE of them. Once you understand that one, the rest become learning an “accent” … Is it “PRINT” “WRITE” or “putline” in this language? Is it “Let x=10” or “x=10” or “x:=10” in this language? (the hardest part becomes keeping them straight once you learn a few…)

    And realize that it really isn’t very hard to do. It just takes a bit of attention to detail. Rather like making sure you really do a consistent “purl 2” stitch size… (I could never keep the stitch sizes very consistent. Gives my knitting a nice “rustic” look 8-)

    FWIW, my first exposure to “programming”, and where I first learned the concept of a “subroutine”, was in my mother’s knitting pattern book. I was making a long straight 8 inch wide strip (what my mom called “Knitting a knitting” ;-) while she was making a complex sweater with a raised checkerboard pattern. Mom patiently walked me through the pattern (program) and showed me where it said “repeat this block x times, then do that bit, then come back here and do x more of these” (where x depended on the size you wanted; a “variable” in computer terms). I was about 6? Want more programming skill? Take up knitting!

  5. Ellie in Belfast says:

    OK. Nice comparisons. Yes I understand the principles, but just don’t know the language/terminology. I have used knitting patterns (a long time ago – no time for knitting now) and come to think of it I did have fun with an early Texas Instruments programmable calculator (it had an LED screen) until I realised I couldn’t use it for high school exams and had to replace it with a more basic model. Also did some very simple programming in Basic as an undergraduate, again a very long time ago.

    It is not that I feel incapable or unwilling to try, it is simply a cost vs benefit thing (time-based); I struggle with comprehension of the language.

    Still keen to see what you come up with. BTW good summary of AGW basics on more recent post.

  6. Ellie in Belfast says:

    Hey, when you say warming in the winter, do you mean NH winter (well I assume you do since more NH sites)? Would it be possible to resolve the data into SH and NH and look for winter warming in the SH winter?

    Sorry I’m a bit hung up on this now.

  7. Ellie in Belfast says:

    From local data in Ireland (single site): Int. J. Climatol. 25: 1055–1079 (2005) C. J. BUTLER, et al. Abstract –
    “…temperatures in Armagh, in all seasons, show a gradual overall trend upwards. However, there are seasonal differences: summer and spring temperatures have increased by only half as much as those in autumn and winter. This is partly due to the exceptionally cold winters and autumns experienced prior to 1820.”

    link to full paper: http://star.arm.ac.uk/preprints/445.pdf

  8. E.M.Smith says:

    Yes, I mean NH Winter.

    Unfortunately, the GIStemp code is fragile with respect to changes. I’ve spent most of the weekend trying to get it to accept a reduced set of station records and it still blows up. (It expects and depends on the GHCN record containing all the USHCN stations… and on record counts being in known relative sizes, so you can’t just chop out the SH stations from GHCN and have it run…) So I’ve been trying to cut it back to the “best 2000” stations in GHCN and I can’t get it to run, yet. At least not with the approach I was taking.

    I may need to just look at the straight GHCN data and not bother with trying to get GIStemp to run on a subset…

    But, yeah, that “CO2 takes the NH summers off” is a big deal.

  9. cdquarles says:

    Hi ChiefIO,

    Will this run on Windows using Cygwin or the Windows UNIX subsystem? If it will, do you have a tarball or zip available for download?

  10. E.M.Smith says:

    I have no idea if it will run on Windows under a UNIX emulator. I suspect it depends on the quality of the libraries in the emulator and how good the compilers are.

    It requires a set of two FORTRAN compilers. One F77 compliant, the other about F90 or F95. It also requires some fairly simple shell scripts.

    The “hard bit” IMHO, would be STEP1 where it depends on both “C” and “Python”. Those would have to be present and with libraries of the forms expected.

    It is my opinion that the easier approach would be to just install Linux as a boot option on the Windows box. I’ve done this many times (SuSe Linux made it darned near a no effort option, just say yes, and it would slide Windows over into part of the disk and use the remainder for Linux. Oh, you did tell it how big for each part IIRC). Basically, use the partition manager of your choice to slide Windows into a subsection of the disk, then proceed with Linux.

    If that worries you, it’s darned near trivial to put a USB drive on a box these days for very little money, and just put the Linux on that disk. Might be a bit slower than a main bus disk, but frankly, this doesn’t exactly take a lot of speed to run… (I’m on a 400 MHz AMD chip with 100 MHz memory and EIDE drives IIRC. It’s about 15 years old…) I think I got a 140 GB disk for about $90 ? Last time I looked a 1 TB disk was $120. This needs about 250 MB … you can probably get that for less than a double decaf mocha no-whip 8-)

    At any rate, I don’t have a posted tarball at this time. WordPress doesn’t seem to have an ftp portal option (at least, I have not found one). If you want a tarball, you need to tell me where to stick it ;-) probably via the email address in the “about” tab, on your machine or service. (I’m sure somebody somewhere has an open ftp site, I just don’t have the time to go searching for one…)
