GIStemp STEP Minus 1

OK, GIStemp already has a STEP0, so all the manual pre-steps covered in the intro posting and overview, along with the data download from GHCN, belong to a “Step -1”. Here I’m going to add the optional “sorts.f” program from STEP0/input_files. You don’t need to run it until you want to make your own update of the Antarctic data, but for folks who want to know what it does, we ought to take a look.

Looking at the little bits first is also a gentle way to start understanding the GIStemp coding style.

So, what are they?  do_sort and sorts.f are the two programs.  The first is a shell script, the second is FORTRAN.

do_sort

OK, it’s a Korn shell script, with substantially the same syntax as the Bourne shell (sh) for most purposes. Easy enough. It sets a “compiler name” variable, “fortran_compile”, to the environment variable “$FC”, though it then calls ${FC} directly and never uses fortran_compile again. Note to self: set that environment variable to the local FORTRAN compiler name… Or better yet, make a real source code repository and a ‘makefile’ to do a proper build system.
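do_sort (listed below) guards against an unset FC with an if test that echoes two hints and then runs a bare exit. A bare exit returns the status of the last echo, i.e. 0, so a calling script cannot tell anything went wrong. Here is a minimal sketch of the same guard using the POSIX ${parameter:?word} expansion. The function name require_fc is hypothetical, not part of GIStemp.

```shell
# require_fc: hypothetical helper (not part of GIStemp) doing the job of
# do_sort's if/echo/exit test. ${FC:?message} prints the message to stderr
# and makes a non-interactive shell exit with a non-zero status when FC is
# unset or empty.
require_fc() {
  : "${FC:?set an environment variable FC to the fortran compile command, like f90}"
}

# Example:
#   export FC=f90
#   require_fc && "$FC" sorts.f -o sorts.exe
```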

Then it calls the compiler to compile sorts.f and runs the result over three input data sets of Antarctic data (the unsorted data we downloaded earlier). But how odd. For each one, it uses “grep” (global regular expression print; don’t ask…) to fish out the lines containing the text “page” from the file (antarc1.txt first) and pipes them through the standard UNIX / Linux sort utility, with the output going to stn_list. Then it copies the data file to antarc.txt and calls the FORTRAN program “sorts”. One is left to assume that sorts.f expects to find its input in antarc.txt and stn_list (we’ll see below).

At the end of each pass, we remove the scratch file stn_list and the copied input antarc.txt, and rename the output file antarc.sort.txt to antarc1.sort.txt (antarc2.sort.txt and antarc3.sort.txt on the later passes). Get used to this. The pattern repeats all through the GIStemp code: input, rename, process, output, rename, delete, delete, delete… Done in a database, this would likely be a single statement, or maybe two, since all the file swapping could be avoided.

#!/bin/ksh

fortran_compile=$FC
if [[ $FC = '' ]]
then echo "set an environment variable FC to the fortran_compile_command like f90"
echo "or do all compilation first and comment the compilation lines"
exit
fi

${FC} sorts.f -o sorts.exe

grep page antarc1.txt | sort > stn_list
cp antarc1.txt antarc.txt
sorts.exe ; rm -f stn_list antarc.txt ; mv antarc.sort.txt antarc1.sort.txt

grep page antarc2.txt | sort > stn_list
cp antarc2.txt antarc.txt
sorts.exe ; rm -f stn_list antarc.txt ; mv antarc.sort.txt antarc2.sort.txt

grep page antarc3.txt | sort > stn_list
cp antarc3.txt antarc.txt
sorts.exe ; rm -f stn_list antarc.txt ; mv antarc.sort.txt antarc3.sort.txt

rm -f sorts.exe

So at the end of all this, we’ve made three sorted output files.  OK.  Then we delete the FORTRAN binary.  

Notice that all this file creation and deletion is going on in the same directory as the program sources and the input data. Shudder! Also, this code suggests using “f90” while other code suggests “f77”. Hope nothing is very “release dependent”.
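A tidier layout would do the scratch work in a private temporary directory and fold the three copy-pasted passes into one loop. What follows is a sketch under stated assumptions, not GIStemp code. The function name sort_all and its two arguments are hypothetical. It also resolves the path to the compiled program up front, because the original runs plain sorts.exe, which only works if the current directory is on PATH.

```shell
# sort_all SORTER [DIR]: hypothetical restructuring of do_sort (not GIStemp code).
#   SORTER  path to a program that, run in the current directory, reads
#           antarc.txt and stn_list and writes antarc.sort.txt, which is
#           what the compiled sorts.f does.
#   DIR     directory holding antarc1.txt .. antarc3.txt (default: .).
# Scratch files go in a private mktemp directory, never next to the sources
# or input data; only antarcN.sort.txt is written back into DIR.
sort_all() {
  sorter=$1
  dir=${2:-.}
  case $sorter in
    /*) ;;                          # already absolute
    *) sorter=$(pwd)/$sorter ;;     # we cd into the scratch dir below
  esac
  work=$(mktemp -d) || return 1
  for n in 1 2 3; do
    # The same four steps as each pass of do_sort.
    grep page "$dir/antarc$n.txt" | sort > "$work/stn_list" &&
      cp "$dir/antarc$n.txt" "$work/antarc.txt" &&
      (cd "$work" && "$sorter") &&
      mv "$work/antarc.sort.txt" "$dir/antarc$n.sort.txt" ||
      { rm -rf "$work"; return 1; }
  done
  rm -rf "$work"                    # removes stn_list and antarc.txt too
}
```

Usage would be along the lines of compiling with "$FC" sorts.f -o sorts.exe and then running sort_all sorts.exe from STEP0/input_files; the binary could just as well live outside the source tree.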

OK, if the standard sort command is already being used, what is it that sorts.f does?

sorts.f

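We declare two character variables, “line” and “linet”, each 105 characters long. We then open unit 1 (antarc.txt) and unit 2 (stn_list) for input and unit 10 (antarc.sort.txt) for output, just as we surmised above. “trim” is the Fortran 90 intrinsic that strips trailing blanks, so the output lines are not padded out to 105 characters. The first loop copies everything before the first line containing “page” straight through as a header. Then, for each station line read from the sorted stn_list, we scan antarc.txt until we find the matching line, write it out, and copy the lines that follow up to and including the next line containing “#”, followed by a blank line. Then we “rewind” the data file and go on to the next station.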

       character*105 line,linet

      open(1,file='antarc.txt',form='formatted')
      open(10,file='antarc.sort.txt',form='formatted')
      open(2,file='stn_list',form='formatted')

      read(1,'(a)') line
      do while (index(line,'page')<=0)
         write(10,'(a)') trim(line)
         read(1,'(a)') line
      end do

  10  read(2,'(a)',end=20) linet
      do while (trim(line).ne.trim(linet))
        read(1,'(a)') line
      end do

      write(10,'(a)') trim(linet)
      read(1,'(a)') line
      do while (index(line,'#')<=0)
        write(10,'(a)') trim(line)
        read(1,'(a)') line
      end do
      write(10,'(a)') trim(line) ; write(10,'(a)') ''
      rewind 1
      go to 10

  20  write(10,'(a)')
     *  '#-#-#-#-#-#-#-#-#--------------------------------------------'
      stop
      end

And at the bottom we write an “end of data” marker made of pound signs and dashes. OK, not too bad. Kind of “1970s” style, but hey, it works and there doesn’t seem to be anything “odd” going on. A clean data load into a database, with station ID as a key field, would eliminate most or all of this process.
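For comparison, here is the same reordering sketched as an awk program wrapped in a shell function. It is an illustration, not GIStemp code: the name sort_stations is hypothetical, and it assumes, as sorts.f does, that each station block runs from its “page” line to the next line containing “#”. Rather than rewinding antarc.txt once per station, it reads the file a single time, holds the blocks in memory keyed by their station line, and prints them in stn_list order. One difference worth knowing: sorts.f reads into a 105-character variable, so anything past column 105 is silently dropped, while this version keeps whole lines.

```shell
# sort_stations ANTARC STN_LIST: hypothetical awk rendition (not GIStemp code)
# of what sorts.f does. Writes the reordered data to stdout.
#  - lines before the first line containing "page" are copied as a header;
#  - each "page" line opens a station block that ends at the next line
#    containing "#" (lines between a "#" and the next "page" are dropped);
#  - blocks are printed in STN_LIST order, each followed by a blank line;
#  - the same "#-#-..." end-of-data marker closes the output.
# Trailing blanks are stripped, as FORTRAN trim() does; unlike sorts.f,
# long lines are not cut off at 105 characters.
sort_stations() {
  awk '
    { sub(/ +$/, "") }                      # mimic trim()
    FILENAME == ARGV[1] {                   # sorted station list
      order[++n] = $0
      next
    }
    !started && index($0, "page") == 0 {    # header text
      print
      next
    }
    index($0, "page") > 0 {                 # station line starts a block
      started = 1
      cur = $0
      keep = !(cur in body)                 # sorts.f also uses the first match
      if (keep) body[cur] = ""
      next
    }
    keep {                                  # block lines, including the "#" line
      body[cur] = body[cur] $0 "\n"
      if (index($0, "#") > 0) keep = 0
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\n%s\n", order[i], body[order[i]]
      print "#-#-#-#-#-#-#-#-#--------------------------------------------"
    }
  ' "$2" "$1"
}
```

The station list would still come from grep page antarc1.txt | sort > stn_list as in do_sort, followed by sort_stations antarc1.txt stn_list > antarc1.sort.txt.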

Now, the rest of GIStemp is a lot like this, but the programs are longer and more complicated, there are many more files being shuffled about, and the general level of “cruftiness” is higher.

The biggest issues are these: keeping track of which file the data comes from and which file it goes to next, and what transformation is really being done at each step. You will find yourself wishing there were some kind of data definition, file-name chart, process documentation, something, anything, that kept track of it all. Embrace that. It will be with us for a while…


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present “hot buttons” are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in GISStemp Technical and Source Code.