GIStemp STEP0 USHCN2v2

Overview

In this section, we replace some interpolated temperatures from NOAA with missing data markers so GIStemp can interpolate them a different way, in a later step, and we change from F to C. Somehow it seems more painful than that…

We will be looking at get_USHCN and USHCN2v2.f in this section (both the related script and the FORTRAN program). I’ve chosen to leave the program listings as one large block rather than interleave comments. This is so that experienced programmers can just “see the code” and read it. It also means that “newbies” can just read the commentary and skip the code.

It will only frustrate the folks who sort of remember the programming language and would really like the description interlieved so they would not need to keep hopping back and forth. The code is short enough that that ought not to be much of a problem.

A word or two about execution order. In most cases we find that there is a wrapper script around a FORTRAN program that does the compile, execution, deletion. This case is different. In this case the top level script do_comb_step0.sh does the compile, then it calls the script get_USHCN to do the execution. A search of all programs and scripts in this section showed nothing that deleted USHCN2v2.exe so one is left to presume it will hang around until manually deleted (or the program is run again and the new copy gets created on top of the old one.) Why? “Why, don’t ask why, down that path lies insanity and ruin. -e.m.smith”

What else does this script do?

First, the script: get_USHCN.sh

Begin listing of get_USHCN.sh:

#!/bin/ksh

echo "extracting FILIN data"
grep -e " 3A" < input_files/hcn_doe_mean_data > hcn_doe_mean_data_fil

echo "getting inventory data for v2-IDs"
sort -n input_files/ushcn.tbl > ID_US_G

USHCN2v2.exe

=========================================================

So what does it do?

It pulls out “FILIN” records, sorts them, and runs USHCN2v2.

That’s it. The whole thing.

It is a Korn shell script, though I didn’t see anything in it that was unique to Korn, so running as a Bourne shell (.sh) ought to be just fine. I can only figure that the use of ksh is just to play with ksh. I don’t really see much reason to choose one over the other, but it would be “nice style” to pick one or the other and stick with it.

I can also see no reason to pull out this “grep” and “sort” into a standalone piece unless there is some need or desire to play with the step. The whole point is to pull out the type “3A” records (“FILIN” data type, those that were created by NOAA via the NOAA method for filling in missing data) and sort them for use later. (i.e. to replace their values with the missing data flag -9999).

Were I designing this step, I would put the compile into a “make file” or, at a minium, I would put it into this script to match the style of the other “wrapper scripts”. Though if keeping that wrapper script style, I would put a parameter that let you choose to compile the programs or not – basically, I’d add flags for “compile” and “delete” so you could run as at present (both flags positive by default), or choose to compile once and not delete anything so that re-runs would go faster and manual steps could be run with or without recompiles. I know, a hack instead of a “make file” but one that would not disrupt present operational behaviours.

Then again, in a complete redesign, one could just load “3A” types into a data base, with the appropriate subsitution. Frankly, much of what GIStemp does in the early steps would be best dealt with by a simple set of database load scripts.

So after we run the script what then? On to USHCN2v2.f …

The FORTRAN antarc_comb.f

 

Here is the listing. The explanation will follow below, after the === bar.

      character*134 line
      character*11 idg
      real*4 temp
      integer itemp(12)

      open(1,form='formatted',file='ID_US_G')
      open(2,form='formatted',file='hcn_doe_mean_data_fil')
      open(10,form='formatted',file='USHCN.v2.mean_noFIL')
      open(20,form='formatted',file='USHCN.last_year')

      do mfil=0,0 ! ,1 0: fill-ins skipped; 1: fill-ins used
      iyrmax=0
      rewind 1
      rewind 2
      nout=10+mfil
      read(1,'(5x,i6,1x,a11,1x,i1)') idus,idg,iz
   10 read(2,'(a)',end=100) line
      read(line(1:6),*) idus0

c**** check whether us-id is still ok
      do while (idus0.ne.idus)
         read(1,'(5x,i6,1x,a11,1x,i1)',end=200) idus,idg,iz
      end do

      itemp=-9999
      read(line(8:11),*) iyear
      if(line(13:14).ne.'3A') stop 'should not happen'
      do m=1,12
        indd=15+(m-1)*10 ! start of data
        indf=indd+9      ! position of fillin flag
        if(mfil==0.and.line(indf:indf)=='M') cycle
        read(line(indd:indd+5),'(f6.2)') temp
        if(temp.gt.-99.00) itemp(m)=nint( 50.*(temp-32.)/9 ) ! F->.1C
      end do

      write(nout,'(a11,i1,i4,12i5)') idg,iz,iyear,itemp
      if(iyear.gt.iyrmax) iyrmax=iyear
      go to 10

  100 continue
      write(20,*) iyrmax
      write(*,*) 'USHCN data end in ',iyrmax
      end do
      stop
  200 write(*,*) 'id-file ended !'
      stop
      end

=========================================================

Farily short. But a bit convoluted. It has three places where it ends.

The program starts off with some data / variable declarations. These are fairly typical. We get both “real” and “integer” types for temperatures temp and itemp and we get a blanket line of text character type named “line” for holding lines of free text input.

There are 4 files opened. Two input (1, 2) and two output (10, 20) with hard coded file names. The two inputs are the list of US station IDS and the list of “3A” type records created by the script above. File 20 gets the single “most recent year with data” that we run into. File 10 ought to end up with the USHCN records, with a V2 type structure, but with “3A” records FILNET values removed.

The rest of this program is best decoded by bits, starting at the bottom. There are a couple of parts to it that are either incredibly clever, or bugs. I’m still wondering which… For example, the use of “mfil” as a flag to control what processing is done.

Starting at the bottom, the “200” target just writes out that the program ran out of “id-file” and stops. That looks like an end state where you have processed all the stations listed in the ID_US_G file but not properly finished and are just ending. There is no output of the “iyrmax” variable needed by the next step. The only code that comes here is the “read” statement about 5 lines below line “10”. So the program assumes it will run out of “3A” records before it runs out of stations, or it’s an error.

RIght above 200 we have the “100” target. This is only reached by the “read” statement in line 10. It writes out the “iyrmax” needed by the next step and writes a console message that “USHCN data end in” that year. This looks like a normal end case. We can end normally if we run out of “3A” records but still have some stations left to process.

But now we have a bit of an “issue”. We have an “end do” just before the stop. Why? That’s a bit odd. Looking upstream, the only unmatched “do” is the one at the top; 6 lines above line “10”. Now we already saw that we had this “end 100” in line 10, and the line just above 100 is a “go to 10” so the natural processing loop of this program is from line 10 to just before line 100, jumping out to 100 when we complete normally. Then this “end do” is sitting there implying a larger outer loop. And what does the comment on this outer loop say? The cryptic “1 0: fill-ins skipped; 1: fill-ins used”. BUT, since the whole purpose of this program is to filter out fill-ins I’m a bit puzzled. Here we have a switch to turn off it’s major function? Why? (No, I won’t say it, but I’m thinking it…)

The comment is also a bit cryptic about what the “,1 0” vs “1” is. I think the “,1” is to imply that the do loop increments by 1 (the default). So why not just put it in as the third term instead of in a comment? “Why, don’t ask why, down that path lies insanity and ruin. -e.m.smith” To what values do we set the “do loop” to do what? The present value of “0,0” isn’t mentioned in the comment. And the variable “nout” is set to be 10 plus the value of the loop counter, mfil. Yet we only have output files with descriptors of 10 or 20, so any other value has no file to which to write data. The line 3 above 100 writes to descriptor “nout” and expects it to be “10” to hit the normal output file. I fail to see how any setting other than zero for mfil will give any output. And since this program also does the F to C conversion, that ought to be an issue. You can’t just skip it. I think the comment says to use 0 for the loop start value for no infill data and to use 1 to get infill data, and the default increment is 1.

So as set, the outer “do loop” seems inactivated, but activating it looks like it will only break the program. So what does it do? And, dare I say it, WHY? Maybe there is some trick magic going on here, but it looks broken to me. Any insight is welcome! (At a minimum, the code is badly commently, poorly documented, prone to eronious mantenance, and just crytic crap. At best, it’s a bug.)

We’ll leave it for now and drop into the 10 loop. This is the meat of the program. On the first entry, just above line 10, we read a US station ID, then we enter the loop and read a USHCN record and pick out the station ID “idus0”. As long as it’s not a match to the US Station ID, we keep fishing in the ID_US_G file for a new record (with running out being that error 200 we discussed above…) Once we have a match, we exit the tight read loop and head on down the road to picking the year out of our temperature record. Along the way we set the default temperature for itemp to the ‘missing data flag’ of -9999. This looks like it is done for all 12 fields with one non-subscripted assignment. Interesting trick, not very clear. We also check that the record has a “3A” type and squawk if it doesn’t while halting. (You might think this a bit paranoid, but in fact it protects against people manually messing with the data and against the odd compiler fault. More than once my “you can’t get here” impossible error message has printed out… )

So we have a valid “3A” record for a valid station. What next? For each of 12 months, we find the start of the data and the fillin flag. As long as mfil (our strange friend) is zero, and the flag is “M” for missing data, we “cycle” back to the next month keeping the -9999 in place. If the flags are mfil non-zero OR not an M type record, we move on and calculate the temperature in 1/10C degrees from the F value. So setting mfil to 1 would cause the temperature to be used, but when we hit the following “write(nout,” it would try to write to unit 11 that has not been opened. So this ‘use infill’ flag looks broken to me. I suppose that since it has to be set by editing the code one could also change the assignment of nout at that time too, but this is just tacky. Why is there a hand coded flag to use the data anyway? It just smells like data modeling to get the results you want to find.

At any rate, once you are done with the “delete the M infill temps” and swap F to C you write it all out and if you have a newer year, increase the iyrmax value accordingly, then go do it again until you reach the end of the USHCN data and report your max year and end.

Well, that’s it.

We replace some interpolated temperatures from NOAA with missing data markers so GIStemp can interpolate them a different way, in a later step, and we change from F to C. Somehow a database load script is looking better and better…

Oh, and that change from C to F carries lots of false precision with it. Mr. Mcguire would not approve.

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons' are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in GISStemp Technical and Source Code and tagged , , , , , . Bookmark the permalink.