This program takes two datasets (Antarctic and GHCN type), changes some missing data markers and merges the two into a single output data set.
We will be looking at antarc_comb in this section (both the script wrapper and the FORTRAN program under it. I’ve chosen to leave the program listings as one large block rather than interlieve comments. This is so that experienced programmers can just “see the code” and read it. It also means that “newbies” can just read the commentary and skip the code.
It will only frustrate the folks who sort of remember the programming language and would really like the description interlieved so they would not need to keep hopping back and forth. The code is short enough that that ought not to be much of a problem.
First, the script: anatc_comb.sh
if [[ $# -ne 2 ]] then echo $0 combines arg1=antarctic data with arg2=v2.mean ; exit ; fi ln $1 fort.1 ln $2 fort.2 f77 antarc_comb.f -o antarc_comb.exe antarc_comb.exe ; rm -f antarc_comb.exe mv fort.12 v2.meanx rm -f fort. echo "created v2.meanx from v2_antarct.dat and v2.mean"
That’s it. The whole thing.
So what does it do?
It checks to see if you have 2 arguements and if they are not there, tells you you need them ( and anarctic data set and a “v2.mean” or GHCN type data set). If it did not get them, it exits.
If you did get two data sets passed as arguments, it then links them to the file names “fort.1” and “fort.2” in the present directory. (“Why? Don’t ask why, down that path lies insanity and ruin.” -e.m.smith.) So I’m going to presume that the FORTRAN program will be hardcoded for those to be the two input file names.
It then explicitly calls the f77 FORTRAN compiler to turn the source code antarc_comb.f into the binary executable program antarc_comb.exe and then runs that program, and promply deletes it.
OK, so we’ve seen in other places where we set an environment varable for the FORTRAN compiler. This script doesn’t use it. If your FORTRAN compiler is not named f77, the script will not work. (Note to self: if I need to actually used this program, put in the environment variable code.)
Why do they compile , run, delete? Do I really have to say it again? “Why? Don’t ask why, down that path lies insanity and ruin. -e.m.smith” Exploring why, I would guess there are a couple of reasons. One is sloth. It’s easier than making a real source archive and a “make” file. Another is that this script seems to duplicate a step in the top level controlling script. That is, I think this is a ‘hand tool’ someone wrote to play with or test this step. One off hand tools are often a bit sloppier than the real production path. And finally, interpreted languages, like scripts and BASIC, are convenient for iterative development.
My sense of the style of GIStemp is that it was iteratively developed. Do a bit, run it, change it, run it, model the data, change it, like it? Done. Never to be improved again. So we have this kind of cruft to deal with.
So after we run the program what then? The file fort.12 is moved to v2.meanx and the files fort.1 and fort.2 are deleted. Then we proudly declare: “created v2.meanx from v2_antarct.dat and v2.mean” and end.
That means that this step is just to create v2.meanx by blending v2_antarct.dat with v2.mean. So how does it do that?
The FORTRAN antarc_comb.f
Here is the listing. The explanation will follow below, after the === bar.
integer itmp(12),itmp0(12) character*200 line real tmp(12) C**** add Ant-data (unit 1) to v2-data (unit 2); new file: unit 12 C**** assumption is that IDs are ordered numerically low->high in both C**** input files call getarg(1,line) open(1,file=line,form='formatted') call getarg(2,line) open(2,file=line,form='formatted') open(12,file='v2.meanx',form='formatted') read(1,'(a)') line ! read from new sources ! CountryCode ID year data call readl(line, icc0, id0,iyr0,tmp) 20 read(2,'(i3,i9,i4,12i5)',end=200) icc,id,iyr,itmp ! read v2 C**** copy v2-source data from earlier stations or same station 25 if(icc.lt.icc0.or.(icc.eq.icc0.and.id.le.id0)) then write (12,'(i3,i9.9,i4,12i5)') icc,id,iyr,itmp ! iccp=icc idp=id go to 20 end if C**** merge in station data from (presumably) new source if(icc0.eq.iccp.and.idp.eq.id0) then ! should not happen write(*,*) 'ID already used',icc0,id0 stop 4 end if 35 write (12,'(i3,i9.9,i4,12i5)') icc0,id0,iyr0, * (nint(10.*tmp(i)),i=1,12) icc0p=icc0 id0p=id0 read(1,'(a)',end=100) line ! read new data call readl(line, icc0, id0,iyr0,tmp) if(icc0.eq.icc0p.and.id0.eq.id0p) go to 35 go to 25 ! to compare to current v2-station c**** No more new data - copy data for remaining v2-data 100 write(12,'(i3,i9.9,i4,12i5)') icc,id,iyr,itmp read(2,'(i3,i9,i4,12i5)',end=300) icc,id,iyr,itmp go to 100 c**** No more v2 data - copy remaining new data 200 write (12,'(i3,i9.9,i4,12i5)') icc0,id0,iyr0, * (nint(10.*tmp(i)),i=1,12) read(1,'(i3,i9,i4,12i5)',end=300) icc0,id0,iyr0,tmp read(1,'(a)',end=300) line call readl(line, icc0, id0,iyr0,tmp) go to 200 300 stop end subroutine readl (line,icc,id,iyr,tmp) C**** reads T-data from standard 12f8.1 tables with missing=-999.9 or C**** from non-standard tables with blanks for missing data, whose C**** fields are 7 characters wide and start at columns 19 or 20; C**** so data=' ' or ' xxxx.x', but not all blank ! character line*200,blank*7/' '/ real tmp(12) read(line,'(i3,i9,i4)') icc,id,iyr if(line(23:23).eq.'.') then ! standard tables read(line(17:200),'(12f8.1)') tmp return end if C**** Non-standard tables C**** check whether data are present and how they are lined up np=index(line,'.') if(np.le.0) then write(*,*) 'non-data line encountered:',line stop 3 end if n1=16+mod(np,7) ! position of first data field do m=1,12 tmp(m)=-999.9 ! if field is blank if(line(n1:n1+6).ne.blank) read(line(n1:n1+6),'(f7.1)') tmp(m) n1=n1+7 end do return end
The program starts off with some data / variable declarations. These are fairly typical. We get the integer types for temperatures itemp and itemp0 and we get a blanket line of text character type named “line” for holding lines of free text input.
Conveniently, there is a comment block to tell us what this code does (glue some datasets together) and that it expects them to be pre-sorted. Now, I’m not too fond of writing sort programs (I’ve done it too many times to be fun any more) but one of the first programs we wrote in my FORTRAN IV (yes, that long ago…) class was a sort program. It is just a tinsy bit lazy to not either check for sort order if you depend on it (and exit at least) and even better, to accept unsorted data and just sort it yourself. It’s not bad practice to do it this way, just lazy. But at least they told us. The way I would have designed this is as a subroutine. The top level would test for sort, if unsorted, call a sort program to sort it, then call this subroutine to do the merge. But maybe that’s just me.
OK, at this point we read in the arguements into that text buffer “line” and peel out the two file names. That makes me wonder all the more why we renamed these files to fort.1 and fort.2 in the script. The program is set up to take them as arguements, so why all the renaming? Why? Don’t make me slap you again…
We also open the output file descriptor 12 as v2.meanx and read in a line of text from the first Antarctic file. Then using a “call read” we break some fields out of that line of text (country code, station ID, year, and a data buffer named tmp holding 12 type “real” numbers).
At this point we enter a loop that repeatedly reads the GHCN type data out of v2.mean until we “catch up” with the Antarctic data. When we run out of GHCN type data, we “goto 200” to finish up. Until then, we keep checking country code and station ID to keep things sorted and read / write data.
I’m not going to unravel the nested spaghetti here verbally, just notice that there is a “35 loop” that keeps reading Antarctic data until it catches up, then breaks back out to the “25 loop” to get more GHCN.
This ping pong keeps up until one or the other runs dry. If Antarctic is first, you end up at “100”, if GHCN runs out first, you end up at “200”. These both just run out the rest of whatever is left to read / write in the remaining dataset, then moves on to “300” for the wrap up where the program ends.
Yes, it looks horrid to a C or Pascal programmer. It’s FORTRAN, get over it. That’s how you do things in a non-block structured language. It works. (And yes, it would be about 1/2 the size and twice as readable in another language. It would be about 1/10 the size and 10 times as readable as a database load step, so don’t go tossing rocks or I’ll sick the database guys on you!)
And at the bottom we have the subroutine “readl” that sorts out a couple of kinds of missing data indicators (-999.9 or blanks). It’s an interesting bit of flexibility code written in a non-flexible language. Again, a database load script would make this much more clear.
Well, that’s it.
We clean up some missing data flags and we merge a couple of data sets. Don’t you feel accomplished now?