It is believed that use of this image to educate as to the meaning of “Red Shirt” is fair use.
The Nature Of The Beast
This posting has some technical content, but it is, I hope, “pitched” so that any one can “get it”. It is, for me, a plea for sympathy. Not just for me, but for all the “maintenance programmers” of the world. By choice, I’ve taken on GIStemp. Most maintenance programmers get handed a “bucket of bits” and told “add a program to do FOO” or “Jimmy wants BAR changed to bar” and just dives into what ever bucket of spit they were pointed at. YES, that is the lot of the “maintenance programmer”.
The “boss” does not want to hear that it’s a load of crap. Odds are he, or one of his friends, managed the creation of it (or, horrors, wrote it himself!). You get no sympathy when you say “It will take a week, maybe two” (really having no idea at all if it’s 10 minutes or 10 years to get it done. It’s hard to gauge the size of a Code Black Hole Of Death until you have been sucked in beyond the event horizon…) The inevitable answer is “Heck, that ought to be a one line change! WHY, In My Day mumble drone…”. I know this for the simple reason that I have BEEN the management saying “It ought to be…”
One of the rules in shops I have run is the “bull shit rule”. ANYONE can call “bull shit!”, but then they have to prove it. This has resulted, at times, in my showing a junior sysprog that it really WAS a “one line change” that took only 10 minutes. It has also resulted in my calling “Bull Shit” and spending a few months trying to figure out just how it is that a one line change can require 20000 lines of code in three languages… After a few months, folks are very reluctant to call “Bull Shit!”, both the programmers (when told it ought to take a day) and the managers (i.e. me) when told my “one day project” ought to take 6 months and 3 staff adds. It’s a great discipline tool, for both sides. And it’s fun to watch!
OK, what does this have to do with AGW (be it Anthropogenic Global Warming or Airport Global Warming ;-), the GIStemp software, or “42”?
Pretty simple, really: I’m both “grunt” and “management” on the “WTF does GIStemp Really Do, and How?” project. So I’m the maintenance programmer. And I’m swimming one day in decent code, and the next day in a bucket of warm spit. And, unfortunately, there is no one with whom I can “share” (i.e. solicit sympathy). So this posting is going to look at one small program that I just wormed my way through. The crap it started as, and where it ended up. This is just one small percent of GIStemp.
What I expect folks to get from this is simple: a) GIStemp has some real crap in it. b) We ought not to be betting the fate of the world on code written this way. c) Maintenance programmers need a hug.
OK, what program, and why?
I’ve shown that GIStemp acts as an amplifier of the “warming signal” in the data through all the “Temperature” steps (i.e. before temperatures are turned into “anomallies”) and that this signal is only present in the Northern Hemisphere winter and short lived thermometers. Something CO2 can not do. I have pointed a strong finger of probability at Airports as the cause, and demonstarted that adding a pot load of thermometers at Tropical Airport Vacation Destinations along with the migration of Siberian latitude thermometers to the latitude of Italy (all inside one GIStemp zone, so not caught by the zonal steps) probably are the cause.
The result has been a chorus from the “warmers” that “The Anomaly Will Save Us!” and “Use The Anomaly Luke!”… All that crap in STEP0, STEP1, and STEP2 early phases will be fixed by STEP2 the later phase and STEP3!! (STEP4_5 can’t save anyone, since it is Sea Surface Temperature Anomaly Map. All it does is merge the GIStemp anomaly map with the SST map delivered from Hadley as a “done deal”. So if the land data are not warming by STEP3 end, there is no global warming. GIStemp even states that STEP4_5 is optional.) So I headed off to do a bit more looking at “The Anomaly Steps”. I’d done a “once over” before, but thought I ought to be a bit more detailed this time. I knew what the pieces did, but some of the code was bit, um, unclear to me.
Can I Beam To The Surface Too?
Re-read that last line. “Some of the code was a bit, um, unclear to me.”
That is a maintenance programmer’s “Red Flag and Charging Bull” warning. You look at something. You know what it does. The Guzintas and the Guzoutas look right (i.e. inputs and outputs match ‘spec’ or implied spec via what other programs are looking for) and you say to yourself “I’ll come back to that one just a bit later and really get to understand it. There is no problem at this time..” Bring up the French Horns as you have just donned a Red Shirt and are asking the Captain if he needs a Volunteer on this Away Mission and no, you don’t need billing in the credits…..
So I “came back to one” in the “anomaly phase” (that is destined to save the planet from The March Of The Thermometers to tropical paradises all over the planet.) It is the program “invnt.f” that does a minor reporting job in the process. But it bugged me. It is run a couple of times in STEP2 and it was, well, just a bit Bizantine when I looked at it on the “one pass” fly by. Things just didn’t add up. There were holes in my understanding map of it, where some variables didn’t have a “use” mapped to them. So I returned. After all, this is part of the code that “Will Save Us from The March Of The Thermometers!!”
Well, What Happened?
OK, the short form is that I found a lot of something called “Dead Code”. That is, bits of program that do nothing. Either you can’t get to them (they never execute) or they are simply doing nothing of value. A simple non-GIStemp example?
5 x=10 10 Goto 30 20 Print “The world can be saved by this formula: $FORMULA “ 30 Print “We are all doomed!” 50 STOP
Clearly, line 20 is never reached.. It is “Dead Code”. Line 10 does a “goto 30” and jump over it, while nothing ever comes back. Less clearly, X is given a value, but nothing is ever done with it. It is a waste of time and space, doing nothing. The effective, actual program is:
30 Print “We are all doomed!” 50 STOP
Even the “10 goto 30” is “dead code” in a way in that once you prune out the actual dead code, it contributes nothing. (Technically it isn’t “dead code” in that it does execute when the program is run, but it is junk code. Ah, the nuances of mouldy code, like fine stinky cheeses… only without the flavor…)
Now, for a maintenance programmer, Dead Code is especially problematic. First off, if it has variable names in them, you are reserving part of your brain for what they do, and what values they hold. Second, if they have formulas, you are trying to figure out how these tie in to all the other parts of the program. CLEARLY it all must hang together somehow! It MUST have meaning and you MUST find it. Yet it may all be mindless dreck and error laden misdirection; adding nothing but confusion.
Dead Code has no meaning. It is quicksand designed to suck away brain capacity needed elsewhere… It is the Special Forces guy passing as a peasant who will sell you a load of broken understandings and send you into a swamp with promises of a clear road.
So what did I find in “invnt.f” when I looked really closely? I found a lot of code that did nothing, more code that hid what it did in obscure ways. And some code that looked like it did something important, but never did print out a value, nor participate with any other code (i.e. Does Not Play Well With Others.). How much? Oh, about 1/4 to 1/3. It depends a bit on how you do the math. (But doesn’t everything? Especially temperature averages. ;-)
We’ll look at that in a bit more depth later (for now, just let that size sink in…).
What does “invnt.f” do? It produces an output file that I think is an “inventory” of stations, by latitude Zone, with some minor information about them.
Here are a couple of selected output lines to give you a ‘feel’ for it. I’ve “wrapped” the lines right after the name field so WordPress will treat them nicely, but they really are just one lone line per station:
Ts.GHCN.CL.1 200460003 GMO IM.E.T. lat,lon (.1deg) 806 580 R A cc=222 Ts.GHCN.CL.1 200690003 OSTROV VIZE lat,lon (.1deg) 795 770 R A cc=222 Ts.GHCN.CL.1 202740000 OSTROV UEDINE lat,lon (.1deg) 775 822 R A cc=222
I’ve not yet figured out exactly what “cc” is, nor what its use might be. That’s still to do. But you can see it’s pretty simple output. The output file name tag (Ts.GHCN.CL and a digit for each of the 6 zones, 1-6), the Station ID, Station Name, text to tell you the latitude and longitude are in 0.1 degree increments, the Latitude and Longitude . I’m not sure what R, A, or CC are all about, having sunk my time into untangling this mess to the point where I can now start to gain understanding… THAT is the problem with Crufty Code and maintenance programming. You can lose a day or three just getting to where you can start the job you set out to do…
So, back to the questions of “how much” and “type” of bogosity.
This is the “highest level” view of the code change. It’s just the question of “how much”. Linux / Unix include a simple command “wc” that does not stand for Water Closet. It stands for “word count”. It counts lines, words, and characters for most any file. So here is the “before” file named “old” and “after” with an “ems” in it.
[chiefio@tubularbells src]$ wc invnt.old.f invnt.ems.f 75 2181 invnt.old.f 55 1464 invnt.ems.f
We (ie. me) chopped out 20 lines of code entirely (which is a bit low, since I added a couple of comments and some blank “white space” for readability. The total is more like 25, or 1/3 the lines.) The character count drops from 2.1 k to 1.4k. OK, doesn’t look like much. But it is the realization that this says roughly 1/3 the code was a Red Herring leading you astray. THAT is the big deal. And this small example makes a better example for posting than a 10 page example where folks would glaze over.
SideBar on Compilation
The original fails on the g95 FORTRAN compiler due to the fname/’ ‘/ initialization on declaration. The f77 error message about an “unimplimented intrinsic” stays the same for both old and new versions. It works, but it complains that something is “untidy”. OK, I still have some clean up to do… But one of the lines I added was a “DATA” line to fix that g95 incompatibility.
Here’s the output of the compile runs:
[chiefio@tubularbells src]$ f77 invnt.ems.f invnt.ems.f: In program `MAIN__': invnt.ems.f:32: warning: CALL SPREAD(IDATA,ITRL,LENGTH,15) ^ Reference to unimplemented intrinsic `SPREAD' at (^) (assumed EXTERNAL) [chiefio@tubularbells src]$ g95 invnt.ems.f In file invnt.ems.f:11 EQUIVALENCE (ITRL(5),NAME),(ITRL(1),ITR1(1)) 1 2 Warning (101): EQUIVALENCE-ing variable 'itrl' at (1) with 'name' at (2) is nonstandard
So now we can use either compiler and get to chose which warning about bad style we prefer ;-)
So; am I sure nothing changed in the operation of the program?
This table shows the result of running the program (a.out is the default product of the compiler unless you redirect it; so “a.out” is the program.)
First, we run it:
$ cd ../STEP2 $ ../src/a.out Ts.GHCN.CL > Ts.stuff.station.list
Then we look at the product. The old version output was saved as “SAVE…”
$ ls -l -rw-rw-r-- 1 chiefio 578588 Aug 27 18:02 SAVE.Ts.GHCN.CL.station.list -rw-rw-r-- 1 chiefio 578588 Aug 29 11:34 Ts.stuff.station.list
So the first test is simply that the two programs give a file of the same size. That 578588 is the byte count. The second test is to see if they are different in any bit positions at all. The Linux / Unix command “diff” does this. If it finds any difference, it prints it out. If you just get a prompt again, the two files match in every bit.
$ diff SAVE.Ts.GHCN.CL.station.list Ts.stuff.station.list $
So what changed?
There are a couple of ways we could look at this. We could put the two sets of code side by side. Show what lines were changed line by line. Etc. The Linux / Unix program “diff” does a nice job of showing line changes, but does not give any feel for the context (that being suppressed). Listings let you see it all in context, but it’s hard to spot what changed. So we’ll do a bit of both. Let’s start with a look at the actual code left over, the working code, then proceed to a “diff”. The Linux / Unix program “cat” puts things on your screen (short for “conCATenate and print”. I’ll interleave a few comments in the listing. If you are not a programmer type, just read the comments.
If you would like to look at the original, this link will open a new window with the original:
$ cat invnt.ems.f C**** C**** Input File Number: 15 binary station data C**** C**** extracts station information PARAMETER(MONM=13464) CHARACTER*80 TITLE,NAME*36,fname data fname /' '/ INTEGER INFO(9),ITRL(15),ITR1(4) DIMENSION IDATA(MONM) EQUIVALENCE (ITRL(5),NAME),(ITRL(1),ITR1(1)) C**** INITIALIZE CALL GETARG(1,fname)
Up to this point, we’ve just declared some variables. Places to store bits as we work on them. The “GETARG” line gets a file name “stub” to use for it’s 6 zonal file names. The next bit builds those file names and uses them, one at a time, but in a slighly odd way. The purpose seems to be to let you change the middle bit of the name for different runs as you “tune” the function. There was also a lot of “parameter” setting that I’ve just turned into hard coded numbers. The clear “signature” is of a bit of code designed to be hand tuned to give a ‘best’ result.
The function “INDEX” finds the location of a substring in a string and return the count. This is then used to do the offset for appending the “label” and the digits 1 through 6. It would be easier to just list Ts.GHCN.CL.1 through 6. Yes, there is still some gratuitous complexity to “iron out” here. Then we read in a line of data and move on to processing it (but only after a couple of range checks).
NDOT=INDEX(fname,' ') do 900 nfl=1,6 write(fname(NDOT:NDOT+1),'(A1,I1)') '.',nfl OPEN(15,STATUS='OLD',FILE=fname,FORM='UNFORMATTED') READ(15) INFO,TITLE N1=INFO(1) N2=INFO(9) IF(INFO(6).LT.1800) STOP 'IYRBEG TOO HIGH' LENGTH=N2-N1+1 IF(INFO(4).GT.MONM) STOP 'IYREND TOO LOW' IF(INFO(1).EQ.INFO(7)) go to 899 ! no data
Here we initialize the IDATA array to be full of the “BAD” flag from INFO(7). The original had a bizantine way of doing this that involved a subroutine call and a couple of intermediate variables MBAD and XBAD where XBAD did not do anything.
100 continue DO 103 I=1,MONM IDATA(I)=INFO(7) 103 CONTINUE
The program then reads in more records using a “subroutine” named SPREAD (that is called “Speed Read” in other programs that do the same thing. This is the code that gives the “unimplimented intrinsic” message at compile time. I suspect that I can clean this up a bit too. It just smells like somebody learned a “new trick” and had to use it, even if it was “overkill”. 4 variables and a subroutine to read in a single record of data? The next statement presently says “if (0” is greater than. It orignally had a variable “ID” where the “0” is at. Except ID was set to zero and then never changed… During execution, we never jump to 800 as far as I can tell, so it isn’t clear if we really need this line of code or not. But for now I’ve assumed it might be needed and left it (though put the constant 0 instead of a never changing variable). Then we write out the product record. I changed the variable “name” to be NAME so it matches the data declaration. These lines will show up in the “diff” even though all I did was make the capitalization match so folks are less likely to search for NAME and miss where it is used as “name”. It is very Bad Form to used mixed case for the same variable.
CALL SPREAD(IDATA,ITRL,LENGTH,15) IF(0.GT.ITRL(3)) GO TO 800 WRITE(6,'(A,I9,1X,A30,1x,a,I5,I6,1x,a1,a1,a1,a4,a3)') * fname(1:NDOT+2),ITR1(3),NAME(1:30), * 'lat,lon (.1deg)',(ITR1(i),i=1,2), * NAME(32:32),NAME(31:31),NAME(33:33),' cc=',NAME(34:36) 800 continue N1=ITRL(14) N2=ITRL(15) LENGTH=N2-N1+1 IF (N1.NE.INFO(7)) GOTO 100 899 CLOSE (15) C Still in the "DO 900" loop, so back for another Zone file 900 continue STOP END SUBROUTINE SPREAD(IDATA,ITRL,LENGTH,NTRL) DIMENSION IDATA(LENGTH),ITRL(NTRL) READ(15) IDATA,ITRL RETURN END [chiefio@tubularbells src]$
Here at the bottom we can see the “subroutine” SPREAD. Not much to it. Oh, I added the comment about “still in the DO 900”. Makes it easier to see that at this point you don’t just drop through the “900 Continue” but go back for another zone.
As near as I can tell, the SPREAD subroutine sucks up a buffer of data from the input file, then presents it for use “upstairs”. I don’t really see why this needs to be in a subroutine rather than “in line”, but that will be explored some other day. I decided to kill off the subroutines one at a time 8-)
Pretty straight forward (other than SPREAD) and now I can get down to the work of actually figuring out what it really does, and where its output is used. So what really changed?
The DIFF output
The Linux / Unix program “diff” tells you what lines to change to turn one file into another one. I also will put in some comments about what the change means. The lines with things like 5,14c5,11 are telling you what lines changed. They are directions to the Unix / Linux editor that would recreate one file from the other. You can safely ignore them.
$ diff invnt.old.f invnt.ems.f 2c2 < C**** Input: 15 binary station data --- > C**** Input File Number: 15, binary station data
So here, line 2 was changed into line 2 in the other file. I made the comment mean a bit more. You can now tell verbs from nouns, from adjectives.
5,14c5,11 < PARAMETER(ITRIM=1,IYRBEG=1880,IYREND=3001) < PARAMETER(NRECPY=12, SCL=10., NDPRYR=NRECPY) < PARAMETER(MONM=NDPRYR*(IYREND- IYRBEG+1),NTRL=14+ITRIM) < CHARACTER*6 IDNOAA,IDNOW < CHARACTER*80 TITLE,NAME*36,fname/' '/ < INTEGER INFO(8+ITRIM),ITRL(NTRL),ITR1(4),ID/0/ < DIMENSION PE(MONM),IDATA(MONM),ID1(MONM),PL(2,MONM) < COMMON/DIFF/AVG(14),SD(0:12) < EQUIVALENCE (ITRL(5),NAME),(ITRL(1),ITR1(1)), < * (ITRL(3),IDNOW) --- > > PARAMETER(MONM=13464) > CHARACTER*80 TITLE,NAME*36,fname > data fname /' '/ > INTEGER INFO(9),ITRL(15),ITR1(4) > DIMENSION IDATA(MONM) > EQUIVALENCE (ITRL(5),NAME),(ITRL(1),ITR1(1))
We can see that the original has a lot of PARAMETERS set and even calculates MONM from some of them. They can act a bit like documentation (we can guess, for example, that ITRIM is an integer that “trims out” some variables by “one” to avoid “off by one” type errors.). Take a look at the “INTEGER INFO” lines. They make it INFO(8+ITRIM), while I just make it INFO(9). It ends up 9 in either case. I prefer explicit comments for documenting things like this. More likely, the program was written to allow a lot of hand tweeking of parameters to find those that the author liked best. Some of it may be “future development” or “leftovers”. For example, the COMMON block is not used anywhere, nor are the variables in it.
There were also a lot of variables declared that were never used, or were used, in that something was stuffed into them, but they were never used after that (so did nothing – kind of like putting fries down to cook every time you make a burger, but only putting burgers on the plates for “pick up”. But at least they didn’t have to ask “Do you want fries with that?” 8-) Cooking those fries does nothing. So just stop it!) We now have about 1/2 as much stuff to define and figure out.
20,21d17 < NPRINT=0 < NCUR=1
Variables that are declared are often initialized, even if unused… And I added some blank lines like the next one for readability.
23a20 > 25,27c22,23 < N2=INFO(4) < IF(ITRIM.GT.0) N2=INFO(8+ITRIM) < IF(INFO(6).LT.IYRBEG) STOP 'IYRBEG TOO HIGH' --- > N2=INFO(9) > IF(INFO(6).LT.1800) STOP 'IYRBEG TOO HIGH'
ITRIM was a constant 1, so that “IF(ITRIM.GT.0) was always true. Why have it cluttering up the place now that we know the “trim” value is one? And why have N2 assigned INFO(4) when we know it’s just going to get overlain with INFO(8+ITRIM) or even more concisely INFO(9) in the next line? See how Cruft can mislead? There is NO “if” here and no “choice” between 4 OR 9.
My guess is that they wanted to “play” with a “zero” trim value at some point, but never came back to clean up the code… It would be an interesting “Dig Here” to run this program with TRIM=0 and see what changes… What WERE they looking at?… Similarly, the use of IYRBEG instead of 1800 is kind of silly now that we know they are not going to be resetting that date any time soon. OTOH, it does say that they were fooling around with it as a parameter at one point in time…
Next comes a “big chunk”. A LOT of unused variables and assignment statements deleted. The “INTT” subroutine that just initialized 2 arrays (one of which was never used anywhere) was pulled up here “in line” and streamlined. It is now just the “DO 103” loop assigning a single “bad data” flag INFO(7) to each element of IDATA. Much easier to figure out than a subroutine with 2 arrays and one of them nonsensical. And since PE is not used, the “DO 210” loop goes with it. Oh, and the whole INFO(7), MBAD, XBAD as 3 names with 2 data types for the same thing evaporates as well. Oh, and taking out the “Commented out:” write statement lets you take away any variables “used” only in it as well.
30c26,33 < MBAD=INFO(7) --- > IF(INFO(1).EQ.INFO(7)) go to 899 ! no data > 100 continue > DO 103 I=1,MONM > IDATA(I)=INFO(7) > 103 CONTINUE > > CALL SPREAD(IDATA,ITRL,LENGTH,15) > IF(0.GT.ITRL(3)) GO TO 800 32,46d34 < XBAD=MBAD < IF(INFO(1).EQ.MBAD) go to 899 ! no data < 100 CALL INTT(IDATA,MONM,MBAD,PE,XBAD) < CALL SPREAD(IDATA,ITRL,LENGTH,NTRL) < IF(ID.GT.ITRL(3)) GO TO 800 < < IYR0=(N1-1)/NDPRYR < IYRL=(N2+NRECPY-1)/NDPRYR < N1JAN=IYR0*NRECPY+1 < C WRITE(6,*) ITRL(3),INFO(6)+IYR0,INFO(6)+IYRL-1 < DO 210 N=N1JAN,MONM < PE(N)=IDATA(N) < 210 ID1(N)=NINT(SCL*IDATA(N)) < MISS=NINT(SCL*XBAD) < M=0
Here is where I just changed the variable name from “name” to “NAME”. Why? searching code for variables that are unused is easier if you stick to one case for a variable… AND it’s good form. One early era of FORTRAN was all caps, newer versions allow lower case. You are making assumptions about compiler level when you mix a little lowercase into your mostly uppercase old program… Though NPRINT is completely unused, so deleted. 800 becomes a “continue” since then it is much clearer when that next code gets executed (and the whole N1 and N2 assignments get simplified.) Then NCUR is removed, unused as it is.
48c36 < * fname(1:NDOT+2),ITR1(3),name(1:30), --- > * fname(1:NDOT+2),ITR1(3),NAME(1:30), 50,53c38,42 < * name(32:32),name(31:31),name(33:33),' cc=',name(34:36) < NPRINT=NPRINT+1 < 800 N1=ITRL(14) < IF(ITRIM.GT.0) N2=ITRL(14+ITRIM) --- > * NAME(32:32),NAME(31:31),NAME(33:33),' cc=',NAME(34:36) > > 800 continue > N1=ITRL(14) > N2=ITRL(15) 55d43 < NCUR=NCUR+1 57a46 > C Still in the "DO 900" loop, so back for another Zone file
I added the “Do End” comment for clarity.
And here is the subroutine I deleted. An entire subroutine just to initialize an array. The PE array is not used anywhere, and once it is removed, and XBAD with it, this becomes one “Do Loop” initializing one array IDATA with a “bad data flag”.
65,73d53 < RETURN < END < < SUBROUTINE INTT(IDATA,MONM,MBAD,PE,XBAD) < DIMENSION IDATA(MONM),PE(MONM) < DO 333 I=1,MONM < IDATA(I)=MBAD < PE(I)=XBAD < 333 CONTINUE [chiefio@tubularbells src]$
Now take a moment to think about this.
Would you call that a well polished bit of code? Well thought out, with structured design and a good thorough run through the optimization Quality Control process?
Would you be willing to bet the world economy on GIStemp and the computer models that use its product, given that level of “polish”?
Now you know why I’m slogging through all this dreck, and why you all ought to go find a maintenance programmer somewhere and buy them a cherry slurpy (or a good beer ;-)
OK, enough venting… Back to the code mine…
Work’n in a code mine,… goin down down down …