GIStemp invnt.f A Sympathy Plea

Star Trek, Dead Red Shirt from Obsession episode

Unclear code? Can I help?

Original Image.

It is believed that use of this image to educate as to the meaning of “Red Shirt” is fair use.

The Nature Of The Beast

This posting has some technical content, but it is, I hope, “pitched” so that any one can “get it”. It is, for me, a plea for sympathy. Not just for me, but for all the “maintenance programmers” of the world. By choice, I’ve taken on GIStemp. Most maintenance programmers get handed a “bucket of bits” and told “add a program to do FOO” or “Jimmy wants BAR changed to bar” and just dives into what ever bucket of spit they were pointed at. YES, that is the lot of the “maintenance programmer”.

The “boss” does not want to hear that it’s a load of crap. Odds are he, or one of his friends, managed the creation of it (or, horrors, wrote it himself!). You get no sympathy when you say “It will take a week, maybe two” (really having no idea at all if it’s 10 minutes or 10 years to get it done. It’s hard to gauge the size of a Code Black Hole Of Death until you have been sucked in beyond the event horizon…) The inevitable answer is “Heck, that ought to be a one line change! WHY, In My Day mumble drone…”. I know this for the simple reason that I have BEEN the management saying “It ought to be…”

One of the rules in shops I have run is the “bull shit rule”. ANYONE can call “bull shit!”, but then they have to prove it. This has resulted, at times, in my showing a junior sysprog that it really WAS a “one line change” that took only 10 minutes. It has also resulted in my calling “Bull Shit” and spending a few months trying to figure out just how it is that a one line change can require 20000 lines of code in three languages… After a few months, folks are very reluctant to call “Bull Shit!”, both the programmers (when told it ought to take a day) and the managers (i.e. me) when told my “one day project” ought to take 6 months and 3 staff adds. It’s a great discipline tool, for both sides. And it’s fun to watch!

OK, what does this have to do with AGW (be it Anthropogenic Global Warming or Airport Global Warming ;-), the GIStemp software, or “42”?

Pretty simple, really: I’m both “grunt” and “management” on the “WTF does GIStemp Really Do, and How?” project. So I’m the maintenance programmer. And I’m swimming one day in decent code, and the next day in a bucket of warm spit. And, unfortunately, there is no one with whom I can “share” (i.e. solicit sympathy). So this posting is going to look at one small program that I just wormed my way through. The crap it started as, and where it ended up. This is just one small percent of GIStemp.

What I expect folks to get from this is simple: a) GIStemp has some real crap in it. b) We ought not to be betting the fate of the world on code written this way. c) Maintenance programmers need a hug.

OK, what program, and why?

I’ve shown that GIStemp acts as an amplifier of the “warming signal” in the data through all the “Temperature” steps (i.e. before temperatures are turned into “anomallies”) and that this signal is only present in the Northern Hemisphere winter and short lived thermometers. Something CO2 can not do. I have pointed a strong finger of probability at Airports as the cause, and demonstarted that adding a pot load of thermometers at Tropical Airport Vacation Destinations along with the migration of Siberian latitude thermometers to the latitude of Italy (all inside one GIStemp zone, so not caught by the zonal steps) probably are the cause.

The result has been a chorus from the “warmers” that “The Anomaly Will Save Us!” and “Use The Anomaly Luke!”… All that crap in STEP0, STEP1, and STEP2 early phases will be fixed by STEP2 the later phase and STEP3!! (STEP4_5 can’t save anyone, since it is Sea Surface Temperature Anomaly Map. All it does is merge the GIStemp anomaly map with the SST map delivered from Hadley as a “done deal”. So if the land data are not warming by STEP3 end, there is no global warming. GIStemp even states that STEP4_5 is optional.) So I headed off to do a bit more looking at “The Anomaly Steps”. I’d done a “once over” before, but thought I ought to be a bit more detailed this time. I knew what the pieces did, but some of the code was bit, um, unclear to me.

Can I Beam To The Surface Too?

Re-read that last line. “Some of the code was a bit, um, unclear to me.”

That is a maintenance programmer’s “Red Flag and Charging Bull” warning. You look at something. You know what it does. The Guzintas and the Guzoutas look right (i.e. inputs and outputs match ‘spec’ or implied spec via what other programs are looking for) and you say to yourself “I’ll come back to that one just a bit later and really get to understand it. There is no problem at this time..” Bring up the French Horns as you have just donned a Red Shirt and are asking the Captain if he needs a Volunteer on this Away Mission and no, you don’t need billing in the credits…..

So I “came back to one” in the “anomaly phase” (that is destined to save the planet from The March Of The Thermometers to tropical paradises all over the planet.) It is the program “invnt.f” that does a minor reporting job in the process. But it bugged me. It is run a couple of times in STEP2 and it was, well, just a bit Bizantine when I looked at it on the “one pass” fly by. Things just didn’t add up. There were holes in my understanding map of it, where some variables didn’t have a “use” mapped to them. So I returned. After all, this is part of the code that “Will Save Us from The March Of The Thermometers!!”

Well, What Happened?

OK, the short form is that I found a lot of something called “Dead Code”. That is, bits of program that do nothing. Either you can’t get to them (they never execute) or they are simply doing nothing of value. A simple non-GIStemp example?

 5 x=10
10 Goto 30
20 Print “The world can be saved by this formula: $FORMULA “
30 Print “We are all doomed!”
50 STOP

Clearly, line 20 is never reached.. It is “Dead Code”. Line 10 does a “goto 30” and jump over it, while nothing ever comes back. Less clearly, X is given a value, but nothing is ever done with it. It is a waste of time and space, doing nothing. The effective, actual program is:

30 Print “We are all doomed!”
50 STOP

Even the “10 goto 30” is “dead code” in a way in that once you prune out the actual dead code, it contributes nothing. (Technically it isn’t “dead code” in that it does execute when the program is run, but it is junk code. Ah, the nuances of mouldy code, like fine stinky cheeses… only without the flavor…)

Now, for a maintenance programmer, Dead Code is especially problematic. First off, if it has variable names in them, you are reserving part of your brain for what they do, and what values they hold. Second, if they have formulas, you are trying to figure out how these tie in to all the other parts of the program. CLEARLY it all must hang together somehow! It MUST have meaning and you MUST find it. Yet it may all be mindless dreck and error laden misdirection; adding nothing but confusion.

Dead Code has no meaning. It is quicksand designed to suck away brain capacity needed elsewhere… It is the Special Forces guy passing as a peasant who will sell you a load of broken understandings and send you into a swamp with promises of a clear road.

So what did I find in “invnt.f” when I looked really closely? I found a lot of code that did nothing, more code that hid what it did in obscure ways. And some code that looked like it did something important, but never did print out a value, nor participate with any other code (i.e. Does Not Play Well With Others.). How much? Oh, about 1/4 to 1/3. It depends a bit on how you do the math. (But doesn’t everything? Especially temperature averages. ;-)

We’ll look at that in a bit more depth later (for now, just let that size sink in…).

What does “invnt.f” do? It produces an output file that I think is an “inventory” of stations, by latitude Zone, with some minor information about them.

Here are a couple of selected output lines to give you a ‘feel’ for it. I’ve “wrapped” the lines right after the name field so WordPress will treat them nicely, but they really are just one lone line per station:


Ts.GHCN.CL.1 200460003 GMO IM.E.T.                    
lat,lon (.1deg)  806   580 R A cc=222
Ts.GHCN.CL.1 200690003 OSTROV VIZE                    
lat,lon (.1deg)  795   770 R A cc=222
Ts.GHCN.CL.1 202740000 OSTROV UEDINE                  
lat,lon (.1deg)  775   822 R A cc=222

I’ve not yet figured out exactly what “cc” is, nor what its use might be. That’s still to do. But you can see it’s pretty simple output. The output file name tag (Ts.GHCN.CL and a digit for each of the 6 zones, 1-6), the Station ID, Station Name, text to tell you the latitude and longitude are in 0.1 degree increments, the Latitude and Longitude . I’m not sure what R, A, or CC are all about, having sunk my time into untangling this mess to the point where I can now start to gain understanding… THAT is the problem with Crufty Code and maintenance programming. You can lose a day or three just getting to where you can start the job you set out to do…

So, back to the questions of “how much” and “type” of bogosity.

This is the “highest level” view of the code change. It’s just the question of “how much”. Linux / Unix include a simple command “wc” that does not stand for Water Closet. It stands for “word count”. It counts lines, words, and characters for most any file. So here is the “before” file named “old” and “after” with an “ems” in it.

[chiefio@tubularbells src]$ wc invnt.old.f invnt.ems.f
     75    2181 invnt.old.f
     55    1464 invnt.ems.f

We (ie. me) chopped out 20 lines of code entirely (which is a bit low, since I added a couple of comments and some blank “white space” for readability. The total is more like 25, or 1/3 the lines.) The character count drops from 2.1 k to 1.4k. OK, doesn’t look like much. But it is the realization that this says roughly 1/3 the code was a Red Herring leading you astray. THAT is the big deal. And this small example makes a better example for posting than a 10 page example where folks would glaze over.

SideBar on Compilation

The original fails on the g95 FORTRAN compiler due to the fname/’ ‘/ initialization on declaration. The f77 error message about an “unimplimented intrinsic” stays the same for both old and new versions. It works, but it complains that something is “untidy”. OK, I still have some clean up to do… But one of the lines I added was a “DATA” line to fix that g95 incompatibility.

Here’s the output of the compile runs:


[chiefio@tubularbells src]$ f77 invnt.ems.f
invnt.ems.f: In program `MAIN__':
invnt.ems.f:32: warning:
         CALL SPREAD(IDATA,ITRL,LENGTH,15)
              ^
Reference to unimplemented intrinsic `SPREAD' at (^) (assumed EXTERNAL)


[chiefio@tubularbells src]$ g95 invnt.ems.f 
In file invnt.ems.f:11

      EQUIVALENCE (ITRL(5),NAME),(ITRL(1),ITR1(1))
                   1       2
Warning (101): EQUIVALENCE-ing variable 'itrl' at (1) with 'name' at (2) is nonstandard

 

So now we can use either compiler and get to chose which warning about bad style we prefer ;-)

So; am I sure nothing changed in the operation of the program?

Yes.

This table shows the result of running the program (a.out is the default product of the compiler unless you redirect it; so “a.out” is the program.)

First, we run it:


$ cd ../STEP2
$ ../src/a.out Ts.GHCN.CL > Ts.stuff.station.list

Then we look at the product. The old version output was saved as “SAVE…”

$ ls -l
-rw-rw-r--    1 chiefio 578588 Aug 27 18:02
SAVE.Ts.GHCN.CL.station.list
-rw-rw-r--    1 chiefio 578588 Aug 29 11:34 
Ts.stuff.station.list

So the first test is simply that the two programs give a file of the same size. That 578588 is the byte count. The second test is to see if they are different in any bit positions at all. The Linux / Unix command “diff” does this. If it finds any difference, it prints it out. If you just get a prompt again, the two files match in every bit.

$ diff SAVE.Ts.GHCN.CL.station.list Ts.stuff.station.list
$ 

So what changed?

There are a couple of ways we could look at this. We could put the two sets of code side by side. Show what lines were changed line by line. Etc. The Linux / Unix program “diff” does a nice job of showing line changes, but does not give any feel for the context (that being suppressed). Listings let you see it all in context, but it’s hard to spot what changed. So we’ll do a bit of both. Let’s start with a look at the actual code left over, the working code, then proceed to a “diff”. The Linux / Unix program “cat” puts things on your screen (short for “conCATenate and print”. I’ll interleave a few comments in the listing. If you are not a programmer type, just read the comments.

If you would like to look at the original, this link will open a new window with the original:

https://chiefio.wordpress.com/2009/03/08/gistemp-step2_invnt/

$ cat invnt.ems.f 
C****   
C**** Input File Number: 15  binary station data
C****
C**** extracts station information

      PARAMETER(MONM=13464)
      CHARACTER*80 TITLE,NAME*36,fname
      data fname /' '/
      INTEGER INFO(9),ITRL(15),ITR1(4)
      DIMENSION IDATA(MONM)
      EQUIVALENCE (ITRL(5),NAME),(ITRL(1),ITR1(1))
C**** INITIALIZE
      CALL GETARG(1,fname)

Up to this point, we’ve just declared some variables. Places to store bits as we work on them. The “GETARG” line gets a file name “stub” to use for it’s 6 zonal file names. The next bit builds those file names and uses them, one at a time, but in a slighly odd way. The purpose seems to be to let you change the middle bit of the name for different runs as you “tune” the function. There was also a lot of “parameter” setting that I’ve just turned into hard coded numbers. The clear “signature” is of a bit of code designed to be hand tuned to give a ‘best’ result.

The function “INDEX” finds the location of a substring in a string and return the count. This is then used to do the offset for appending the “label” and the digits 1 through 6. It would be easier to just list Ts.GHCN.CL.1 through 6. Yes, there is still some gratuitous complexity to “iron out” here. Then we read in a line of data and move on to processing it (but only after a couple of range checks).

      NDOT=INDEX(fname,' ')

      do 900 nfl=1,6
      write(fname(NDOT:NDOT+1),'(A1,I1)') '.',nfl
      OPEN(15,STATUS='OLD',FILE=fname,FORM='UNFORMATTED')
      READ(15) INFO,TITLE

      N1=INFO(1)
      N2=INFO(9)
      IF(INFO(6).LT.1800) STOP 'IYRBEG TOO HIGH'
      LENGTH=N2-N1+1
      IF(INFO(4).GT.MONM) STOP 'IYREND TOO LOW'
      IF(INFO(1).EQ.INFO(7)) go to 899 ! no data
  

Here we initialize the IDATA array to be full of the “BAD” flag from INFO(7). The original had a bizantine way of doing this that involved a subroutine call and a couple of intermediate variables MBAD and XBAD where XBAD did not do anything.

100 continue
       DO 103 I=1,MONM
       IDATA(I)=INFO(7)
 103 CONTINUE

The program then reads in more records using a “subroutine” named SPREAD (that is called “Speed Read” in other programs that do the same thing. This is the code that gives the “unimplimented intrinsic” message at compile time. I suspect that I can clean this up a bit too. It just smells like somebody learned a “new trick” and had to use it, even if it was “overkill”. 4 variables and a subroutine to read in a single record of data? The next statement presently says “if (0” is greater than. It orignally had a variable “ID” where the “0” is at. Except ID was set to zero and then never changed… During execution, we never jump to 800 as far as I can tell, so it isn’t clear if we really need this line of code or not. But for now I’ve assumed it might be needed and left it (though put the constant 0 instead of a never changing variable). Then we write out the product record. I changed the variable “name” to be NAME so it matches the data declaration. These lines will show up in the “diff” even though all I did was make the capitalization match so folks are less likely to search for NAME and miss where it is used as “name”. It is very Bad Form to used mixed case for the same variable.


      CALL SPREAD(IDATA,ITRL,LENGTH,15)
      IF(0.GT.ITRL(3)) GO TO 800

      WRITE(6,'(A,I9,1X,A30,1x,a,I5,I6,1x,a1,a1,a1,a4,a3)') 
     *  fname(1:NDOT+2),ITR1(3),NAME(1:30),
     *  'lat,lon (.1deg)',(ITR1(i),i=1,2),
     * NAME(32:32),NAME(31:31),NAME(33:33),' cc=',NAME(34:36)

  800 continue
      N1=ITRL(14)
      N2=ITRL(15)
      LENGTH=N2-N1+1
      IF (N1.NE.INFO(7)) GOTO 100
  899 CLOSE (15)
C     Still in the "DO 900" loop, so back for another Zone file 
  900 continue
      STOP
      END

      SUBROUTINE SPREAD(IDATA,ITRL,LENGTH,NTRL)
      DIMENSION IDATA(LENGTH),ITRL(NTRL)
      READ(15) IDATA,ITRL
      RETURN
      END
[chiefio@tubularbells src]$ 

Here at the bottom we can see the “subroutine” SPREAD. Not much to it. Oh, I added the comment about “still in the DO 900”. Makes it easier to see that at this point you don’t just drop through the “900 Continue” but go back for another zone.

As near as I can tell, the SPREAD subroutine sucks up a buffer of data from the input file, then presents it for use “upstairs”. I don’t really see why this needs to be in a subroutine rather than “in line”, but that will be explored some other day. I decided to kill off the subroutines one at a time 8-)

Pretty straight forward (other than SPREAD) and now I can get down to the work of actually figuring out what it really does, and where its output is used. So what really changed?

The DIFF output

The Linux / Unix program “diff” tells you what lines to change to turn one file into another one. I also will put in some comments about what the change means. The lines with things like 5,14c5,11 are telling you what lines changed. They are directions to the Unix / Linux editor that would recreate one file from the other. You can safely ignore them.

$ diff invnt.old.f invnt.ems.f 
2c2
< C**** Input: 15  binary station data
---
> C**** Input File Number: 15,  binary station data

So here, line 2 was changed into line 2 in the other file. I made the comment mean a bit more. You can now tell verbs from nouns, from adjectives.


5,14c5,11
<       PARAMETER(ITRIM=1,IYRBEG=1880,IYREND=3001)
<       PARAMETER(NRECPY=12, SCL=10., NDPRYR=NRECPY)
<       PARAMETER(MONM=NDPRYR*(IYREND-
IYRBEG+1),NTRL=14+ITRIM)
<       CHARACTER*6 IDNOAA,IDNOW
<       CHARACTER*80 TITLE,NAME*36,fname/' '/
<       INTEGER INFO(8+ITRIM),ITRL(NTRL),ITR1(4),ID/0/
<       DIMENSION PE(MONM),IDATA(MONM),ID1(MONM),PL(2,MONM)
<       COMMON/DIFF/AVG(14),SD(0:12)
<       EQUIVALENCE (ITRL(5),NAME),(ITRL(1),ITR1(1)),
<     *  (ITRL(3),IDNOW)
---
>
>       PARAMETER(MONM=13464)
>       CHARACTER*80 TITLE,NAME*36,fname
>       data fname /' '/
>       INTEGER INFO(9),ITRL(15),ITR1(4)
>       DIMENSION IDATA(MONM)
>       EQUIVALENCE (ITRL(5),NAME),(ITRL(1),ITR1(1))

We can see that the original has a lot of PARAMETERS set and even calculates MONM from some of them. They can act a bit like documentation (we can guess, for example, that ITRIM is an integer that “trims out” some variables by “one” to avoid “off by one” type errors.). Take a look at the “INTEGER INFO” lines. They make it INFO(8+ITRIM), while I just make it INFO(9). It ends up 9 in either case. I prefer explicit comments for documenting things like this. More likely, the program was written to allow a lot of hand tweeking of parameters to find those that the author liked best. Some of it may be “future development” or “leftovers”. For example, the COMMON block is not used anywhere, nor are the variables in it.

There were also a lot of variables declared that were never used, or were used, in that something was stuffed into them, but they were never used after that (so did nothing – kind of like putting fries down to cook every time you make a burger, but only putting burgers on the plates for “pick up”. But at least they didn’t have to ask “Do you want fries with that?” 8-) Cooking those fries does nothing. So just stop it!) We now have about 1/2 as much stuff to define and figure out.

20,21d17
<      NPRINT=0
<       NCUR=1

Variables that are declared are often initialized, even if unused… And I added some blank lines like the next one for readability.

23a20
> 
25,27c22,23
<      N2=INFO(4)
<       IF(ITRIM.GT.0) N2=INFO(8+ITRIM)
<       IF(INFO(6).LT.IYRBEG) STOP 'IYRBEG TOO HIGH'
---
>       N2=INFO(9)
>       IF(INFO(6).LT.1800) STOP 'IYRBEG TOO HIGH'

ITRIM was a constant 1, so that “IF(ITRIM.GT.0) was always true. Why have it cluttering up the place now that we know the “trim” value is one? And why have N2 assigned INFO(4) when we know it’s just going to get overlain with INFO(8+ITRIM) or even more concisely INFO(9) in the next line? See how Cruft can mislead? There is NO “if” here and no “choice” between 4 OR 9.

My guess is that they wanted to “play” with a “zero” trim value at some point, but never came back to clean up the code… It would be an interesting “Dig Here” to run this program with TRIM=0 and see what changes… What WERE they looking at?… Similarly, the use of IYRBEG instead of 1800 is kind of silly now that we know they are not going to be resetting that date any time soon. OTOH, it does say that they were fooling around with it as a parameter at one point in time…

Next comes a “big chunk”. A LOT of unused variables and assignment statements deleted. The “INTT” subroutine that just initialized 2 arrays (one of which was never used anywhere) was pulled up here “in line” and streamlined. It is now just the “DO 103” loop assigning a single “bad data” flag INFO(7) to each element of IDATA. Much easier to figure out than a subroutine with 2 arrays and one of them nonsensical. And since PE is not used, the “DO 210” loop goes with it. Oh, and the whole INFO(7), MBAD, XBAD as 3 names with 2 data types for the same thing evaporates as well. Oh, and taking out the “Commented out:” write statement lets you take away any variables “used” only in it as well.

30c26,33
<       MBAD=INFO(7)
---
>       IF(INFO(1).EQ.INFO(7)) go to 899 ! no data
>   100 continue
>       DO 103 I=1,MONM
>        IDATA(I)=INFO(7)
>  103  CONTINUE
> 
>       CALL SPREAD(IDATA,ITRL,LENGTH,15)
>       IF(0.GT.ITRL(3)) GO TO 800
32,46d34
<       XBAD=MBAD
<      IF(INFO(1).EQ.MBAD) go to 899 ! no data
<   100 CALL INTT(IDATA,MONM,MBAD,PE,XBAD)
<       CALL SPREAD(IDATA,ITRL,LENGTH,NTRL)
<       IF(ID.GT.ITRL(3)) GO TO 800
< 
<       IYR0=(N1-1)/NDPRYR
<      IYRL=(N2+NRECPY-1)/NDPRYR
<      N1JAN=IYR0*NRECPY+1
< C     WRITE(6,*) ITRL(3),INFO(6)+IYR0,INFO(6)+IYRL-1
<      DO 210 N=N1JAN,MONM
<      PE(N)=IDATA(N)
<   210 ID1(N)=NINT(SCL*IDATA(N))
<      MISS=NINT(SCL*XBAD)
<      M=0

Here is where I just changed the variable name from “name” to “NAME”. Why? searching code for variables that are unused is easier if you stick to one case for a variable… AND it’s good form. One early era of FORTRAN was all caps, newer versions allow lower case. You are making assumptions about compiler level when you mix a little lowercase into your mostly uppercase old program… Though NPRINT is completely unused, so deleted. 800 becomes a “continue” since then it is much clearer when that next code gets executed (and the whole N1 and N2 assignments get simplified.) Then NCUR is removed, unused as it is.

48c36
<      *  fname(1:NDOT+2),ITR1(3),name(1:30),
---
>      *  fname(1:NDOT+2),ITR1(3),NAME(1:30),
50,53c38,42
<     * name(32:32),name(31:31),name(33:33),' cc=',name(34:36)
<      NPRINT=NPRINT+1
<  800 N1=ITRL(14)
<       IF(ITRIM.GT.0) N2=ITRL(14+ITRIM)
---
>      * NAME(32:32),NAME(31:31),NAME(33:33),' cc=',NAME(34:36)
> 
>   800 continue
>       N1=ITRL(14)
>       N2=ITRL(15)
55d43
<      NCUR=NCUR+1
57a46
> C     Still in the "DO 900" loop, so back for another Zone file 

I added the “Do End” comment for clarity.

And here is the subroutine I deleted. An entire subroutine just to initialize an array. The PE array is not used anywhere, and once it is removed, and XBAD with it, this becomes one “Do Loop” initializing one array IDATA with a “bad data flag”.

65,73d53
<      RETURN
<       END
< 
<      SUBROUTINE INTT(IDATA,MONM,MBAD,PE,XBAD)
<      DIMENSION IDATA(MONM),PE(MONM)
<       DO 333 I=1,MONM
<         IDATA(I)=MBAD
<      PE(I)=XBAD
<  333  CONTINUE
[chiefio@tubularbells src]$ 

In Conclusion

Now take a moment to think about this.

Would you call that a well polished bit of code? Well thought out, with structured design and a good thorough run through the optimization Quality Control process?

Would you be willing to bet the world economy on GIStemp and the computer models that use its product, given that level of “polish”?

Now you know why I’m slogging through all this dreck, and why you all ought to go find a maintenance programmer somewhere and buy them a cherry slurpy (or a good beer ;-)

OK, enough venting… Back to the code mine…

Work’n in a code mine,… goin down down down …

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons' are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in AGW GIStemp Specific, Favorites, GISStemp Technical and Source Code and tagged , , , , , . Bookmark the permalink.

13 Responses to GIStemp invnt.f A Sympathy Plea

  1. Ellie in Belfast says:

    Wow. I really get it. Consider yourself given a big virtual hug.

  2. Roger Sowell says:

    I love that you used the same word “bogosity” as we did years ago for really bad data and really bad programming. I was an engineer/programmer for a while, and we created a Bogosity Index with which to rank the various data sets that came into our shop. That was in the context of a multi-client study with several hundred data sets incoming.

    “So, back to the questions of “how much” and “type” of bogosity.”

    Btw…if you have not seen it yet, the California Attorney General has an IPCC cheerleading website for AGW…stating that the fate of the world really does hang on this code you are dissecting. This is stuff to make one shudder.

    http://www.ag.ca.gov/globalwarming/contrarians.php

    Just one of the “gems” in this cheerleading is this “. . .over several decades of model development, they [the global climate models] have consistently provided a robust and unambiguous picture of significant climate warming in response to increasing greenhouse gases.”

    One must wonder exactly how models built like the ones you are digging through can do all that.

    Ed, the work you are doing is worthy of the highest honor this nation can bestow on a civilian. Meanwhile, here’s a virtual beer for you!

  3. Bishop Hill says:

    Great post. This is important stuff. I hope you are going to publish it at some point.

  4. E.M.Smith says:

    @all:

    Thanks! I feel better now… ;-)

    (I do go through a bit of a cyclical process in this stuff. I enter it centered and neutral, gradually descend into mild depression / revulsion, then take a break – with rebound mania – and settle out back at “centered” again. Ready for another round.)

    @Bishop Hill:

    Well, to some extent, this web site is publishing. Per “peer reviewed”: I’d be glad to at some point, but I don’t have enough experience at it to go it alone. Until then, I’m putting what I find here. I’ll happily take 3rd author on some paper with an experienced partner. Heck, I’d be thrilled to just show up as a foot note in someone’s paper that gave a HatTip to an idea lifted from here. My goal is to get truth known, not gain glory.

    @Roger:

    The more things change, the more they stay the same ;-)

    @ Ellie:

    Giggle. That was nice!

  5. JLKrueger says:

    Methinks you’re working too hard.

    Ts.GHCN.CL.1 200460003 GMO IM.E.T.
    lat,lon (.1deg) 806 580 R A cc=222
    Ts.GHCN.CL.1 200690003 OSTROV VIZE
    lat,lon (.1deg) 795 770 R A cc=222
    Ts.GHCN.CL.1 202740000 OSTROV UEDINE
    lat,lon (.1deg) 775 822 R A cc=222
    I’ve not yet figured out exactly what “cc” is, nor what its use might be. That’s still to do. But you can see it’s pretty simple output. The output file name tag (Ts.GHCN.CL and a digit for each of the 6 zones, 1-6), the Station ID, Station Name, text to tell you the latitude and longitude are in 0.1 degree increments, the Latitude and Longitude . I’m not sure what R, A, or CC are all about, having sunk my time into untangling this mess to the point where I can now start to gain understanding…

    From the legend for the txt file:
    cc=country-code
    P: R/S/U=rural/sm.town/urban (population)
    N: 1/2/3=dark/dim/bright (satellite light data 1995, only near cont. US)
    B: A/B/C=dark/dim/bright (part of GHCN’s inventory file)

    Indeed when you check the stations, the 222 is Russia, mostly Siberia.

    You most assuredly have my sympathy!

  6. pyromancer76 says:

    So glad that readers like JLKrueger, and, earlier, Ellie in Belfast, others, can help you. And Roger Sowell can offer sympathy — been there, done that too and it’s a @#$%&. Yes, today blogs are publishing with very knowledgable peer reviewers (and others like me who value logical thinking), AND you must publish elsewhere. A detective novel format is inviting and piques (peaks) the curiosity. Your style is like this anyway.

    I like the idea that you were stopping for dinner and then returning in the wee hours of the a.m. working to save Western Civilization by exposing what dunderheads (or malevolent fools) lead many of our important organizations. Many citizen hugs for your efforts as both boss and grunt on this project. You’re also an entertaining writer: “It’s hard to gauge the size of a Code Black Hole Of Death until you have been sucked in beyond the event horizon…”

  7. E.M.Smith says:

    @JL Krueger:

    Thanks for that. I ought to have picked up on country code even without a heading. Maybe I have been pushing a bit to hard. 5 hours sleep last night, back at it again now… But I hear the politicians talking about Copenhagen with glee in their voice and a hand in my pocket and a little extra push seems more important than sleeping in…

    Then I run into a code brain bomb like this one was and it’s just infuriating that this level of “care” and “quality” are what folks are trusting with complete faith. Then I know I have to show folks just what it looks like; since I figure maybe 1/2 dozen folks on the planet have actually looked at that piece of code in depth. (Original writer, Me, Ravenswood doing a port to Python, maybe a maintenance programmer at GISS… I’m running out here…) and it really needs to be seen more broadly (but in a way even non-programmers can “get”). And the result is another posting.

    @pyromancer76

    I think my “style” comes from having spent way too much time watching movies and having too active an imagination. It is kind of “stuck on” (like a lot of my brain… The present theory about Aspergers is that the “on” switch gets stuck; if only a little, you are ‘high function’; if too much ‘Rainman’. I’m in the ‘borderline’ group: high function and I can turn it off, with some effort.)

    When I’m up to my analytical lobes in something like this, the visual and humor parts of the brain feel left out and want to play… so they wander off to the movie theatre memory vault department and start kicking around what would apply to the situation and what just tickles (their fancy, or their plain ;-)

    And thanks to way too many hours watching Road Runner, Bugs Bunny, etc. I’m just not happy without sound effects… so I’ll sometimes add them (just for myself mostly these days, but occasionally in conversation – causes some folks a bit of a startle)… “So I opened this program ‘creeeeeaak’ and blew the dust off it ‘whoosssspppft. Cough Cough HACK!’ there were chunks of dead code everywhere, and me, fresh out of wooden stakes! ‘good evenenink.’ Gads, it was sundown and one of them was starting to stir! ”

    Yes, that really is the way I interact with the world a lot of the time. Can’t stop it, might as well enjoy it 8-)

    Took me a while to learn to hide it among the more somber / dour parts of “society”. Didn’t realize until about High School graduation that it was “different” from how other folks experienced the world. So I’ve had times I was at a corporate function of some sort, 3 piece suit and all on some occasions, doing the somber and staid “work the crowd” thing… and there in the back of the video department I’ve got animated slapstick going on…

    Polite dour nod to corporate president, “So how is your marketing plan working?” discussion of mostly success, one side note about a new product launch that is having difficulties making schedule… (Queue video: Old biplane being catapult launched too early, pilot running alongside trying to get in… wwrrrsszzzzrooom SPLAT!) “I see.” – polite smile, just a bit too big… are folks wondering why I’m smiling… well at least I didn’t say “SPLAT!” out loud this time…

    I think it is related to what “normals” see when they dream. The same sort of process. But it’s always accessible, even when awake. If reality is interesting enough, the internal visuals fade and the reality dominates. But when the reality isn’t “enough juice”, well… there’s always the vault… “Clikity Clickity 12, clikity click 48, click 73, clak; Creeeak. Womphft. Wow, look at THAT! pad pad pad flomphf. Start the projector!!!”

    I’m just glad you like the results.

    And now you also know why things like this:

    https://chiefio.wordpress.com/2009/07/04/business-panic-of-33ad-things-never-change/

    Are put up. When I see the present collapse of the financial system, I’m also seeing it in a slightly “Funny Thing Happened on the Way To the Forum” way, and the idea that it really DID happen that way in 33 AD is just too delicious to pass up!

    And, of course, it also explains why I had to do a posting with sound effects in links and with a Marvin The Martian theme:

    https://chiefio.wordpress.com/2009/03/05/illudium/

    because that really is what I hear (sometimes as faint echos) as I read some paper or other about GIStemp…

  8. Ellie in Belfast says:

    I’m sorry now that I deleted this from my comment last night:
    Information vital to the survival of the world as we know it is burried in the code. Few know how to retrieve it. ….This is our most desperate hour. Help us, E.M Smith; you’re our only hope.
    (with apologies to a certain film).

  9. Duncan says:

    Man, that looks like the kinda code I maintain. Not that I’ve ever written anything like that…

    I bet the common area comes from whatever function was copy/saved to start this function. Some of the unused variables might might be carry-overs too.

    I understand them setting beginning year IYRBEG to 1800, but why set ending year to 3001? Other than being black eyed peas fans?

    Seems like it adds an extra 12000 to the array sizes for pe, idata, id1 and pl – and unless we already have temperature records for the next millenium there’s no data that could hit those.

    In fact, if there were data in those spots it would probably be messing things up somewhere.

  10. E.M.Smith says:

    Ellie in Belfast
    I’m sorry now that I deleted this from my comment last night:

    Information vital to the survival of the world as we know it is burried in the code. Few know how to retrieve it. ….This is our most desperate hour. Help us, E.M Smith; you’re our only hope.
    (with apologies to a certain film).

    These are not the programmers you are looking for… Move along, move along… (slight wave of hand)…

    Chuckle… You know, Ellie, I think you may spend some time in “my world”… Bright, slightly overactive imagination? Perhaps even a tendency to excessive precision and carefulness about order? A bit compulsive about completion? Then the brief trip “out there” to regain center and balance in life?

    @Duncan:

    Yeah, I thought 3001 was a bit cheeky too! (Hey, man, it’s time to fix the Y2k “bug”, hell, lets fix the y3k bug TOO! I’m sure MY code will still be running 1000 years from now!!)

    Yeah, it “bloats up” everything. If it were me, I’d pass in the present year+1 at run time… but that’s just me…

    BTW, in returning to PApars.f it has an odd “echo” of this program but many of the variables make sense in it. My best guess as of right now is that PApars.f was cut down to make a “quick and dirty” report generator and they did it a bit too quick and a lot too dirty…

  11. Peter Dunford says:

    “Seems like it adds an extra 12000 to the array sizes for pe, idata, id1 and pl – and unless we already have temperature records for the next millenium there’s no data that could hit those.”

    I’m not a programmer, but when you dimension an array don’t you allocate a block of memory for the array. Perhaps this and all the spurious stuff was to justify a new super-computer at some point? “The old one just can’t handle the volume of data sir…. keeps running out of memory”

    Sorry if that’s silly. Great Blog, keep up the good work. Wish I could help.

  12. E.M.Smith says:

    @Peter

    Your theory is sound, but the sizes are off.

    I’m running this, as is with the bogus large array size, on an old PC box with the Linux Red Hat 7.2 Operating System, 400 Mhz AMD chip and all of 108 MB or so of modestly slow memory.

    The code “feels” like it was developed on a Sun SPARC box or similar. They would be faster with larger memory. The size is not going to slow the box down at all.

    I’ve hung out with and managed programmers for most of my adult life (and I R 1 8-) so I have a sense of how they react to the world.

    It looks to me like someone was required to do a “Y2K” code review on this and, in a puckish moment of trying to decide between 2010, 2020, 2040 etc. just decided to say “I’ll make sure they stfu. I’m making this P.O.S. code Y3K compliant. Snicker, chortle..”

    (BTW, the P in P.O.S. stands for “Piece”. The O for “of”…)

    They would know they were wasting memory size, but they would also know it just wasn’t enough to make any difference at all on any reasonable box (and on Linux / Unix boxes the excess idle space would just end up “swapped” out to disk anyway, so we’re really talking one I/O …) and it would make a nice joke to tell friends over beer or cherry slurpies.

    It’s the kind of thing programmers do to entertain themselves on boring projects.

    Heck, in about 1980 I wrote a chunk of Cobol that had a 4 digit year field in it even though some peers were derisive about my hutzpah that my code would still be running in 20 years and my boss complained about my wasting all those bytes in the database to hold nothing but “19” repetitively.

    Mostly I did it because I just didn’t like the potential for over / under run on years of data (i.e. I was as worried about an old or miss keyed record from 1899 as a new one from 2001 and there was no guarantee who would run this code on what or what bad data might be keyed in by hand by a Data Base Admin… But another reason was just that I could avoid ‘padding’ that field back out with “19” in front on print out, and the code looked cleaner. In deference to some of the questions about requiring key entry folks to repeatedly type “19” I did allow the input code to presume a leading 19 but documented that it ought not to do that forever (and included my original lines as commented out text so it could be restored in 20 years…)

    As far as i know, the company went out of business about 10 to 15 years later…

    So when it comes to programmers, it’s more likely to be an “in the weeds detail” or “inside joke” than a grand strategic plan.

  13. Tonyb says:

    Whilst you may be ‘grunt and management’ you do have cheer leaders as well-and I note we havent been paid yet this month. :)

    Have you been over to the WUWT thread ‘Research claims-dropping co2’

    TOMP has emailed Essex to try to prove his paper wrong about the concept of a global temperature. TOM is convinced a global temperature is a viable measurement and that GISS is an accurate dataset. If you have time why don’t you drop this article into the discussion?

    best wishes

    tonyb

    REPLY: Sounds good, but what is a TOMP? or a TOM? Not seen the dropping CO2 thread. I’ll look for it.

Comments are closed.