GHCN v1 vs v3 some code

These are the computer programs I use to create the v1 vs v3 comparisons.

If you are not a “computer person”, it will likely have no interest for you at all.

If you are a “computer person”, some of you will be Old Farts who learned FORTRAN decades ago and this will find a soft spot for you. Others will be Young Turks who think that only things written in Object Oriented Languages with all sorts of levels of libraries and objects calling objects can possibly be of interest. Everything procedural and especially those with things like fixed format statements were discarded decades ago. You will be wrong, but that will not trouble you in the slightest… ( There are still huge blocks of code written in COBOL, FORTRAN, Pascal, Ada, PL/1, and a few dozen other languages of that sort. Lots of data is still floating around in fixed format files, and while I can write C to handle such files, it’s a pain.)

Why FORTRAN? Because GIStemp was written in it and I was porting GIStemp when I first started working with the GHCN files. Now I’ve got a closet full of programs to do things with the data and it’s just easier to use the existing code base an hack it to a slightly new form. Yes, I write C. (And a few different database products and at one time or another I’ve either written at least one functioning program or worked with someone putting database calls into a program, in each of the languages listed above. I was a DBA Consultant helping the programming staff get into the databases using their various languages of choice. Oh, and ALGOL too. I really liked ALGOL…) At any rate, I “got over it” about “My favorite language is better than your favorite language” about 30 years ago. They all work. They all do some things better than others. While I’ve not personally written in C++ or similar object oriented languages, I did manage a development team using it (that produced 3 software patents along) and I’ve learned enough to know how it works. Still not impressed all that much. “Yeah, it works too. So?” is about my opinion. You end up with giant blocks of executable binary code, measured in Megabytes for even simple things. For this code, the largest binary is 678 K Bytes. Oh, and it’s not going to suddenly break when some dynamic library changes… once it’s compiled, it’s done. I find that comforting… Yet if a project were already written in the newest Language Du Jour, I’d just us it, whatever it was.

At any rate, with the vain hope that will hold off some of the “Language Bigotry Wars, part 237″… and knowing it can’t… I’m going to put some of the programs her for folks to see.

Yes, much of this would be easier with the data in a database. Doing all the work to pick one, install it, etc. is not high on my “to do” list right now. I had to use FORTAN to make GIStemp go, and this is now working. So why take on a load of infrastructure building that doesn’t have payback? It’s on the Someday Maybe list, but don’t hold your breath…

A couple of things, first, though.

Stealing Characters

WordPress tries to turn things into HTML. That means things like an open bracket < tends to be "stolen" and sometimes all the stuff after it. I may not catch them all on first try.

Rehashed Rewritten Reused code

This is rehashed code. It started live in 2009 processing Temperatures, not Anomalies. It’s been through the conversion and rehash wringer a couple of times, and had a dozen variations made for various special cases. (Yes, Object Language Bigots, I know you can reuse modules from your object library only writing changed layers to wrap around them… then again, who has memorized your whole catalog of objects and all their parameters? IF I really cared, and I don’t, I could make chunks of this into subroutines and call them, too. But since most of this is way less than 100 lines of active code, why bother?) So when reading the code, I may not have changed all the references to calculating a GAT Global Average Temperature to an Anomaly. In many cases variables will have names like itmp instead of itmpanomaly or ianom. The risk of breaking things from changing variable names just wasn’t worth it. That I could just change the input file to anomalies and reuse the code was more important. Just don’t be surprised if some bits are not as clean as my usual stuff.

At least I’m reusing modules and not writing everything from scratch.

Nested Structure

Long ago I got comfortable with doing some processes with UNIX script tools and some in the target languages. This also gives nice “break points” where you can write a re-usable chunk that has a few different back ends to do different things. This means that some of the ‘flow of control’ will seem odd. ( More likely “What the Hell is he doing THAT for?! It could be done in line right here more easily…”) There are two major reasons. Some of it will be legacy “cruft” that just builds up in any code base that’s been in use for a couple of years. More of it builds in flexible points where I can just change one part and make a whole new process. So, for example, I have done various “by latitude” or “by altitude” bits of code. Most of the program chunks that the same, and I just stick in a ‘if altitude is GT FOO’ line, for example.

This does mean that some things are in Shell Scripts. That means that folks can be tempted to argue the merits of C Shell, vs Borne Shell, vs BASH, vs Korn Shell, vs,… Frankly, I’m not even sure in what shell the scripts are written. I think it’s BASH as this is a Linux box; but long ago I learned to write more “generic” code that doesn’t use obscure special features and tends to run everywhere. (Except C Shell that’s “different”… so I usually avoid it unless doing maintenance on preexisting code). Yes, there will be a dozen better ways to do some of the things I do; but few of them will be simpler or more likely to just work anywhere. Even on fresh ports of “whatever” with 1/2 the features missing… It’s how I write. If it bothers you, get over it.

Wordy Comments and Pedantic Style

Having gone back to some arcane and tricky bit of code and cursed the Evil Inconsiderate Bastard who wrote such complicated and obscure stuff and left comments like “Now we do the work”… and discovered on reading the doc.block it was me… I now leave to myself copious notes in the code. I also avoid “neat tricks” that may not be remembered 4 years and 2 languages later… If that bothers you, then copy the code and edit out all the comments. They start with “C” in FORTRAN or “#” in the shell scripts and most of the folks will likely find them for more interesting that the actual code.

The Code

I concatenate the Inventory File with the Temperature Data File. This is horribly wasteful of disk space, but makes it easier for me to write some kinds of programs later. So, for example, if I want all things in certain latitude or longitude bands, I can just pick those fields out of the current record. No need to match two files on key fields. No need to code database calls. As the disk in the computer was free about a decade ago and still isn’t full, it’s not like the cost matters. It was fast, and it is convenient.

So the first bit of code just takes the INV file and matches it to the v3.mean temperature data. I named it mkinvmean for “Make Inventory Mean combined file”. In v3 the format of the temperature data changed from v2, so I used these intermediate steps to convert some of it closer to the v2 form so reuse of prior code base would be easier. This is in Shell Script form and has a lot of things where you can pass in parameters to override defaults. So, for example, the first parameter is ‘where the inv file lives’ and the second is where the v3.mean file lives. The defaults are what I use most of the time, but I can override with a simple passed parameter to try something else. For those not familiar with Unix / Linux shell scripts, they are wonderful tools that are the glue holding all the other bits together. Threaded interpreted languages with an elegance in their own right. Once written, one script can call another just by speaking its name… You will see that in the code. In the land of *Nix the symbol “..” means “one directory up the naming tree”. An executable bit of code is often stored in ./bin or ../bin (the present directory with a subdirectory of bin or ‘go up one level’ and find a directory named ‘bin’.


echo " "
echo "Optional:  sort the v3.inv type file by Country Code / Station ID"
echo " "

DIR="../data" 
DTDIR="./data"
ls -l ${1-$DIR/v3.inv} $DTDIR/v3.inv.ccid

echo " "
echo -n "Do the sort of $DIR/v3.inv data into $DTDIR/v3.inv.ccid (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
     INVFILE=$DTDIR/v3.inv.ccid
     echo INVFILE= $INVFILE

     sort -n -k1.1,1.12 ${1-$DIR/v3.inv} > $INVFILE
else
     INVFILE=${1-$DIR/v3.inv} 
     echo INVFILE= $INVFILE
fi

echo " "
echo "v3.inv data from:"
echo " "
ls -l $INVFILE
echo " "

ls -l ${2-$DIR/v3.mean} $DTDIR/v3.sort.ccid

echo " "
echo "Optional:  Sort the v3.mean type file by Country code / station ID"
echo " "
echo -n "Do the sort of v3.mean into v3.sort.ccid (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
     MEANFILE=$DTDIR/v3.sort.ccid
     sort -n -k1.1,1.10 ${2-$DIR/v3.mean} > $MEANFILE
else
     MEANFILE=${2-$DIR/v3.mean} 
fi

echo " "
echo "v3.mean data from: "
echo " "
ls -l $MEANFILE
echo " "

echo "Then feed the v3.inv data, sorted by cc/station ID, and "
echo "v3.mean by ccid to the program that matches ID to description"
echo "records in the data set."
echo " "
echo -n "Do the matching of $INVFILE with $MEANFILE into: $DTDIR/v3.mean.inv (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
      ./bin/ccaddlatv3 $INVFILE $MEANFILE
fi

echo " "
echo "Produced the list of v3.mean.inv records "
echo "(temps, with station data, sorted by CCStationID)"
echo " "

ls -l v3.mean.inv  $DTDIR/v3.mean.inv

echo " "
     ls -l $DTDIR/v3.sort.ccid $DTDIR/v3.inv.ccid
echo " "

echo -n "Remove the work files v3.sort.ccid and v3.inv.ccid (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
     echo rm $DTDIR/v3.sort.ccid $DTDIR/v3.inv.ccid
     rm $DTDIR/v3.sort.ccid $DTDIR/v3.inv.ccid
fi

echo " "
echo -n "Does v3.mean.inv need sorting by CC, Station ID, and Year (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
#     sort -n -k1.4,1.11  v2.sort.id.withlat
#     meansortidyrv3 sorts on the first 15 chars.  cc,Stn,year

        echo ./bin/meansortidyrv3 v3.mean.inv giving v3.mean.inv.ccsidyr
        ./bin/meansortidyrv3 $DTDIR/v3.mean.inv $DTDIR/v3.mean.inv.ccsidyr
	echo
        ls -l $DTDIR/v3.mean.inv.ccsidyr
fi

There are a lot of meanings coded into the file names. Why? Just for me. To mark things from one program as different from another. There isn’t a great wisdom to it, nor have I made a dictionary. But, for example, v3.mean.inv.ccsidyr is a file containing data from the v3 version of GHCN, with both mean temperature based data (as opposed to MIN or MAX based) and with Country Code, Station IS, and Year as key fields. At the end of the script above, in the last block, you can see where I commented out an old “Sort” call using a “#” and substituted a small script in .bin/meansortidyrv3 that does a sort of a “mean” based file format on Station-ID and Year.

That whole “script” is really just a one line sort command. It could be put in the main script there, just like the one that was commented out. It is:

sort -rn -k1.1,1.15 ${1-"./data/v3.mean.inv"} > ${2-"./data/v3.mean.inv.rev"}

You can see that it has different defaults for the file names and sorts on the v3 sized fields. It gets used in several other bits of scripting, so make a ‘one line script’ out of it to avoid retyping it.

What does this script do? Mostly it hand holds you through the process of matching the two files on a common key and adding the Inventory inormation to each record. Yes, you could do it “long hand” but this also lets you manage intermediate files and lets you restart at any point. I usually make one of these whenever I’m going a manual process, just so I can avoid a lot of typing as I debug what to do. Then, when it’s proven right, it’s just very convenient to let it do all the typing and I just type “mkinvmean” and say Y or N to various questions.

The actual work of matching the two files on a key and printing out the combined record is done by the FORTRAN program called by name in the middle / bottom: ccaddlatv3

It is customized for v3 (thus the v3 on the end) from the original that ran on v2 data. Remember that C starts a comment line that doesn’t actually DO anything. You will notice some “diagnostic writes” commented out in the code. Some folks pull them, I leave them in for any future debugging need during some future port or re-purposing. I’ve pretty much been 100% successful at just looking at intermediate stages of the data to do debugging, so no, I don’t use a lot of other debugging tools. Never needed to.

C     Program: 	ccaddlat.f
C     Written: 	May 23, 2012
C     Author:  	E. M. Smith
C     Function:	This program matches v3.inv and v3.mean on Station IDs
C     sorted in order by cc stationID(8) and produces a merged v3.mean
C     format file.
C     So as to match station location info with  thermometer records.
C
C     Copyright (c) 2012
C     This program is free software; you can redistribute it and/or modify
C     it under the terms of the GNU General Public License as published by
C     the Free Software Foundation; either version 2, or (at your option)
C     any later version.
C
C     This program is distributed in the hope that it will be useful,
C     but WITHOUT ANY WARRANTY; without even the implied warranty of
C     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
C     GNU General Public License for more details.

C     General Housekeeping.  Declare and initialize variables.
C
C     itmp    - an array of 12 monthly average temperatures for any
C               given record.  The GHCN average temp for that station/year.
C     icc     - the "country code"
C     iyr     - The year for a given set of monthly data (temp averages).
C     ayr     - The year as char for a given set of monthly data.
C     idcount - the count of valid data items falling in a given month
C               for all years.  An array of months-counts with valid data.
C     sid     - Station ID for the idcount.
C     line    - A text buffer to hold the file name of the input file
C               to be processed, passed in as an aguement at run time.
C     line2   - A text buffer for the file name of Station IDs to process.
C     oline   - A text buffer for the output file name, set to the input
C               file name.GAT
C     cid     - Country Code - Station ID 3+8 char
C     csid    - Country Code - Station ID 3+8 char
C     id      - Staion ID in the input data.
C     buff    - A text buffer for holding input / output record data.
C     buff2   - A text buffer for holding input / output record data.

C2345*7891         2         3         4         5         6         712sssssss8

      integer itmp(12), icc, iyr, idcount, lc
      character*128 line, line2, oline
      character*8 id, sid 
      character*11 cid, csid
      character*4  mod, ayr
      character*96 buff
      character*96 buff2

      data itmp    /0,0,0,0,0,0,0,0,0,0,0,0/
      data buff /"                                                  "/
      mod=""
      ayr="    "
      icc=0 
      id=""
      sid=""
      cid=""
      csid=""
      idcount=1
      iyr=0 
      lc=0

C     Get the name of the input file, in GHCN format.  The file must be
C     sorted by ID (since we count them in order.)
C     The name of the output file will be inputfile.withmean
C     line2 will hold the v2.mean type file sorted on ID (9 char)
C     line will hold the v2.inv info sorted on ID (8 char)

      call getarg(1,line)
      call getarg(2,line2)
      oline="./data/v3.mean.inv"
  
C      oline=trim(line)//".withmean"

C     lines of the form "write(*,*)" are for diagnostic purposes, if desired. 
C      write(*,*) oline

      open(1,file=line,form='formatted')
      open(2,file=line2,form='formatted')
      open(10,file=oline,form='formatted')              ! output 

C      write(*,*) line2

C     read in a v3.mean record, then a v3.inv description record
C     original addlat.f has a bug, icc is read in 2 x to same variable.

      read(2,'(a11,a4,a4,a96)',end=500) csid, ayr, mod, buff2
C      write(*,*) "Before 20: ", icc, sid, ayr, mod, buff2
   20 read(1,'(a11,a96)',end=400) cid,buff
C      write(*,*) "After 20: ", icc," ", id, " ", buff
C      write(*,*) "at 20: ", sid," ",ayr," ",mod," ",buff2 

   40 continue
      if(cid.eq.csid) goto 70
      if(cid.gt.csid) goto 80
      if(cid.lt.csid) goto 90

C      write(*,*) "Compiler error!.  You can't get here!"
C2345*7891         2         3         4         5         6         712sssssss8

   70 DO WHILE (cid .eq. csid) 
         write(10,'(a11,a4,a4,a96,1x,a96)') csid,ayr,mod,buff2,buff
C         read(2,'(i6,1x,a8,a1)',end=300) idcount, sid, mod
         read(2,'(a11,a4,a4,a96)',end=300) csid,ayr, mod, buff2
C      write(*,*) "From 70: ", icc," ", sid," ", mod," ", buff2
C         read(1,'(a11,a96)',end=200) cid,buff
      END DO
      goto 40

   80 DO WHILE (cid .gt. csid) 
C         read(2,'(i6,1x,a8,a1)',end=300) idcount, sid, mod
C         read(2,'(i3,a8,a1,a64)',end=300) icc, sid, mod, buff2
         read(2,'(a11,a4,a4,a96)',end=300) csid, ayr, mod, buff2
C      write(*,*) "From 80: ",  csid," ", mod," ", buff2
         lc=lc+1
         if (lc.gt.1) then
                write(*,*) "in 80: ", csid," ",mod," ", buff2
                write(*,*) "in 80: ", cid," ", buff
		write(*,*) "Too many times in loop 80", lc
         end if
      END DO
         lc=0
      goto 40

   90 DO WHILE (cid .lt. csid) 
         read(1,'(a11,a96)',end=200) cid,buff
C      write(*,*) "From 90: ", icc, id, buff
C      write(*,*) "id vs sid", id, sid
      END DO
      goto 40

  200 continue
      STOP "Out of v3.inv Station Records - ought to be rare!"

  300 continue
      STOP "Out of v3.mean records.  Most likely case."

  400 STOP "Input file 1 blank on first record!"
  500 STOP "Input file 2 blank on first record!"
      END

Pretty short and fairly straight forward. Run your “digital finger” down each file, matching lines, write out the composite, move on to the next one.

Making Anomalies

This next set just turns temperatures into anomalies. It does this with a simple variation on First Differences ( a peer reviewed method, IIRC). It normally starts with the first data item, and sets that ‘difference’ to zero (as it is what it is) then looks at the next data item in the time series and subtracts one from the other, giving the difference. So if it was 20 degrees yesterday, and 21 today, the difference is 1 degree. So you get “0”, then “1” as the entries.

I make two changes.

1) I “start time” with the present record and go backwards in time. This is still a “First Differences”, but it uses the most recent temperature as the starting point. (Though I’ve sometimes called it “Last Differences” for a shorthand term). In this way we avoid the large swings that can be caused if the first data point is a bit whacky. Very old thermometers can sometimes be significantly “off” from the present ones. So ‘error’ tends to show up in the far past where folks don’t really are. It also means more thermometer records “start time” at about the same place. Convenient both for giving more consistent results, but also for highlighting when a large group of thermometers suddenly enter (going backward in time, exiting going forward in time) the records. Finally, but starting “now” and going back in time we can clearly see if the current ‘cycle wiggle’ is biasing the results. If most of the record is 1/2 C lower than the last couple of years, your “warming” is from where the record ends in a cycle, not some long term change of temperatures in a gradual way. We’ve seen that in several of the graphs.

2) Missing data handling. For classical First Differences, if there are missing data, you just re-zero and reset. As there are a lot of missing data in some places, that gives some very strange results. I just assume that if you have January for some years in a row, and leave out a couple, then get some more, that somewhere in between you passed through the space between the two. So if you had a series that was +1/10, +1/10, -1/10, missing, missing, +2/10; I just hold onto the last anomaly and wait while skipping missing data. When I hit the +2, I just account for all the change in that year. So you would have 1/10, 2/10, 1/10, 3/10 as the series. This, IMHO, more accurately reflects the reality of the temperatures recorded. That last year WAS 2/10 higher than the prior value, so throwing it away and using a zero is just wrong. In this way this code is much more robust to data dropouts and also more accurate.

We again start off with a ‘wrapper script” to do the managing of the process. I take the default values for file names and such and stuff them into symbolic names so that my typing is reduced, the code is more readable, and they can be changed in one place without risk of typos further down in the code at a later time. We also see the reuse of that one line sorting script from above. Then a call to the FORTRAN program that does the work, passing it a correctly sorted file.

#
# Take a composite mean and inventory file, and reverse sort it.
# Then feed it to the anomaly file creating program dTdtv1
#
#


DTDIR="./data"
MEANINV=${1-$DTDIR/v3.mean.inv} 
REVD=$DTDIR/v3.mean.invR

DIR="../data"

echo " "
echo -n "Do the sort of $MEANINV data into $REVD (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
     INVFILE=$MEANINV
     echo INVFILE= $INVFILE
     echo
     echo ./bin/meansortmeaninv3 $MEANINV giving $REVD
     echo

     ./bin/meansortidyrv3 $MEANINV $REVD

     echo

     ls -l $REVD
fi

./bin/dTdtv3 $REVD

ls -l $REVD.dt

And what does dTdtv3 look like? It has a bit fewer comments, but is very short. I’ve also got some ‘commented out’ diagnostic writes that print descriptive things when active, but act as document / comments when commented out. A useful “trick”.

C2345*fff1         2         3         4         5         6         712sssssss8
C
C    this program converts temperatures into delta Temp / delta time
C    Input must already be sorted by Station ID and by year 
C    A special v3.mean+v3.inv concatenated file is the source (so that
C    station meta-data are attached to temperatures
C

      integer temps(12), otemps(12), dtemps(12)
      integer yr,  m, bad

      character*11 stnID, ostnID 
      character*128 meta, oline, line

      data temps   /0,0,0,0,0,0,0,0,0,0,0,0/
      data otemps  /0,0,0,0,0,0,0,0,0,0,0,0/
      data dtemps  /0,0,0,0,0,0,0,0,0,0,0,0/

      yr     = 0 
      m      = 0
      bad    = -9999

      stnID  = " "
      ostnID = " "
      meta   = " "
      oline  = " "
      line   = " " 


C     Get the name of the input file, in modified GHCN format (w/ metadata).
C     The file must be sorted by year (since we do a delta year to year.)
C     The name of the output file will be that of the inputfile.dt
C     The input file ought to be a GHCN format file with V2.inv data
C     concatenated at the end of each temperature line, as will the 
C     output file have meta data after each dT/dt line.

      call getarg(1,line)
      oline=trim(line)//".dt"
      open(1,file=line,form='formatted')
      open(10,file=oline,form='formatted')              ! output


C2345*fff1         2         3         4         5         6         712sssssss8

C     Read in a line of data 
C     (Country Code/StnID, year, temperatures, meta)

   20 read(1,'(a11,i4,4x,12(i5,3x),1x,a96)',end=999) stnID,yr,temps,meta
C     write (*,*) "Input Data"
C     write (*,*) stnID, yr, temps, meta

C      if you have a new record for the same station, find dT/dt.

C2345*fff1         2         3         4         5         6         712sssssss8

      if(stnID .eq. ostnID) then
        do m=1,12
           if ( (otemps(m).ne.bad) .and. (temps(m).ne.bad) ) then
C              write (*,*) "non-bad temp and otemp"
               dtemps(m) = temps(m) - otemps(m)
               otemps(m) = temps(m)
           else
C              write (*,*) "Somebody Bad "
               dtemps(m)=0
               if ( temps(m) .ne. bad ) then
C                   write (*,*) "temps non-bad +++++"
                    otemps(m) = temps(m)
               end if
           end if

         end do

      else
           ostnID=stnID
           do m=1,12
              dtemps(m)=0
              otemps(m)=temps(m)
           end do
      end if 
C     write (*,*) "ORDINARY Output"
C     write (*,*) stnID, yr, dtemps, meta
      write (10,'(a11,i4,12i5,a96)') stnID, yr, dtemps, meta

      go to 20

  999 continue
C     write (*,*) "Final RECORD"
C     write (*,*) stnID, yr, dtemps, meta

      close (1)
      close (10)

C2345*fff1         2         3         4         5         6         712sssssss8
 9999 continue

      stop
      end

Again, too, notice the comment – “rulers” that give column positions. On the old punched cards, spaces in the front were reserved for line numbers. At position 72, things just ‘cut off’ (and if you don’t use a proper ‘continuation card’ can cause odd bugs when the variable “alltojohn” becomes “alltojo” as the last to characters get chopped…) So I have those ‘rulers’ stuck in every so often to make it easy to spot lines that need a continuation…

This program is compiled with it’s own little 2 line script. It is named “mkdTf”

g95 src/dTdtv3.f 
mv a.out bin/dTdtv3 

In this way I don’t have to worry about which compiler I used (f77? g95?). I can also just comment out the last line when I want to experiment with new versions without overwriting the old one that works.

There’s a similar program for aligning the v3 data with the v1 data (that ends in 1990). It differs by one line. Just after that line 20 that reads in the data, it looks at the year and if it is newer than 1990, just goes to read the next record. Here is the output of the ‘diff’ program that compares the two:

54a55
>       if(yr .gt. 1990) goto 20

Once the file is built, it’s easy to make many different kinds of reports from it. Here’s one of them. The dPdt0 program that looks at data which ends in 1990 and makes a report. First up, the script, that makes sure the right files are being used. Again, it’s an interactive hand holder that cuts down on typing.

It also does nice little coordination things like letting me choose to keep the intermediate files around (so future reports are faster) or remove them; and lets me see the report (in ‘vi’ the Visual Editor) and asks me if I want to put them on my ftp server so I can make graphs out of them on the laptop or not.

#       This version expects a file with the 'mod' flags combined.
#       such as ought to be in ./data/v3.mean.inv.dt

#	First off, extract the target country or countries data from a v3.mean type 
#       file, then sort it into a version for reporting by year.

DIR=${3-./DTemps}
DAT=${4-./data}
INFILE=${2-$DAT/v3.mean.invR0.dt}
STATN=ALL

TEMPS=Temps.rM0$STATN
REPORT=$DIR/$TEMPS.yrs.dP
#STATN=${1-501}

OUTFILE=$DIR/v3.meanR0.$STATN

echo " " 
echo "Do the source and  Work Files already exist? : "
echo " "
ls -l $INFILE  $OUTFILE $DIR/$TEMPS

echo " "
echo "Do Older Reports in this series exist? : "
echo " "
ls -l $REPORT
echo " "
echo -n "Extract / process $STATN from $INFILE (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
     PAT=^[0-8]
     echo $PAT
     grep $PAT $INFILE > $OUTFILE

     ls -l $OUTFILE
fi

echo " "
echo -n "Do the SORT for $OUTFILE (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
     echo Now Sort
     echo

     sort -rn -k1.12,1.15 -k1.1,1.11  $OUTFILE > $DIR/$TEMPS

     echo 
     echo After the Sort
     echo 
fi

ls -l $DIR/$TEMPS

echo " "
echo "Doing dP/dt Yearlies: dpyears"
echo " "

echo " "
echo -n "Do the Reporting from $DIR/$TEMPS (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
#     ./bin/lrmdyearsv3 $DIR/$TEMPS
#     ./bin/dMTdt $DIR/$TEMPS
#     ./a.out $DIR/$TEMPS

     ./bin/dpyears $DIR/$TEMPS

     echo " " >> $REPORT
     echo For Country Code $STATN >> $REPORT
     echo " " >> $REPORT
     echo From input file $INFILE  >> $REPORT
     echo " "
     echo "Produced:"
     echo " "
     ls -l $REPORT
fi

echo " "
echo -n "Look at $REPORT (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
#     cat $REPORT
     vi $REPORT
fi

echo " "
     ls -l $OUTFILE $DIR/$TEMPS
echo " "
echo -n "Clean up / Delete intermediate files (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
     rm $OUTFILE $DIR/$TEMPS
fi

#CoStns ${1-501}

echo "Do form country name"

echo " "
echo -n "Copy $REPORT to /var/ftp/pub/ToPc/v3.$TEMPS.yrs.dP (Y/N)? "
read ANS
echo " "

if [ "$ANS" = "Y" -o "$ANS" = "y" ] 
then
     cp $REPORT /var/ftp/pub/ToPc/v3.$TEMPS.yrs.dP
     cp fort.11 /var/ftp/pub/ToPc/v3.$TEMPS.yrs.dP.csv
     ls -l  $REPORT /var/ftp/pub/ToPc/v3.$TEMPS.yrs.dP
     ls -l  /var/ftp/pub/ToPc/v3.$TEMPS.yrs.dP.csv
fi

Notice that I produce a cut down version of the summary columns in “Comma Separated Value” or csv form. Makes it a little quicker to do the graphs. It goes to the default file name “fort.11” and I rename it something more useful.

You can also see where this same “wrapper” with minor changes has been used for several other programs that are commented out with a “#” sign. This one calls ‘dpyears’ to do the actual report. In some ways this is the ugliest of the batch. It started life as a program that averages temperatures, so it still is commented with “temperatures’ in stead of anomalies. That needs polishing…

C2345*7891         2         3         4         5         6         712sssssss8
C     Program:  dpyears.f
C     Written:  May 24, 2012
C     Author:   E. M. Smith
C     Function: To produce a list of Global Average Temperatures for
C     each year of data in a GHCN format file, with one GAT for each
C     month and a total GAT for that year.  Summary GAT records are
C     produced for the whole data set as a "crossfoot" cross check of
C     sorts.  While you might think it silly to make a "global average
C     temperature" for a 130 year (1880 to date) or 308 year (1701 the
C     first data in GHCN, to date) interval, once you accept the idea
C     of adding together 30 days, or 365 days of records, or
C     thermometers from all over the planet "means something":
C     Where does it end?

C     Personally, I think the whole idea of a GAT is bogus,
C     but if you accept it as a concept (and GIStemp and the AGW
C     movement do) then you must ask:
C     "in for a penny, in for a pound":
C     When does the GAT cease to have some value, and exactly why?...
C
C     So I produce GAT in several ranges and you can inspect it
C     and ponder.

C     Copyright (c) 2012
C
C     This program is free software; you can redistribute it and/or
C     modify it under the terms of the GNU General Public License as
C     published by the Free Software Foundation; either version 2,
C     or (at your option) any later version.
C
C     This program is distributed in the hope that it will be useful,
C     but WITHOUT ANY WARRANTY; without even the implied warranty of
C     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
C     GNU General Public License for more details.
C
C     You will notice this "ruler" periodically in the code.
C     FORTRAN is position sensitive, so I use this to help me keep
C     track of where the first 5 "lable numbers" can go, the 6th
C     position "continuation card" character, the code positions up
C     to number 72 on your "punched card" and the "card serial number"
C     positions that let you sort your punched cards back into a proper
C     running program if you dropped the deck.  (And believe it or
C     not, I used that "feature" more than once in "The Days Before
C     Time And The Internet Began"... 

C2345*7891         2         3         4         5         6         712sssssss8

C     Oddly, within the 7-72 positions, FORTRAN is not position
C     sensitive.  This was so if you put in an accidental space
C     character, you didn't need to repunch a whole new card...
C     Oh, and if you type a line past the 72 marker, you can cut
C     a variable name short, creating a new variable, that FORTRAN
C     will use as an implied valid variable.  So having "yearc"
C     run past the end can turn it into "year" "yea" "ye" "y"
C     which will then be the actual variable you are using in that
C     line, not the one that prints out on your card.  The
C     source of endless bugs and mirth 8-}

C     General Housekeeping.  Declare and initialize variables.
C
C     itmp    - an array of 12 monthly average temperatures for any
C               given record.  The GHCN average temp for that
C               station/year.
C     incount - the count of valid data items falling in a given month
C               for a year.  An array of months-counts with valid data.
C     nncount - the count of valid data items falling in a given month
C               for all years.  An array of months-counts w/ valid data.
C     itmptot - Array of the running total of temperatures, by month.
C     ymc     - count of months for the total of a year  with some
C               valid data.
C     ntmptot - running total of all temperatures, by month column.
C     icc id iyr nyr iyrmax m iyc - countrycode, Stn ID, one year
C               of data as monthly averages of MIN/MAX temps, max year
C               so far, month, iyc In Year Counter: # recs in year.
C     mod     - "modification history flag".  Place holder, not used.
C     tmpavg  - Array of average temperatures, by month. The data
C               arrive as an INTEGER with an implied decimal point
C               in itmp.  This is carried through to the point where
C               we divide by 10 and make it a "REAL" or floating point
C               number in this variable.
C     ttmpavg - Total of temperature data by month for all years.
C     tymc    - count of months for the total of all data with some
C               valid data.
C     eqwt    - Total of monthly averages of temperature data, by month.
C     eqwtc   - Counter of months with valid data.
C     eqwtg   - Grand Total of calculated monthly averages of MIN/MAX
C               averages.  Divided by eqwtc for Grand Avg.
C     gavg    - Global Average Temperature.  GAT is calculated by
C               summing tmpavg monthly averages that have valid data,
C               then dividing by the count of them with vaid data.
C    ggavg    - The Grand Grand Average Temperature,
C               whatever it means...
C     line    - A text buffer to hold the file name of the input file
C               to be processed, passed in as an aguement at run time.
C     oline   - A text buffer for the output file name,
C               set to the input_file_name.GAT

C    runt     - An added running total annual anomaly... 

C2345*7891         2         3         4         5         6         712sssssss8

      integer incount(12), nncount(12), itmptot(12), ntmptot(12)
      integer itmp(12) 
      integer icc, id, iyr, nyr, iyrmax, m, iyc, mod

      real tmpavg(12), ttmpavg(12), eqwt(12), eqwtc(12)
      real gavg, ggavg, ymc, tymc, eqwtg, runt

      character*128 line, oline

      data incount /0,0,0,0,0,0,0,0,0,0,0,0/
      data nncount /0,0,0,0,0,0,0,0,0,0,0,0/
      data itmptot /0,0,0,0,0,0,0,0,0,0,0,0/
      data ntmptot /0,0,0,0,0,0,0,0,0,0,0,0/
      data itmp    /0,0,0,0,0,0,0,0,0,0,0,0/

C      data tmpavg  /0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0./
C2345*7891         2         3         4         5         6         712sssssss8

      data tmpavg  /-99.,-99.,-99.,-99.,-99.,-99.,-99.,-99.
     *,-99.,-99.,-99.,-99./

      data ttmpavg /0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0./
      data eqwt    /0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0./
      data eqwtc   /0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0./

      icc   =0
      id    =0
      iyr   =0
      nyr   =0
      iyrmax=0
      m     =0
      iyc   =0
C      mike  =0

      gavg  =0.
      ggavg =0.
      eqwtg =0.
      ymc   =0.
      tymc  =0.
      runt  =0.

      line =" "
      oline=" "

C     Get the name of the input file, in GHCN format.  The file
C     must be sorted by reverse year (since we sum all data by month
C     within a year.) The name of the output file will be that of
C     the input_file.yrs.dT where dT stands for delta
C     Temperature.

C2345*7891         2         3         4         5         6         712sssssss8

      call getarg(1,line)
      oline=trim(line)//".yrs.dP"
      open(1,file=line,form='formatted')
      open(10,file=oline,form='formatted')              ! output

C     Read in a line of data (Country Code, ID, year, temperatures)
C     Set the max year so far to this first year, set the "LASTID"
C     to zero so it will fail the equality test later.
C     Note that we read in an 8 digit station ID, and do not
C     count the individual "modification flags" as new stations.

      read(1,'(i3,i8,i4,12i5)',end=200) icc,id,iyr,itmp
      iyrmax = iyr
      LASTID = 0
      rewind 1
      write(10,'("Produced from input file: ",a)') line
      write(10,'(" ")')
      write(10,14)
      write(10,15)
      write(10,16)
      write(10,17)
      write(10,18)

   14 FORMAT("Thermometer Records, Average of Monthly dP/dt, Yearly runn
     *ing total")
C   14 FORMAT("Thermometer Records, Average of Monthly Data and Yearly Av
C     *erage")
   15 FORMAT("by Year Across Month, with a count of thermometer records
     *in that year")
   16 FORMAT("----------------------------------------------------------
     *-------------------------")
C2345*7891         2         3         4         5         6         712sssssss8
   17 FORMAT("YEAR     dP dT/yr  Count JAN  FEB  MAR  APR  MAY  JUN JULY 
     *  AUG SEPT  OCT  NOV  DEC")
   18 FORMAT("----------------------------------------------------------
     *-------------------------")

   20 CONTINUE

      read(1,'(i3,i8,i4,12i5)',end=200) icc,id,iyr,itmp
C      mike=mike+1
C      write(*,*) icc,id,iyr,itmp

      if(iyr .lt. iyrmax) then

C      if you have a new year value, you come into this loop,
C      calculate the Monthly Average delta  Temperatures, the
C      Yearly dT running total for iyrmax.
C      Print it all out, and move on.

        do m=1,12

          if (incount(m) .ne. 0) then

C      We keep a running total of tenths of degree C in itmptot,
C      by month. Then we divide this by the integer count of
C      valid records that went into each month.  This truncates
C      the result (I think this is valid, since we want to know
C      conservatively how much GIStemp warmed the data
C      not how much my math in this diagnostic warms the data ;-)  

C      So we have a "loss" of any precision beyond the "INTEGER"
C      values being divided, but since they are in 1/100C, we are
C      tossing 1/100C of False Precision, and nothing more.
C      THEN we divide by 100. (REAL) and yield a temperature
C      average for that month for that year (REAL).
C      I could do a 'nint' instead:  nint(itmptot(m)/incount(m))
C      and get a rounded result rather than truncated, but I
C      doubt if it's really worth if for a "hand tool" that I'd
C      like to be a conservative one.  If I truncate, then any
C      "warming" of the data is from GIStemp, not this tool.
C      (Or GHCN, now that I'm using to analyse the input data
C      as well as the code itself.)

C2345*7891         2         3         4         5         6         712sssssss8

C       Diagnostic write to check missing data flag handling.
C       write(*,*) "tmpavg: ", tmpavg

            tmpavg(m) = (itmptot(m)/incount(m))/100.

C       Diagnostic write to check missing data flag handling.
C       write(*,*) "TMPavg: ", tmpavg

            gavg      = gavg+tmpavg(m)
            ymc       = ymc+1.

C       We put a running total of yearly averages together,
C       along with a count for tmpavg, it is the total of
C       monthly temperature averages divided by the count of
C       months with data in them (converted to C from 1/100 C).
C       For eqwt it is a running total of those averages that are
C       used at the end to calculate a "by month average of
C       monthly averages ".
C       Basically, the first form, gavg, weights each recored
C       equally, while the second form gives equal weight to
C       each month, regardless of number of records in that month.
C       Which one is right?  You get to choose...  (And THAT
C       is just one of the issues with an "average of averages of
C       averages" means something...

C       I just put them here so you can see that they are, in fact,
C       different...

            eqwt(m)   = eqwt(m)+tmpavg(m)
            eqwtc(m)  = eqwtc(m)+1

          end if
        end do

        gavg=gavg/ymc
        runt=runt+gavg

C2345*7891         2         3         4         5         6         712sssssss8

C Write out the Year, the averages, the grand avg, and the
C number of thermometers in the year

C        write(10,'(i4,12f5.1,f5.1,1x,i4,1x,f6.3)') iyrmax,tmpavg,gavg,
C     *iyc,runt
C2345*7891         2         3         4         5         6         712sssssss8
        write(10,'(i4,1x,f6.2,f6.2,1x,i4,1x,12f5.1)') iyrmax,runt,gavg,
     *iyc ,tmpavg
        write(11,'(i4,",",f6.2,",",f6.2,",",i4)') iyrmax,runt,gavg,iyc

C     Diagnostic "writes", should you wish to use them.
C       write(*,*) "iyc: ", iyc, "Mike: ", mike
C      write(*,'("GAT/year: "i4,12f7.2,f7.2,i6,f7.2)') iyrmax,
C    *tmpavg,gavg,iyc,ymc

C      probably paranoia, but we re-zero the monthly arrays of data.
C      ande pack tmpavg with missing data flags of -99
C
        do m=1,12
          incount(m) =0
          itmptot(m) =0
          tmpavg(m)  =-999.
          ymc        =0.
        end do

        gavg   =0.
        iyc    =0
        LASTID =0
        iyrmax =iyr

C     hang on to the present year value and ...

      end if
C     End of "new year" record handling.

C     So we have a new record (for either a new year or for the
C     same year.) If it is valid data (not a missing data flag)
C     add it to the running totals and increase the valid
C     data count by one.

C2345*7891         2         3         4         5         6         712sssssss8

C     Increment the running total for stations in this year.
C     In Year Counter

      if (id .NE. LASTID) then
          iyc  = iyc+1
          LASTID = id
      end if

C     For each month, skipping missing data flags, increment
C     the valid data counter for that month incount, subtract that
C     year dT as we are running time backwards...
C     temperature data (in 1/100 C as an integer) into the
C     yearly running total itmptot.
C     Also do the same for the total records count nncount
C     and running total of all temperatures (by month) ntmptot.

      do m = 1,12

        if (itmp(m) .gt. -9900) then
          incount(m) = incount(m)+1
          itmptot(m) = itmptot(m)+itmp(m)
          nncount(m) = nncount(m)+1
          ntmptot(m) = ntmptot(m)+itmp(m)
        end if
      end do

C     and go get another record
      goto 20

C     UNTIL we are at the end of the file.
  200 continue

C     Adding last year handling

        do m=1,12

          if (incount(m) .ne. 0) then
            tmpavg(m) = (itmptot(m)/incount(m))/100.
            gavg      = gavg+tmpavg(m)
            ymc       = ymc+1.

            eqwt(m)   = eqwt(m)+tmpavg(m)
            eqwtc(m)  = eqwtc(m)+1

          end if
        end do

        gavg=gavg/ymc
        runt=runt+gavg

C        write(10,'(i4,12f5.1,f5.1,1x,i4)') iyrmax,tmpavg,gavg,iyc
C        write(10,'(i4,12f5.1,f5.1,1x,i4,1x,f6.3)') iyrmax,tmpavg,gavg,
C     *iyc,runt

        write(10,'(i4,1x,f6.2,f6.2,1x,i4,1x,12f5.1)') iyrmax,runt,gavg,
     *iyc ,tmpavg

C2345*7891         2         3         4         5         6         712sssssss8

C     At the end, we make two "grand averages" showing how the order
C     in which an average of avereages of data is done changes the 
C     value.  This effect seems to be studiously ignored in 
C     discussions of GIStemp and other climate data "serial averagers".

C     Here we use the method vetted in the earlier program
C     totghcn.f where we hold the temps as integers in 1/100 C
C     until the very end, then we do a convert to real
C     (via divide by 100.) and cast into a real (ttmpavg)
C     that is the total average temperature for that month for
C     the total data.  

C     ggavg is the grand total GAT, but after it is stuffed with
C     valid data we must divide it by the number of months with
C     valid data.  It is the "Average of yearly averages of
C     monthly averages of daily MIN/MAX averages".
C     Why?  Heck, GIStemp is a "serial averager", thought it might
C     be fun to see what you get.

C     We also show the average of all the individual monthly
C     data.  That gives a different value.
C     Will the real GAT please stand up? ... 

C     I would chose to use the average of all data in a month
C     since it is less sensitive to the variation of number
C     of thermometers in any given year, but you might chose a
C     different GAT.  Averaging the data directly gives weight
C     to the years with more data.  Averaging the monthly
C     averages gives each month equal weight but puts too 
C     much weight on individual thermometers in months with
C     few valid thermometers (like early years).  Choose one...

C     Rational? No.
C     But it is the reality on the ground..
C     Basically, if you do "serial averaging", the order of the
C     averaging will change your results.  As near as I can tell,
C     GIStemp (and the whole AGW movment) pay no attention to this
C     Inconvenient Fact.  They just assume that the "anomaly 
C     process" will save them from it somehow.


      do m=1,12
          if (nncount(m).ne.0) then
            ttmpavg(m)=(ntmptot(m)/nncount(m))/100.
            ggavg=ggavg+ttmpavg(m)
            tymc=tymc+1
          end if
          eqwt(m)=eqwt(m)/eqwtc(m)
          eqwtg=eqwtg+eqwt(m)
      end do

C     ggavg is the grand average of monthly averages for a month.
C     eqwt is the sum of all months data.  eqwtc is the
C     count of all months with valid data.  So this is the
C     place where the total gets divided by the count to give the
C     average of all averages in a month.  

      ggavg=ggavg/tymc
      eqwtg=eqwtg/tymc

C     "At" is the "Total delta data"
C     "Aa" is the "Average of monthly dT"

C      write(10,'("At",2x,12f5.2,f5.2)') ttmpavg,ggavg
C      write(10,'("At",2x,12f5.2,f5.2)') ntmptot,ggavg
C      write(10,'("Aa",2x,12f5.2,f5.2)') eqwt,eqwtg

      stop
      end

In Conclusion

Well, that’s the basic code. Look it over. Let me know if there are any errors or omissions you find. If folks want any more of the variations just let me know (like the one that only goes to 2009 to compare with v2).

It does still need a bit of clean-up, but it is what it is.

As I’ve been up all night making sure a rogue cat leaves the bunnies alone, I’m very much needing a nap right now. For that reason, My QA on these listings will have to waite a few hours… for when I can see straight again ;-)

Subscribe to feed

Advertisements

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons' are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in NCDC - GHCN Issues and tagged , , , , . Bookmark the permalink.

26 Responses to GHCN v1 vs v3 some code

  1. EM – I see the main problem with the dataset as being that it isn’t. Since the thermometers are not in the same place over the years, to get a valid average you’d have to delete all those that are not continuous measurements in the same place. It looks like that would also remove virtually all of the data. In order to get some sense from the discarded data, it should be possible to split them into overlapping tranches of years. Within each tranche, use only data that is from thermometers that exist at the start and end of that tranche. If there are missing data in the middle, then either interpolate or delete the thermometer (preferably delete if it’s more than a few points).

    Plotting the raw trends from each tranche over the full year-range would give you a set of lines that most likely do not join in the years that each tranche overlaps, but logically speaking the GAT at that time must have been the same for each tranche that overlaps that date. You could then calculate, going backwards, how much each tranche is displaced from the true figure, and thus get overall a reasonably valid trend line, though the further back you go the less accurate it will be since we’ll be adding both the correction and a possible error-bar at each tranche start/end.

    This method would at least not have the disadvantages of the other ways of calculating the GAT. It might even provide a relatively valid result. If it showed the same overall trend as the thermometers in one place at the beginning and end then you might have a bit more trust in the graph.

    My own programming was a bit of BASIC at the start, but quickly progressing to assembler. I ended up with little routines that I could cut and paste, and it was extremely quick. About 10 years ago I had reason to re-use a simple file-modification program that used to take a few minutes. I didn’t expect it to work in Windows but tried it and it came straight back to the prompt so I thought it had crashed and I’d have to re-do it in a Windows-compatible language. Luckily I checked the output file first, and it had worked, doing the job in a split-second. It’s worth remembering just how quick today’s boxes really are. Fortran, despite its age, is very efficient and your file-processing is probably at least ten times the speed than if you had done it in an OOL.

  2. Ian W says:

    In view of your comments on programming languages you may be interested in the following from many years ago….

    PROGRAMMING LANGUAGE GUIDE

    The proliferation of modern programming languages which seem to have stolen countless features from each other sometimes makes it difficult to remember which language you are using. This guide is offered as a public service to help programmers in such dilemmas.

    C
    You shoot yourself in the foot.

    Assembler
    You crash the OS and overwrite the root disk. The system administrator arrives and shoots you in the foot. After a moment of contemplation, the administrator shoots himself in the foot and then hops around the room rabidly shooting at everyone in sight.

    APL
    You hear a gunshot, and there is a hole in your foot, but you don’t remember enough linear algebra to understand what the hell happened.

    C++
    You accidentally create a dozen instances of yourself and shoot them all in the foot. Providing emergency medical care is impossible since you cannot tell which are bit-wise copies and which are just pointing at others and saying, “That’s me over there.”

    Ada
    If you are dumb enough to actually use this language, the United States Department of Defense will kidnap you, stand you up in front of a firing squad, and tell the soldiers, “Shoot at his feet.”

    MODULA-2
    After realising that you cannot actually accomplish anything in the language, you shoot yourself in the head.

    Pascal
    Same as MODULA-2, except the bullets are the wrong type and won’t pass through the barrel. The gun explodes.

    sh,csh, etc.,
    You cannot remember the syntax for anything, so you spend five hours reading manuals before giving up. You then shoot the computer and switch to C.

    Smalltalk
    You spend so much time playing with the graphics and windowing system that your boss shoots you in the foot, takes away your workstation, and makes you develop in COBOL on a character terminal.

    FORTRAN
    You shoot yourself in each toe, iteratively, until you run out of toes, then you read in the next foot and repeat. If you run out of bullets, you continue anyway because you have no exception processing ability.

    ALGOL
    You shoot yourself in the foot with a musket. The musket is aesthetically fascinating, and the wound baffles the adolescent medic in the emergency room.

    COBOL
    USEing a COLT45 HANDGUN, AIM gun at LEG.FOOT, THEN place ARM.HAND.FINGER on HANDGUN.TRIGGER, and SQUEEZE. THEN return HANDGUN to HOLSTER. Check whether shoelace needs to be retied.

    BASIC
    Shoot self in foot with water pistol. On big systems, continue until entire lower body is water logged.

    PL/1
    You consume all available system resources, including all the offline bullets. The Data Processing & Payroll Department doubles its size, triples its budget, acquires four new mainframes, and drops the original one on your foot.

    SNOBOL
    You grab your foot with your hand, then rewrite your hand to be a bullet. The act of shooting the original foot then changes your hand/bullet into yet another foot (a left foot).

    LISP
    You shoot yourself in the appendage that holds the gun with which you shoot yourself in the appendage that holds the gun with which you shoot yourself in the appendage that holds the gun with which you shoot yourself in the appendage that holds the gun with which you shoot yourself in the appendage……

    SCHEME
    You shoot yourself in the appendage that holds the gun with which you shoot yourself in the appendage that holds the gun with which you shoot yourself in the appendage that holds the gun with which you shoot yourself in the appendage that holds the gun with which you shoot yourself in the appendage…… ….but none of the other appendages are aware of this happening.

    English
    You put your foot in your mouth, then bite it off.

    If you recognize all of those we are of an age ;-)

  3. E.M.Smith says:

    @Ian W:

    I recognize all of them, but managed to avoid writing anything in Lisp, SNOBOL, SCHEME, C++, Modula-2, and Smalltalk, though I’ve learned about each of them and read some of the code. Managed software development of products in C++.

    I’ve written in assembler, C, csh, sh, bash, etc., FORTRAN, BASIC, ALGOL, Pascal, and a little of PL/1. (along with several other languages and a couple of DBMS products).

    I passed a COBOL class and swore Never Again! (but did help some COBOL folks access the corporate databases using the DBMS I managed…so I sort of ‘worked with it’ in an arms length kind of way…) Ada was being used in an adjoining department once. I learned enough to write a program in it, then found a way to be needed elsewhere ;-) Once I decided I ought to know about APL. After an intense week with The Book, I wrote a program that did some really neat data transformations. It was about 6 to 8 characters long IIRC, and did a couple of pages worth of stuff. The next day I couldn’t read it any more or tell how it worked… so moved on…

    It has been asserted that I can write English. The truth is more likely a liberal interpretation of what is English… ‘Merican an Spanglish munged together most of the time… Should I have lunch, or ought I have lunch? I want a coke, or I want for a coke, or I’m gonna wanna coke? How do you say rodeo in English? How about salsa? Chaps? Lariat? What’s a flan in English? It’s easier to say arroz con pollo y frejoles and Margarita especial than to figure out how to say it in English. Mas de salsa y tortillas, por favor … But maybe it’s just that English is becoming the PL/1 of natural languages… N’est Pas?

    Yes, I am “of an age”… just not figured out which one yet ;-)

    @Simon:

    That’s the basic problem. IMHO the folks doing the original work figured out how crap the data were and that every way they tried to “fix it” ended up putting various biases in. IMHO, too, that was when they realized they could “adjust” the data to say anything via only slightly more artful arrangement of which thermometers were in and out and when…

    That’s basically what I ran into. I tried a couple of things and realized that I could move the data around accidentally fairly easily. It took a bit of work to come up with a way that has the minimal influence from data drop outs, splice artifacts, and had good clarity. Even at that, it does suffer from splice artifacts to some degree. Each individual thermometer record is ONLY compared to itself. That means each one is clean. However, if the original records were selected to start on a down date and end on an up date, the composite of several over time will have a gradual “lift” to the series. Each segment has an artificial ’tilt’ from the partial cycle span.

    I’d thought of just finding the trend line, then infilling data back to the midline at each end, then recomputing the trend; but that seemed a bit too much like fabrication. If you chop off the ends back to the trend line, then recompute, that’s a little better. “Shaping the ends”… but often there isn’t enough data to do that.

    In the end, all the “corrections” end up shifting the result more than the signal for which you are searching.

    I explored how this ‘splice artifact’ nature show up in this posting:

    https://chiefio.wordpress.com/2010/04/03/mysterious-marble-bar/

    It takes several thermometers from the area that are used to create a regional trend. Each inspected alone isn’t much. Glued together, you get a rising trend. Yet Marble Bar has a significant Record High that has never been reached again… so how can you “consistently warm” and never have a hotter day?…

    IMHO the “Splice Artifacts” are hidden better via the homogenizing and infilling codes, but that doesn’t remove them.

    So you have yet another proposed in-fll and homogenize idea. Fine. How will you prevent end effects putting trends in segments, and how will you prevent spice artifacts, however subtle and well hidden?

    TonyB looked at long lived thermometers and finds little to no warming. (What there is is well inside the UHI probable). I looked at the bulk of data and found that “warming” showed up only in significant degree in short lived and especially recent thermometer records.. Likely due to a change of processing. ( A specific change of “mod flag” or “duplicate number” happens then).

    In the end, IMHO, we are better able to say what the “trend” is by looking at those few individual thermometers that are long lived and well tended. They say it was very warm about 1720, got cold in the Little Ice Age. Warmed in the 1930s to warmer than now by a smidgeon. And now is not significantly different from then, or from the early 1700s.

    That all collectively implies the GHCN is simply “Not fit for purpose” to say anything about climate change as the data set is presently constructed.

    But this code is my best effort and getting most of the garbage out of the post GHCN processing. Unfortunately, with GHCN v3 more of the ‘bad ideas’ are being moved upstream into the formation of GHCN, so beyond reach. IMHO that shows up in the changes from v1 vs v3 where you can spot what countries have ‘added processing’ by how their temperatures shift in the v3 past compared to their prior past…

  4. jim says:

    Visual Studio DotNet: The code designer shoots itself in the “”, and won’t tell you 1) What gun was used 2) what you might have done to induce the designer to shoot itself, and 3) give you no clue at all where the hole is or what the hole broke.

  5. Sandy McClintock says:

    You have put in a lot of work on this! Well done.
    FORTRAN has been my trusty friend for 40 years and the code still works! :)
    I also used to use Visual BASIC but each time Microsoft ‘upgraded’ to a new version, it ‘broke’ my code. What a waste of time – never again.
    Often I use FORTRAN to massage data ready for importation into MS Access. Access is remarkably good for matching files and reasonably good at doing what your dTdtv3 program does. Some people tell me its not efficient enough, but I had tables with 25 million records which could be ‘processed’ in 5 to 10 minutes.

    I have one system in Access that
    sucks in ASCII data, manipulates / matches it,
    writes out ASCII data ready for statistical analysis,
    fires off a complex FORTRAN program that reads this and writes the ASCII result files,
    sucks the crude output back into Access and massages results for XL output.

    Neither Access nor FORTRAN would do all these tasks easily by themselves.

  6. EM – I have now read the Marble Bar post. As a recent arrival it’s going to take some time to read the massive output you’ve put up over the last 3 years. Yup, it looks like the “tranche” method I put forward also is going to have false data-constructs in it, though I still think they would be overall less. Since there is such a range for “nearby” in the GHCN calculations, it’s looking like the only way to get a good picture of the data is to follow only single stations. If the “average” temperature seems to have a step function, then find the reason (as in massive iron-smelting works built next to it?) and either apply the necessary corrections to it or just delete that series. This way of looking at the data would need man-years of work, but it’s something that IPCC should have done before scaring the politicians into reducing everyone’s standard of living. For now, I’ll take the single long-lived records as having validity.

    Interestingly, it also looks like the international temperature scale was redefined in 1990 – could be a cause of some anomalies at around that time since the new definitions would not be applied to older data and the thermometers would not be instantly changed out. Add to that the differences between alcohol and Mercury thermometers, and the waters are even muddier. It’s not the sort of data you’d trust your life to, yet that’s what’s happening.

    One programming language not mentioned is Databus, a proprietary language of Datapoint computers. This was a compiled basic-like language with better data-handling routines and ISAM indexing built-in. The machine structure was a number of 1MHz 8-bit processors tied into a token-ring via 1Mbit/s cable, and each processor could handle up to 32 terminals on serial links. With around 16 terminals per processor, we normally got sub-second response times. It could also run COBOL, but somewhat slower. If that sort of efficient code were run today, responses would be lightning-quick. Looking back I’m amazed at the performance we got out of such underpowered systems.

  7. dearieme says:

    I started with Atlas Autocode and then moved onto IMP – both Algol-like languages. Being obliged later to swap to FORTRAN was awful: I dare say it was like a Roman entering the Dark Ages.

  8. E.M.Smith says:

    @Simon:

    Interesting machine architecture…

    And yes, there are so many bumps in the temperature road I’d not trust anything of importance to it.

    @Dearieme:

    Did I mention I really liked Algol? (And by extension Algol Like languages…)

    @Sandy McClintock:

    Thanks!

    Feel free to implement it in Access. I’ve sporadically thought to doing that, but then the annoyance of buying another M.S. product comes to mind and I recover ;-) Frankly, at this point, putting up MS Access and keeping it current and functional is more than just tweaking the FORTRAN. But were I starting from scratch I’d likely use some public SQL based DBMS.

  9. Eric Barnes says:

    Thanks EM!
    I haven’t used fortran since college, but have used sql and c alot. You inspired me to put GHCN into mysql and I’ve just recently finished.converting the all daily into a single table. I mapped the 31 values and flags columns to a single value,mflag,qflag values plus a new day column.

    After conversion, I’ve just managed to test a few queries like getting a count of the total number of measurements (600,000,000+). I also averaged tmax and tmin for all thermometers for each year. Each query requires a full table scan, but only takes about 15 minutes (I’ve got the time :) ).

    My machine is no top of the line system. optiplex 330 (about 5 years old) with a dual core and a couple of nice decent sata 7200 rpm hd’s.

    IMO, it should make it pretty easy to do some interesting queries.

    I’ll post the code soon.

    I was wondering if you still had a copy of the ghcn v1,v2 daily? I thought that would be nice to convert/compare in sql as well.

    Thanks again!
    Eric

  10. Eric Barnes says:

    Here it is…

    To get up and running (you need mysql 5.1 and a recent gcc) …
    * download http://www.ewodarz.net/blog/?attachment_id=82 and save to prog.zip (on a filesystem w/ more than 100 gig or so).
    * download ghcn daily from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd_all.tar.gz to the same directory.
    * untar and unzip ghcnd_all.tar.gz.
    * unzip prog.zip
    * cd to prog subdirectory and type make (creates exes to load convert data).
    * type make install (to install programs to ghcnd_all directory).
    * cd to test and run bash create_user.bsh (creates ghcn user).
    * run create_db.bsh
    * cd to the ghcnd_all directory.
    * run ‘bash dly_to_csv.bsh’ (to convert to csv).

    You can let that run in the background as it takes a while or you can …
    * open another window , cd to ghcnd_all and run ‘bash load_ghcn_csv.bsh’. (this loads into they daily table).

    The load should patiently wait if it runs out of things to do.

    Currently it’s just the daily data and it’s partitioned on the year field. So running queries without specifying a narrow constraint for years will not be snappy.

    I haven’t worked out the best way to create indexes or create efficient joins yet. Make sure you have a decent sized /tmp if you do as mysql will use the system /tmp and you could cause some problems.

    I thought I could create summary tables from the daily table, but this hasn’t worked well in practice. I think just creating summaries as part of the conversion and then cross checking will work better.

    All data is loaded, so check the qflag for emptystring (others have failed qa).

    It’d be nice to be able to udpate the stations table to put each station in a grid as well. That would make for interesting checks vs. anomaly maps (http://www.ncdc.noaa.gov/ghcnm/maps.php).

    Below are a couple of queries I’ve run. I’m looking forward to closely examining *all* raw TMAX and TMIN’s w/o the anomaly business getting in the way.

    Thanks!

    mysql> select count(*),avg(cast( value as decimal)),stddev(cast(value as decimal)),month from daily where year=1998 and value > -1200 and qflag=” and element=’TMIN’ and id like ‘US%’ group by month;
    +———-+——————————+——————————–+——-+
    | count(*) | avg(cast( value as decimal)) | stddev(cast(value as decimal)) | month |
    +———-+——————————+——————————–+——-+
    | 168827 | -35.3440 | 91.9569 | 1 |
    | 153689 | -12.4339 | 68.9321 | 2 |
    | 170561 | -4.7525 | 88.4205 | 3 |
    | 166320 | 41.0802 | 61.5179 | 4 |
    | 171210 | 104.5698 | 63.9800 | 5 |
    | 166853 | 137.6209 | 68.1672 | 6 |
    | 172616 | 170.7749 | 52.4619 | 7 |
    | 172830 | 160.9331 | 54.2364 | 8 |
    | 168381 | 132.6620 | 63.1411 | 9 |
    | 173244 | 64.0513 | 69.0841 | 10 |
    | 168095 | 18.3098 | 68.7058 | 11 |
    | 174087 | -34.8607 | 97.7308 | 12 |
    +———-+——————————+——————————–+——-+
    12 rows in set (0.00 sec)

    mysql> select count(*),avg(cast( value as decimal)),stddev(cast(value as decimal)),month from daily where year=1988 and value > -1200 and qflag=” and element=’TMIN’ and id like ‘US%’ group by month;
    +———-+——————————+——————————–+——-+
    | count(*) | avg(cast( value as decimal)) | stddev(cast(value as decimal)) | month |
    +———-+——————————+——————————–+——-+
    | 159943 | -80.7173 | 100.6287 | 1 |
    | 151124 | -58.1853 | 96.6232 | 2 |
    | 162197 | -10.7311 | 77.3192 | 3 |
    | 156697 | 36.7605 | 64.2804 | 4 |
    | 162952 | 88.2646 | 59.9796 | 5 |
    | 158370 | 135.6480 | 58.9953 | 6 |
    | 164228 | 162.0326 | 54.6689 | 7 |
    | 164184 | 157.8656 | 61.7584 | 8 |
    | 158149 | 108.2555 | 66.8249 | 9 |
    | 163402 | 46.0224 | 68.6338 | 10 |
    | 157272 | 4.6064 | 77.0701 | 11 |
    | 162543 | -51.9439 | 85.6212 | 12 |
    +———-+——————————+——————————–+——-+
    12 rows in set (5.66 sec)

    mysql> select count(*),avg(cast( value as decimal)),stddev(cast(value as decimal)),month from daily where year >1990 and year -1200 and qflag=” and element=’TMIN’ and id like ‘US%’ group by month;
    +———-+——————————+——————————–+——-+
    | count(*) | avg(cast( value as decimal)) | stddev(cast(value as decimal)) | month |
    +———-+——————————+——————————–+——-+
    | 1477796 | -58.6676 | 100.1472 | 1 |
    | 1353186 | -36.3229 | 92.5230 | 2 |
    | 1488313 | -5.5229 | 82.5623 | 3 |
    | 1441267 | 39.2154 | 69.3448 | 4 |
    | 1492060 | 92.7311 | 64.4182 | 5 |
    | 1446008 | 136.8531 | 61.0062 | 6 |
    | 1497848 | 161.4506 | 55.9534 | 7 |
    | 1497012 | 153.9540 | 55.5807 | 8 |
    | 1446825 | 112.9518 | 66.9215 | 9 |
    | 1490901 | 53.6368 | 72.1462 | 10 |
    | 1440235 | -1.2899 | 79.3505 | 11 |
    | 1486581 | -38.8070 | 87.6688 | 12 |
    +———-+——————————+——————————–+——-+
    12 rows in set (55.89 sec)

    mysql> select count(*),avg(cast( value as decimal)),stddev(cast(value as decimal)),month from daily where year = 2000 and value > -1200 and qflag=” and element=’TMIN’ group by month;
    +———-+——————————+——————————–+——-+
    | count(*) | avg(cast( value as decimal)) | stddev(cast(value as decimal)) | month |
    +———-+——————————+——————————–+——-+
    | 288415 | -46.0407 | 130.5425 | 1 |
    | 272334 | -17.3569 | 119.9765 | 2 |
    | 293871 | 15.5532 | 98.0108 | 3 |
    | 283491 | 45.4710 | 80.7160 | 4 |
    | 294951 | 93.0467 | 71.7132 | 5 |
    | 284205 | 124.5165 | 65.2801 | 6 |
    | 292818 | 144.5936 | 61.1889 | 7 |
    | 290598 | 143.1116 | 63.5586 | 8 |
    | 280848 | 104.6455 | 72.4132 | 9 |
    | 290554 | 62.6937 | 79.2161 | 10 |
    | 279668 | 1.4682 | 104.5092 | 11 |
    | 285749 | -53.6444 | 129.8348 | 12 |
    +———-+——————————+——————————–+——-+
    12 rows in set (7.04 sec)

    mysql> quit

  11. Pingback: My Report on v1 vs v3 GHCN | Musings from the Chiefio

  12. Eric Barnes says:

    Have you thought of doing dTdt on the daily data? There are more stations,and it might shed more light on the process for picking stations in the monthly subset.

  13. E.M.Smith says:

    @Eric Barnes:

    Sounds like an interesting idea. It’s a very large data download, but done at night…. hmmm….

    Unfortunately, I have not downloaded / saved any daily data. Perhaps The Wayback Machine?

    FWIW, you ought to be able to implement the dT/dt or dP/dt method easily (nearly trivially) in SQL. (The only difference between the two is the sign on the difference). For each thermometer, take the most recent monthly value, hold on to it and look for the next value. IF the next value backward in time is missing (-9999 flag or just no record) skip to the one after that (repeat until next oldest data found or out of data). When the next value is found, subtract it from the present value. That’s your difference – dT. Also keep a running total of those values. (dP/dt or dT/dt depending on add vs subtract). Hold onto the (now) current value for temperature and repeat until out of data. On end of data, handle the last record and proceed to the next WMO station.

    Basically it is First Differences, but run from the present into the past (rather than the usual ‘from the past into the present’) and instead of doing a reset on missing data (that introduces a load of splice artifacts – on every data missing item). I also do the First Differences on each monthly data series independently. The thesis being that June can be compared to June of any year for a given instrument. If there are 5 missing June entries in a row, what does it matter to the long term trend? If June now is 20 and June 6 years ago was 20, not much really changed, and if it is 20 now and was 19 6 years ago, then it’s moved up 1/6 per year on average. This turns data dropouts into “flat segments” rather than into “reset and take a splice artifact”. Given the amount of data dropouts in the GHCH this is the better method. It gives a continuous series from “now” to the very first entry for every thermometer that avoids any splicing… In that way, the only “end effects” are the very first and very last data item.

    I’ve thought of adding some kind of “shaping their ends” to add a “final anchor” at each end set to the the average of, say, the last 10 years data items – just to remove accidental “cherry picking” by end year variations, but haven’t done it. ( For the simple reason that it IS an artifice and I’m trying to just ask the data what it has to say, not fix it…)

    Well, perhaps you have inspired me. I’m feeling like installing Mysql all of a sudden…. ( I started my career as a Data Base Guy using HP IMAGE and QUERY, then moving into RAMIS II and FOCUS, and eventually getting 3 databases ported to Amdahl’s mainframe Unix UTS – Universal Timesharing System. Oracle, Ingress, and Informix. I’ve written a bit of SQL too.

    Mostly just got stuck in FORTRAN as GIStemp was written in it and I used a lot of the same code in my GIStemp work. ( It isn’t a very hard language to use, the data formats were already FORTRAN friendly having been written by FORTRAN, and I had the compiler installed anyway…) So mostly just sloth. I looked at putting an SQL server on the laptop, but it took longer to just “pick one” trying to figure out which one had what features than it took to just write the next program in FORTRAN.

    Don’t know when I’ll get to it, but I’m pretty sure a MySQL install is in my future…

  14. Eric Barnes says:

    Thanks EM,
    I think the dT/dt and dP/dt analysis is pretty devastating.
    I’m very interested in building an audit of the raw daily data. I do something very similar at my work. We have a transactional db which feeds a warehouse db nightly. As part of verification of the dw data, we wrote a program w/ sql statements that does a sort of checksum total (record counts and sums) for a number of tables with accounting information and fiscal years/periods/etc. If something is off it shows up very quickly. Added on is another set of sql’s that will go find the exact rows that were modified, added/deleted/etc in the transactional db.
    Doing something like this for the daily data would be very helpful (for me at least).
    As applied to GHCN, it would be neat to systematize this and build reports around differences between releases of the daily data. It would help identify what are possible mistakes and where modifications are and their scale as well. It’d be nice to be able to rapidly identify the source of changes like ..
    http://stevengoddard.wordpress.com/2012/05/11/hansen-covering-his-tracks-in-iceland/

    I did have a question that I’m not quite able to clear up on my own. Is it true that the daily data is never to be modified (the input as I understand it) and only ghcnv1, v2, v3 source code is modified to adjust temperature summaries (the output)?

  15. Eric Barnes says:

    IMO, which computer language is used to analyze/solve problems is purely a matter of personal taste. The end result that you have is devastating to those who want the use ghcn v3 as evidence of catastrophic warming.
    Part of my current job is verifying that data from from a transactional database is moved correctly into a warehouse database. To verify, total sums and record counts are calculated for whole tables broken down by fiscal year and period. An html report is run daily that highlights any discrepancies. The html report contains hrefs to urls that point to an html service that will calculate what records are off based on input of year, period, fund, account, etc. Applying this sort of mechanism to the monthly release of ghcn data would be very helpful IMO. Queries/functions could be designed to find the cause differences, their magnitudes, etc.
    It would also be nice to completely pin down the raw data with an ability to determine any changes to it.
    Thanks again for your excellent work.

  16. Eric Barnes says:

    Sorry about the double post. I posted one at work and another when I got home. It didn’t show up at home (I thought it was lost).

  17. E.M.Smith says:

    @Eric Barnes:

    I don’t know how much the daily data are changed. There are so many layers of folks fiddling bits that every time I thought I had the “raw” data, turned out it was further up stream… YMMV…

    Don’t worry about double posts, I can delete one or just ignore it… It happens…I’ve done it…

    Yes, “which language” is largely a “what am I comfortable using”. For example, FORTRAN has very nice built in fixed format data handling. C is rather primitive in that regard. But someone who knows C and writes it a lot can knock out the fixed format routine calls pretty quick. I’ve done it, but it was a few years back, so would take me a day or two to ‘refresh’.

    Similarly, I can do variable format reads in FORTRAN but it’s a bit of a pain. FORTRAN has some very nice “type checking” built in that saves folks from many bugs (like that overflow on an out of range data item). Yet there are times that wearing that strait jacket is more trouble than feature. If you need to cast a char into an int, C is easier. BUT, you can do it in both and what you know is more important than how the language makes it harder or easier.

    So I generally just pick up whatever language is already in use in any package or place and just spend a day coming up to speed in it.

    One thing to note is that prior to v3 there was no versioning on the data updates. Some data might come in for a station a month or two in the past. I explored those “zombie stations” in a posting or two here. Now, in v3, there are fields that look to have some kind of timing data. Don’t know if it is being used, but theres some hope.

    https://chiefio.wordpress.com/2010/02/15/thermometer-zombie-walk/

    At any rate, a dramatically needed thing is just that kind of “revision control” on the data set. At least last time I looked, data just sort of flowed into the GHCN on a haphazard basis and you grabbed a copy at some point in time and it matched nobody else’s copy… No golden master. No ‘release date’. No revision numbers… All the things professionals in business do as a matter of course were just not there.

    I hope they’ve put some in place, but have not taken the time to look.

    FWIW, as my reason for using monthly data instead of daily comes directly from the fact that it is what GISTemp uses. So I just wanted to check what was going into it. At this point, it would likely be better to use a different data input for ‘compare and contrast’. Something like daily vs monthly or something like Best vs GHCN.

  18. Eric Barnes says:

    OK,
    My understanding is that
    ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd_all.tar.gz
    contains all data, and no averaging/summing/homogenization.
    So, if you have v1 gistemp (which has the station list in it, monthly tmax’s, tmins and means), using sql, it should be fairly easy to summarize trend differences/summaries of stations in both v1 and v3 , only v1, and only v3. That would be interesting.
    On a related note, it seems Mosher went to great lengths to miss the point of your argument.
    Another interesting topic is that Steven Goddard was looking at the historical changes made to GISTemp in the late 1990s. I think he’d be very interested in the list of stations used for v1 GISTemp. Using that and ghcnd_all, he’d be able to break down the differences for that rather large change. Here’s that post.
    http://stevengoddard.wordpress.com/2012/06/23/what-hansen-is-hiding/?replytocom=96103#respond

  19. E.M.Smith says:

    @Eric Barnes:

    Yeah, spent most of today just saying the same thing I’ve been saying on the original thread. The whys and hows and why not just like the other guy…

    My impression was that they ran to “the usual defense” of “look, the grid box anomalies don’t change” while not noticing that many of the complaints were either about things I don’t actually do or about things that are a feature, not a problem…

    Interesting link…

  20. Chuckles says:

    They did indeed run the usual smokescreen E.M., and they most certainly do notice that it has no relevance to what you were saying.
    Noting that the quality of the base data is poor is not allowed, not is the sort of analysis you were doing permitted, and always results in the knee jerk trotting out of the Z & S show telling us how similar all the reconstructions are, and how the ‘confirm’ each other, and nothing to see here move along, all good….

    Trouble is the usual hand-waving about ‘independently verified’ etc has nothing to do with the analyses you were doing, or with the massive uncertainties in the initial data, the gaps and all the other stuff you noted.

    And enough people have now seen that dog and pony show often enough that they’re tired of it.
    Had to love the efforts of the faithful as well, with the condescension and berating of the sinners and unbelievers for their apostasy, and the ‘I work 80+ hours a week…’.

    Funny how it never occurs to them that if all the polls show that across societies, and particularly when the man in the street is included, the majority do not believe AGW is a problem that needs addressing urgently. With that as a given, calling people names and talking down to them does not seem like a great way to convert them to the cause.

    On the initial ‘whole degrees’ truncation/rounding of temp data at acquisition time, you might want to mentally note that in the later max/min data, BOTH the max and min temp were similarly truncated.
    When the daily temp is calculated with (max+min)/2, it means that both the top terms have a 1 degree uncertainty that cannot be recovered….
    The monthly average is, of course, reported to 1 decimal.
    This can be seen on any WS form 6, e.g. here

    http://www.arh.noaa.gov/wmofcst_pf.php?wmo=CXAK56PAOM&type=public

    http://www.nws.noaa.gov/climate/f6.php?wfo=gld

  21. Chuckles says:

    Ah now we’re into the usual ‘denying radiative physics’ meme over there.
    End times people end times.

  22. E.M.Smith says:

    @Chuckles:

    Yeah, I ignored the ‘radiative physics’ derailer. I’m tired of the “If we ignore convection and evaporation / condensation then radiation dominates” argument ;-)

    Gee, I had the same feeling of “usual smokescreen”. Seen that argument set too many times, I guess….

    I figured the “I work 80+ and Smith is wearing his bias on his sleeve” guy was just a muckraker trying to turn it into a mud sling distraction, so just let it pass. Don’t really care what he thinks about me anyway. Better to just let him lay in the insult slime pit and not jump in to join ;-)

    No matter how often I say it, some class of folks just don’t “get it”. “It’s not about ME. -E.M.Smith” Doesn’t matter who I am. Doesn’t matter what I say. Doesn’t matter what I want or what I “believe” in. All that matters in science is the data, the method, and the result.

    The data is give. The method and code are published. The result is clear. They can’t attack the data, as it is their data. The result just is what it is. Only leaves “attack the method” or “distract and strawman”. Haven’t seen evidence that any of them have actually gone through the code, nor (other than a sneer about the ‘skip the gap’ variation on FD not being peer reviewed) has there been any evidence that they looked at the method. Certainly no complaints about the code (so either it has no errors or – more likely – nobody has looked at or run it…)

    So we get the Strawman that what I didn’t do is wrong; that I ought to have done it the same way everyone else does it – which is the only valid way; and ad hominem attacks.

    Aw well, makes for a good show I suppose…

    On the Min / Max thing: You are right! (Now why didn’t I think of that? ;-)

  23. Eric Barnes says:

    So I was working on getting the ghcn daily records (ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/) into a mysql database and running some quality control queries. I was looking at brazil and noticed what seemed to be an error. Only around 50% of the days in a calendar year reported TMAX and TMIN values (where qflag is not set). I thought there must have been something wrong with the conversion, but no. The *best* stations were missing a large percentage of tmax,tmin values. Below is the results of a query to select the stations with the most valid measurements per year, ordered by the number of measurements descending (from 1960 to present). First number being the count of valid tmax,tmin records, 2nd value being an avg sanity check, the third the station id and the 4th the year.
    Is it known that the measurements are that sparse? For a country essentially as large as the US? I’d have thought that if a station was reporting below 90%, it wouldn’t be included. That assumption is obviously false.

    count(*)	avg((valuemax+valuemin)/2)	id	year
    314	268.55891720	BR001065002	2009
    274	266.38868613	BR001065002	2002
    260	266.18461538	BR001065002	2007
    259	269.91505792	BR001065002	1995
    255	264.35882353	BR001065002	2003
    251	222.29083665	BR00D6-0010	1979
    244	231.27049180	BR00D6-0010	1980
    242	260.71074380	BR001065002	2004
    242	226.46694215	BR00D6-0010	1981
    241	267.80082988	BR000082400	1979
    240	254.60416667	BR048620980	1979
    237	271.43459916	BR027510430	1979
    237	265.75949367	BR000082400	1980
    237	266.26793249	BR001065002	1996
    231	244.56709957	BR000083743	1981
    223	269.66591928	BR001065002	1994
    220	272.13636364	BR028724960	1979
    214	225.39719626	BR057324290	1981
    213	273.70892019	BR000083361	1979
    213	265.51643192	BR001065002	2008
    208	264.51923077	BR000082861	1979
    206	269.02184466	BR001065002	2010
    205	269.56097561	BR000083361	1981
    205	267.46341463	BR001065002	1993
    203	276.99507389	BR037041390	1979
    200	264.95000000	BR038837530	1979
    199	268.09296482	BR001065002	2005
    198	237.14646465	BR000083743	1979
    198	267.29797980	BR000082400	1981
    190	203.55263158	BR002047016	1979
    
  24. E.M.Smith says:

    @Eric Barnes:

    Wow! Who knew….

    So, you got this one (so I don’t need to take it on)? Happy to put up interesting findings in a Guest Posting…

    BTW, by putting <pre> before a block
    you can get a ‘preformatted’ chart.
    Like I’ve done with the chart in your comment.
    It is ended with </pre>

  25. Eric Barnes says:

    Thanks for the tip!

    Also, while I am interested in getting the database, I’m afraid it might not be very timely as I’m painfully short of free time. It may literally be a month of sundays before I’m really done.

    Thanks!

  26. Eric Barnes says:

    This paper nicely describes the difference between monthly and daily data sets. Its making for good reading,:)

    http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/papers/menne-durre-etalsubmitted.pdf

Comments are closed.