Will The Good GHCN Stations Please Stand Up

How To Select Long Lived GHCN Records

In constructing a benchmark to test GIStemp, I discovered that the “warming signal” goes away when the thermometers used are restricted to those that have a “long lifetime”. AGW is an artifact of the arrival and exit of thermometers from the data set.

This posting is a technical posting. It is just for the purpose of publishing the computer code that I used to do this thermometer selecting.

Realize that this code can be run on the GHCN data from the step before GIStemp is run. The AGW as thermometer count artifact is a function of the GHCN data, not of GIStemp itself.

OK, with that said, here are the tools I constructed to get that reduced set of records.

The Top Script

At the top level, there is a simple script that coordinates the action and makes sure that the right programs are run in the right order on the right data. This can have any name you like. I have a couple of versions used for testing different things. This example is from a folder that is a couple of levels down from the STEP0 level, so the input file is named with ../../STEP0 to get you back up to where you can find the STEP0 directory. This version, I named “dotops”.

The basic process is to fetch the GHCN data, sort the records by Station ID. Count up how many records each station has. Sort the list by that count, then chop off the number we want to keep. This list then drives a program that selects all records for those stations from the data set. Finally, the data are re-sorted into the order GIStemp expects.

echo "We select only the longest lived stations"
echo " "
echo "# sort the v2 file by station ID"
echo " "
ls -l ${1-../../STEP0/input_files/v2.mean}
echo " "

sort -n -k1.4,1.12 ${1-"../../STEP0/input_files/v2.mean"} > 
${2-"v2.sort.id"}

echo " "
ls -l v2.sort.id
echo " "

echo "Feed the v2.sort.id list into a count of records by ID." 

bin/countids  v2.sort.id

echo " "
ls -l v2.sort.id.count
echo " "

echo "pick the top number, presently ${3-3k}, of stations," 
echo "put them into TopStations."

sort -rn v2.sort.id.count | head -${3-3000} > TopStations

echo " "
ls -l TopStations
echo " "

echo " Put those Top Stations in sorted order by ID"

sort -k2.1,2.10  TopS.by.id

echo " "
ls -l TopS.by.id
echo " "

echo "Then feed the v2.ghcn data, sorted by station ID, and "
echo "TopS.by.id to the program that picks out all Top station "
echo "records in the data set."

bin/selectrec v2.sort.id TopS.by.id 

echo " "
echo "Producing the list of v2.ghcn records for the" 
echo "longest lived stations."
echo " "
ls -l v2.sort.id.select
echo " "

echo "Then sort the Special selected stations version of v2.mean "
echo "into a version for reporting."

echo " "
echo "Re-sorting the selected records back into year order"
echo " "

sort -n -k1.13,1.16 v2.sort.id.select  > v2.Special

ls -l v2.Special

echo " "
echo "Doing total GAT: totalavg"
echo " "
echo "This produces an average of all the data, by month, using"
echo "a couple of different methods to illustrate the impact of" 
echo "coding choices, and the merit of choosing the second method"
echo "over the others."
echo " "

bin/totghcn v2.Special

echo " "
echo "Doing GAT Yearlies: yearlyavg"
echo " "
echo "This program produces a list of Global Average Temperatures "
echo "for each year for each individual month of the year.  "
echo "The station record temperatures are added together, then "
echo "divided by the number of valid records for that sum, giving"
echo "the average of global temperature readings for that year/month."
echo " "

bin/yearsghcn v2.Special

Notice that this script is fairly “chatty”. It serves as much as documentation as it does as a functional bit of code. The script also often spits out an “ls -l” of the intermediate files. This is so that you can monitor how the process is going and see how big the files are getting. Completely optional.

There are several places where the Unix / Linux command to sort files is called directly. This code could easily be made much “tighter” by using fewer files and doing more coding inside the programs; but for a rapid prototyping hand tool, using built in tools like “sort” can make for a faster build / test time.

The last two steps, running totghcn and yearsghcn will not be described in this posting. Those are two tools that “characterize” the data. They will be described in postings that talk about the character of the data. This posting is about how a reduced set of more persistent thermometers was selected.

OK, the “top script” starts off with fetching the GHCN data from where GIStemp keeps it. You could just as easily fetch it directly from NOAA and put it directly into your analysis directory. Notice that this file name is set as a default, but can be overridden with a parameter. “dotops v2.mean” would expect to find a copy of the GHCN data in your analysis directory with the name v2.mean (rather than wandering off to find it in GIStemp directories).

All fairly simple.

So what is in the programs that are called?

Countids

This program listing is mostly comments. The actual code is pretty simple and fairly small.

[chiefio@tubularbells vetted]$ cat src/countids.f 
C     Author:  E. Michael Smith
C     Date:    July 31, 2009
C     Program: countids.f
C     Function:  Take a GHCN format data file and count the number of 
C     data records in that file for each unique Station ID. 

C     Fortran has defind uses for field postions that can cause bugs if you
C     are not aware of them.  I put in this "ruler" to keep awareness of that
C     fact.  1-5 are statement labels, 6 is a continuation flag, then
C     7-72 are program code, and the last digits to column 80 are "card"
C     serial numbers (so you can "reorder" your deck of punched cards if
C     you dropped it...  that is, a sort key for your punched card deck.)
C     A very common form of bug comes from having a variable name that 
C     extends into the card number field, so, for example, the name countids
C     would become "coun".  This can cause no end of greif, especially
C     if you have many variable names that only change in the last few
C     characters.  year, year1, year12.  Prune one by accident and now
C     you are actually using a different variable in your calculations...
C     http://www.math-cs.gordon.edu/courses/cs323/FORTRAN/fortran.html
C     is helpful.

C2345*7891         2         3         4         5         6         712sssssss8

      integer itmp(12), icc, iyr
      character*128 line, oline
      character*9 id, idold

C     Declare your variables and initialize them.  Some compilers will
C     start your variables at random values...  I found that g95 starts
C     large arrays with random values.  It's "good practice" to assure
C     that all variables are declared and initialized (though FORTRAN
C     will let you implicitly declare a simple variable by using it, 
C     with those starting with i,j,k,l,m, or n being INtegers and the 
C     others being floating point.)

      data itmp    /0,0,0,0,0,0,0,0,0,0,0,0/
      icc=0 
      id=""
      idold=""
      idcount=0
      iyr=0 

C     Get the name of the input file, in GHCN format.  The file must be
C     sorted by ID (since we count them in order.)
C     The name of the output file will be: inputfile.count
C     "trim" is in f95, but not f77.  "//" is concatenate.
C     some FORTRAN releases let you embed comments with !comment syntax

      call getarg(1,line)
      oline=trim(line)//".count"

      open(1,file=line,form='formatted')
      open(10,file=oline,form='formatted')              ! output

C     Read in a line of data (Country Code, ID, year, temperatures)
C     picking out the station ID.  We then "loop" as long as the
C     station ID has not changed, making a running count of 
C     records for that station ID.  When we run out of input file,
C     we go to statment number 200 and finish up.
C     One could ommit reading icc, iry, itmp but it doesn't cost
C     much processing to have them available for debugging or to 
C     cause an error condition if they contain bogus values.  That is,
C     reading in these variables is a kind of crude test for file
C     format being more or less right. One could add code to bounds
C     test them, if desired.

      read(1,'(i3,a9,i4,12i5)',end=200) icc,id,iyr,itmp
      idold=id
      goto 30

   20 read(1,'(i3,a9,i4,12i5)',end=200) icc,id,iyr,itmp

   30 if(id.eq.idold) then
        idcount=idcount+1
        goto 20
      end if

C     If the ID changed, we drop down into the code that writes out the
C     station ID and count of records for that ID.
C     The commented out "write(*,*) line lets you see what is happening
C     on your screen, if you un-comment it.

      if(id.gt.idold) then
C        write(*,*) idcount, idold, icc, id, iry itmp
        write(10,'(i6," ",a9)') idcount, idold
        idcount=1
        idold=id
        goto 20
      end if

C     This code exists just to make sure you knew to sort the input file
C     file by Station ID if run by hand.

      if(id.lt.idold) then
        write(*,*) "Error in countids:  idold > id; unsorted file"
        goto 300
      end if

C     This may look like paranoia, and maybe it is, but on more than
C     one occasion I've had "impossible code" execute.  So I often put
C     an "error catcher" such as this in programs, just to make life
C     easier when the impossible error happens...

      write(*,*) "Compiler Error:  You can't get here!"
      goto 20

  200 continue
C     we have one last record in the queue to print out in the 'END' case.

      write(10,'(i6," ",a9)') idcount, idold
      STOP

  300 STOP "ERROR!"
      END

Select Records

This program takes the list of selected high roller stations and uses it to filter their records into the working data set (and filter the “short record” stations out).

C     Program:  selectrec.f
C     Written:  July 31, 2009
C     Author:   E. M. Smith
C     Function: This program takes a list of "high roller" Station IDs
C     and uses that list to select the records in a GHCN format data file
C     that will be passed on for further processing.
C
C     Copyright (c) 2009
C     This program is free software; you can redistribute it and/or modify
C     it under the terms of the GNU General Public License as published by
C     the Free Software Foundation; either version 2, or (at your option)
C     any later version.
C
C     This program is distributed in the hope that it will be useful,
C     but WITHOUT ANY WARRANTY; without even the implied warranty of
C     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
C     GNU General Public License for more details.

C     General Housekeeping.  Declare and initialize variables.
C
C     itmp    - an array of 12 monthly average temperatures for any
C               given record.  The GHCN average temp for that station/year.
C     icc     - the "country code"
C     iyr     - The year for a given set of monthly data (temp averages).
C     idcount - the count of valid data items falling in a given month
C               for all years.  An array of months-counts with valid data.
C     sid     - Station ID for the idcount.
C     line    - A text buffer to hold the file name of the input file
C               to be processed, passed in as an aguement at run time.
C     line2   - A text buffer for the file name of Station IDs to process.
C     oline   - A text buffer for the output file name, set to the input
C               file name.select
C     id      - Staion ID in the input data.
C     buff    - A text buffer for holding input / output record data.

C2345*7891         2         3         4         5         6         712sssssss8

      integer itmp(12), icc, iyr, idcount
      character*128 line, line2, oline
      character*9 id, sid 
      character*64 buff

      data itmp    /0,0,0,0,0,0,0,0,0,0,0,0/
      data buff /"                                                  "/
      icc=0 
      id=""
      sid=""
      idcount=1
      iyr=0 

C     Get the name of the input file, in GHCN format.  The file must be
C     sorted by ID (since we count them in order.)
C     The name of the output file will be inputfile.select

      call getarg(1,line)
      call getarg(2,line2)
      oline=trim(line)//".select"

C     lines of the form "write(*,*)" are for diagnostic purposes, if desired. 
C      write(*,*) oline

      open(1,file=line,form='formatted')
      open(2,file=line2,form='formatted')
      open(10,file=oline,form='formatted')              ! output 

C      write(*,*) line2

      read(2,'(i6,1x,a9)',end=500) idcount, sid
   20 read(1,'(i3,a9,a64)',end=400) icc,id,buff
C      write(*,*) idcount, sid, id

   40 continue
      if(id.eq.sid) goto 70
      if(id.gt.sid) goto 80
      if(id.lt.sid) goto 90

C      write(*,*) "Compiler error!.  You can't get here!"

   70 DO WHILE (id .eq. sid) 
         write(10,'(i3,a9,a64)') icc,id,buff
         read(1,'(i3,a9,a64)',end=200) icc,id,buff
      END DO
      goto 40

   80 DO WHILE (id .gt. sid) 
         read(2,'(i6,1x,a9)',end=300) idcount, sid
      END DO
      goto 40

   90 DO WHILE (id .lt. sid) 
         read(1,'(i3,a9,a64)',end=200) icc,id,buff
      END DO
      goto 40

  200 continue
      STOP "Out of Station Records - ought to be rare!"

  300 continue
      STOP "Out of High Roller IDS.  Most likely case."

  400 STOP "Input file 1 blank on first record!"
  500 STOP "Input file 2 blank on first record!"
      END

So there you have it. I’m sure there are faster, better, more direct ways to do the same thing. These are just the hand tools I wrote as I needed them. I’ve translated at least one of them into “C” just to make sure I can read FORTRAN created files with “C” without too much pain and suffering. If anyone really wants these in “C”, let me know and I’ll produce them. As it is, if you are wrapped up in GIStemp anyway, your brain will be wrapped around FORTRAN already, and it’s easier to just keep one language in your head at a time.

“Someday” – probably measured in a year or three from now, I’ll translate GIStemp to “C” and at that time these hand tools will go with it. But for now, I’m trying to keep things close to the original GIStemp while I characterize it and describe what it does.

FWIW, you can run this code with different “cut off” values for station count. I’ve run it with 4000 and the results are more or less the same. The line that sorts v2.sort.id.count could also be used to produce a list of the stations that do contribute to the warming signal. that is, you could do a “sort -n v2.sort.id.count | head -9000 ” instead and see the long list of stations with just a few years of records.

At some future date, if nobody else does it first, I’ll most likely slog through those station ID’s and see if I can figure out a pattern to them. Are they USSR stations in Siberia, now gone? Tropical / Middle Eastern stations just joining the record in the last few decades as nations reach modernity and can afford things like temperature reporting systems? I’m sure it would be very enlightening…

Should anyone wish to make decade long Global Averages of Thermometers, this is one of the programs I use to do that. This one runs on the raw GHCN file. I have another one that only looks at 1880 onward for use on GIStemp GHCN files. (At some point, I hope to combine them into one program by ripping out the fixed years and putting in parameter driven or data driven behaviours, but for now this is a “hand tool” that works.

C2345*7890         2         3         4         5         6         712sssssss8

C     We keep a count of valid records read, by month, in the 
C     incount integer.  We do the same thing in nncount for a 
C     longer time period.  The running total of temperatures
C     is kept in itmptot and ntmptot.  The individual temperature
C     record read in, is in itmp as an integer representing temperature
C     in 1/10 C increments (GHCN format data).
C     the year of the record is iyr, while iyrmax, nyr, kyr are used
C     to track the number of years in any given summation interval or
C     to control the change of a summation interval when a new year of
C     data are read in.

      integer incount(12), itmptot(12), nncount(12), ntmptot(12)
      integer itmp(12), icc, id, iyr, nyr, iyrmax, m, iyc, kyr 

C     The averages of temperatures are placed into a "real" type 
C     variable.  I keep several types of total and averages in this
C     program as a way of characterizing the sensitivity of the data
C     to particular ways of calculating.  "count" is a real data
C     variable where I can "cast" the integer icount into a real
C     before doing math.  This lets me control when I get the conversion
C     from an integer to a real more directly.  That is, I can stop
C     the FORTRAN "truncate on integer divide" if I so choose.
C     The grand total average goes into ttmpavg, while motav is used
C     for a grand total and testr is a test of another way of getting the 
C     same number, but from an array (for future statistics generation).
C     ktmpavg (and the matching kgavg for grand total average) are
C     where individual decade averages are calculated.
C     tr is an array of temperatures (by month, by years) to be used to
C     do trend analysis in a future revision of this code.

      real tmpavg(12), gavg, ggavg, ttmpavg(12), count(12), motav(13)
      real tr(13,310), testr(13), ktmpavg(12), kgavg
      character*128 line, oline

C     If you would know your starting value with certainty, you must
C     initialize the variable.  Otherwise you are depending on the
C     guy who wrote the compiler to do it for you...

      data incount /0,0,0,0,0,0,0,0,0,0,0,0/
      data nncount /0,0,0,0,0,0,0,0,0,0,0,0/
      data itmptot /0,0,0,0,0,0,0,0,0,0,0,0/
      data ntmptot /0,0,0,0,0,0,0,0,0,0,0,0/
      data itmp    /0,0,0,0,0,0,0,0,0,0,0,0/
      data tmpavg  /0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0./
      data ttmpavg /0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0./
      data ktmpavg /0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0./
      data motav   /0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0./
      icc=0 
      id=0 
      iyr=0 
      iyc=0
      nyr=0 
      iyrmax=0
      gavg=0.
      kyr=0
      kgavg=0
      kount=0

C     Believe it or not, the program made bogus very large values, 
C     but only in (repeatable) sporadic entries... until I added 
C     this initialization.  So the g95 compiler initialized my small
C     variables for me, but not a large array.  The nyr bounds and 
C     the size of "tr" ought to be variables driven by the data size
C     since now this code will break when enough years accumulate.
C     As a "hand tool" this is OK, but not as a production product.

      do m=1,13
         do nyr=1,310
           tr(m,nyr)=0
         end do 
         testr(m)=0.
         motav(m)=0.
      end do

C     Get the name of the input file, in GHCN format.  The file must 
C     be sorted by year (since we sum all data by month within a year.)
C     The name of the output file will be that of the inputfile.yrs.GAT
C     where GAT stands for Global Average of Temperatures.

      call getarg(1,line)
      oline=trim(line)//".yrs.GAT"
      open(1,file=line,form='formatted')
      open(10,file=oline,form='formatted')              ! output

C     Read in a line of data (Country Code, ID, year, temperatures)

      read(1,'(i3,i9,i4,12i5)',end=200) icc,id,iyr,itmp
      iyrmax=iyr
      rewind 1

C     we are just setting the initial state of the first year.
C     The "iyr" maximum we've found so far.  Then we reset to 
C     the start of the file for our entry into the "read records
C     accumulate running totals by year" code.  

   20 read(1,'(i3,i9,i4,12i5)',end=200) icc,id,iyr,itmp

      if(iyr.gt.iyrmax) then

C      if you have a new year value, you come into this loop, calculate
C      the Monthly Global Average Temperatures, the Yearly GAT for iyrmax
C      print it all out, and move on.
C      if you didn't get a new year, bypass this to the code down below
C      where you increment the running totals of degrees.

C      Here we just turn the actual year we're looking at into a small
C      number for an array index.

        nyr=(iyrmax-1700)

        do m=1,12

          if (incount(m).ne.0) then

C      We keep a running total of tenths of degree C in itmptot, by month.
C      Then we divide this by the integer count of valid records that went
C      into each month.  This truncates the result (I think this is valid, 
C      since we want to know conservatively how much GIStemp warmed the 
C      data not how much my math warms the data ;-)  

C      So we have a "loss" of any precision beyond the "INTEGER" values 
C      being divided, but since they are in 1/10C, we are tossing 1/100C 
C      of False Precision, and nothing more.  THEN we divide by 10. (REAL) 
C      and yield a temperature average for that month for that year (REAL).

C      I could do a 'nint' instead:  nint(itmptot(m)/incount(m)) and get a
C      rounded result rather than truncated, but I doubt if it's really 
C      worth it for a "hand tool" that I'd like to be a conservative one.  
C      If I truncate, then any "warming" of the data is from GIStemp, not 
C      this tool. 

C2345*7890         2         3         4         5         6         712sssssss8

C      so integer divide of integer temp and valid count controlled by
C      parenthesis, then promote to a "real" for a floating point 
C      divide with 10. rather than 10 (that would truncate at whole
C      degrees C and lose data) and put the result into a real
C      type array of average temperatures by month.  Got it?
C      We also stuff a running total of monthly averages into gavg
C      to use in making and average of monthly averages later...
C      and kgavg is a running total of decade length.

            tmpavg(m)=(itmptot(m)/incount(m))/10.
            gavg=gavg+tmpavg(m)
            kgavg=kgavg+tmpavg(m)

C      The integer valid data count is stuffed into a floating point
C      counter (so I can control that type conversion if I want to).
C      and an array of averages by month by year (tr) is loaded with 
C      the average we just calculated.  This program has some "just
C      growed" legacy in it, in that I glued on "features" over time.
C      Some of these could now be pruned or calculated in a single
C      data array.  For now, I'm leaving them as a "cross check" of
C      sorts.  It is also easier to see what is going on if a person
C      is not too cute about how they did things...
C      Notice that here I use "motav" of the 13th month to hold the
C      average of averages rather than another "gavg" type variable.
C      It's a neat trick, but exactly what is a 13th month?  That
C      is the kind of thing that programers like to do, but that make
C      code a bit obscure some times.  So each pass through here for
C      months 1-12 we pack motav(month) with the temperature average
C      but also add that average to the running total in motav(13).

            count(m)=incount(m)
            tr(m,nyr)=tmpavg(m)
            motav(m)=motav(m)+tmpavg(m)
            motav(13)=motav(13)+tmpavg(m)
            ktmpavg(m)=(ktmpavg(m)+tmpavg(m))

C       So we put the average temperature for that year into 
C       a bucket by months.  A running total of the monthly 
C       averages.

C       here is a bit of left over commented out code where I 
C       was testing sensitivity to using a "real" count as the
C       divisor.  Since then, I've moved on to just keeping the
C       running totals in the bucket "until the end".
C            tr(m,nyr)=(tr(m,nyr)/count(m))/10.

          end if
        end do

C       Oh, and we have 12 monthly averages totaled, but need to 
C       devide them by 12 to get an annual average.  Notice that
C       I'm stuffing that annual average of averages into the 
C       13 month of "tr" as well.  I also increase our counter of
C       years so far this decade, kount, by one.

        gavg=gavg/12.
        tr(13,nyr)=gavg
        kount=kount+1

C2345*7890         2         3         4         5         6         712sssssss8

C       Here we write out to our output file the record of the GAT
C       for for this year.  The year is iyrmax (because we've read
C       in a new year for the next record), the 12 monthly averages
C       are in tmpavg, and the annual average of averages is in gavg.

        write(10,'("GAT year: "i4,13f6.1)') iyrmax,tmpavg,gavg

C       If we have done 10 years, call it a decade and print the 
C       decade long average of averages.

        if (mod(iyr,10).eq.0) then
C       if (kount.ge.10) then
             kyr=iyrmax
             do m=1,12
                ktmpavg(m)=ktmpavg(m)/kount
             end do
             kgavg=kgavg/(kount*12)
             write(10,'("DecadeAV: "i4,13f6.1)') kyr,ktmpavg,kgavg
             write(10,'(" ")')
             kount=0
             ktmpavg=0
             kgavg=0
        end if

C     A bit of "left over" instrumentation code.  Left laying about
C     since I'm still changing this program.  As I get a "bright idea"
C     and "try something new", I may need to see what it did to the 
C     variables at this point, so this "dead code" stays here until
C     I'm done with changes.  Then it will get yanked out.
C     If I want just the decades, I can run as is.  If I want the 
C     individual year records, this "write" statement provides them
C     along with a count of years-records used.  Since we have one
C     year-record per station, this ought to be the count of stations
C     that went into making up that total.

C       write(10,'("GAT year: "i4,12f7.2,f7.2,i6)') iyrmax,tmpavg,gavg,
C    *iyc

C      probably paranoia, but we re-zero the monthly arrays of data.
C      
        do m=1,12
          incount(m)=0      
          itmptot(m)=0
          tmpavg(m)=0.
        end do

        gavg=0.
        iyrmax=iyr
        iyc=0

      end if

C     This is where you drop to if the year value did not change.
C     MAIN DATA AVERAGING LOOP

C     So we have a new record for a new year or for the same year.
C     If it is valid data (not a missing data flag) add it to the 
C     running totals and increase the valid data count by one.

C     nyr just turns the calendar year into a smaller array index number.
C     iyc increments our counter of years records in this group.

      nyr=(iyr-1700)
      iyc=iyc+1

      do m=1,12
        if(itmp(m).gt.-9000) then

C       Increment the running total for that year / month

          incount(m)=incount(m)+1
          itmptot(m)=itmptot(m)+itmp(m)

C       We keep the count and the running total of degrees (by month)
C       in integer variables, preserving all the information we can
C       until the moment we calculate the averages.

C       Then add that running total to the running total for all years.
C       So here we are adding the present station/year data to a 
C       running total of individual records.  

          ntmptot(m)=ntmptot(m)+itmp(m)
          nncount(m)=nncount(m)+1

C       Again we have some left over "dead code" from a bit of
C       testing.  Yankable when done "developing", but left here
C       in case I want it during some future development...

C          tr(m,nyr)=tr(m,nyr)+itmp(m)
C        else 
C         write(*,*) "skipped one", itmp(m)
        end if
      end do

C     and go get another record
      goto 20

C     UNTIL we are at the end of the file.
  200 continue

C     OK, we got here because we read the last record in the file, but
C     we still have some "totals to date" to print out.  Do the final
C     calculations and wrap up.

             kyr=iyrmax
             do m=1,12
                ktmpavg(m)=ktmpavg(m)/kount
             end do
             kgavg=kgavg/(kount*12)
             write(10,'("DecadeAV: "i4,12f7.2,f7.2)') kyr,ktmpavg,kgavg

C2345*7890         2         3         4         5         6         712sssssss8

C     Here we take the totals "so far" and divide by the valid record
C     count in each month.   We are calculating the "GAT" by a 
C     couple of different ways to see how it changes.  This one is
C     in some ways the least interesting, a running total of everything.
C     The more you average, the more you hide, and this one is an
C     average of a lot of data...  I do a poor job of reporting when
C     the "count" variables are zero and we don't calculate any value,
C     but that really ought not to happen unless you have a drastically
C     reduced (and potentially broken) test data set.

C     We calculate a GAT for all time by month via a couple of paths.
C     ttmpavg (of month) is based on a running total of all records.
C     motav is based on a running total of all years averages.
C     I ought to verify that an INT is big enough to hold a running total.

      do m=1,12
         if(nncount(m).ne.0) then
           ttmpavg(m)=(ntmptot(m)/nncount(m))/10.
           ggavg=ggavg+ttmpavg(m)

C       For ttmpavg, we take the running total of all records, by month.
C       and ggavg is the grand total of 12 months averages.

         endif
         motav(m)=motav(m)/308.

C    where in this case, we divide by years... for each month.
C    so we have a total of each years average by month, averaged.

      end do

C     And we turn that total of months into a single average.
C     With motav we use the "month 13" bucket to do the same thing.

      ggavg=ggavg/12.
      motav(13)=motav(13)/(12.*308.)

C     Another snippet of instrumentation code.
C      write(10,'("        : "4x,12f7.2,f7.2)') ttmpavg,ggavg

C     What is this "testr" thing?  A work in progress.  Right now,
C     it is just reproducing what motav produces, but in a single
C     run 'at the end' via the tr array.  If it "cross checks" as
C     correct, then I'll be expanding this part to do some "trend"
C     production from the tr array.  For now it's just an example
C     of how you can get the same result by several different ways
C     in a program.
C     OK, we make a running total of the accumulated temperature
C     data from each month/year bucket.  It's the same thing as
C     motav if I did things right.

      do m=1,13
         do nyr=1,308
            testr(m)=testr(m)+tr(m,nyr)
         end do
      end do

C     And since we added 308 years of data together, we need to 
C     divide by the year count to get an average.  This ought to 
C     be fine unless some years have no data.  Unlikely.

      do m=1,13
         testr(m)=testr(m)/308.
      end do
C     Then, at last, we print out 308 years worth of average data
C     by month.  The idea is to eventually make a graph or trend
C     analysis with the tr array.  For now, I'm just inspecting
C     the contents and making sure I understand the data and the 
C     changes being made to it.  Oh, and looking for variables that
C     might be having "rollover" issues from adding too many individual
C     records together.  100 degrees x 12 months x 308 years is
C     369600 and an integer holds ? and rolls over to zero at ??
C     (sometimes integer size is tied to word size on your machine.
C     Not as important now with 32 bit PCs, but it could really bite
C     back in the days of 8 bit microprocessors...)
C     If really worried about it, make this a "LONG INT" or run a 
C     test on your particular computer.  And yes, we could make the 
C     'do loop' run from 1701 to 2009 directly.  I just didn't feel
C     like it.

      do nyr=1,308
      iyr=nyr+1700
      write(*,220) iyr,tr(1,nyr),tr(2,nyr),tr(3,nyr),tr(4,nyr),
     *tr(5,nyr),tr(6,nyr),tr(7,nyr),tr(8,nyr),tr(9,nyr),tr(10,nyr),
     *tr(11,nyr),tr(12,nyr),tr(13,nyr)
 220  format (i4,1x,13f7.2)
      end do

 221  format (5x,13f7.2)

C    This is the test of "did the two ways reach the same end point?" 
C    for motav and testr.  We ought to get two lines of identical 
C    output.  Then, we compare this "total of years averages" with the
C    total of all data averages and observe the sensitivity to the
C    two ways of calculating a "monthly average GAT".

      write(*,221) motav
      write(*,221) testr
      write(*,'(5x,12f7.2,f7.2)') ttmpavg,ggavg

      stop
      end

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons' are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in GISStemp Technical and Source Code and tagged , , , , , . Bookmark the permalink.

8 Responses to Will The Good GHCN Stations Please Stand Up

  1. pyromancer76 says:

    More power to you, E.M. Smith. And I hope you’re finding that partner(s) to turn this material into at least one “peer-reviewed” paper. Although, as you develop your GISTemp skills, you may have a series of papers.

    I, too, believe there is no global warming today. It is an artifact of a collision of history (the way things have been done in the past), sloppy maintenance and siting of temperature stations (e.g., Anthony Watts and the surface stations project — those who can’t follow the regulations should be fired); malevolence (marxist-socialists who want to smash the developed-world’s noses in the dirt (e.g., Rev. Wright’s “garlic noses”), and finally religious purity (cult-like beliefs to save Mother Earth from the sins of ____). Fill in the blank — oil corporations, polluters, developers, hunters, people who have babies, etc.).

    Your work is the next step after Anthony’s. Actually, I think the two efforts are complementary. We need those thermometers (temperature stations) identified that are accurate, properly sited, and have been in place and regularly read for a long time. They give valid data to be analyzed by your accurate computer code. Next we must find out if their readings (raw data) have been doctored — not so easy. Remember the broken Hawaii thermometer whose record high temperature was not changed although the broken thermometer was acknowledged, if I remember correctly.

    This is great material. There is no AGW! CO2 takes a hike in the summer! These sound like titles of dazzling papers. I wonder what is next. (I would also like something like GISTemp Shenanigans Deserve a Shellacking with a Shillelagh.)

    And what shall we do with all those scientists who have prostituted themselves and their research subject (and scientific truth) for fame, fortune, or simply plodding along with the group?

  2. Dennis Elliott says:

    Pyromancer,

    I second your kudos to Mr. Smith.

    In terms of firing those who initially collect the data, I suspect firing them would be somewhat of an overkill.

    I worked throughout my career on a succession of U.S. Forest Service Ranger Stations. Until fairly recently (when RAWS automated stations were put in) every one of those stations had an associated weather station. These recorded precipitaion, Max/Min temperatures, humidity and (some of them) wind direction and force and “fire sticks” to assess the dryness of forest fuels. Districts typically had a permanent cadre of just a few people which ballooned to 5-6 times the number during the summer season. Everyone had a number of primary duties and the recording of data from the weather station was a decidedly low priority. Most of our field work had to be accomplished in the summer. Mostly the primary employees did the measurements, but sometimes seaonals did it, sometimes, front office clerks, sometimes family members of the permanent staff; essentially whoever was available. Everyone tried to do a good job, but any time that many people for whom it is “just another job” is involved, there are going to be errors in methodology, understanding, and interest. I believe at least several of these stations produced part of the data Mr. Smith is working with. Other government (other than the F.S., that is) offices worked the same way, I suspect. It wasn’t willful sloppiness and the data we collected was used by us for distinctly different purposes (fire control, snow water content, etc.) and the accuracy obtained suited our purposes in an economical way. No one suspected that that data would ever be used to form the basis for a worldwide scam of epic proportions. Mr. White is perhaps even more right than he knows when he says the precision is far from decimal parts

    Mr. Smith,

    In this regard, I suspect that the Forest Service data collected prior to about the late 50’s was very good and that after that was progressively poorer as the complexity of the primary duties increased. I wonder if that isn’t the case with much of the U.S. record.

  3. E.M.Smith says:

    My complaint about the temperatures collected is not with the folks who did it. They did a reasonable job for their primary purposes.

    My complaint is with the folks who thought they could take a temperature series collected for one set of purposes and torture it into serving another.

    Every airport NEEDS a thermometer and it really ought to be 4 feet directly above the tarmac of the runway. That is where the “density altitude” of the airplane wing will determine if you can lift off, or crash into the end of runway barrier. THAT is not wrong. Trying to turn that “over the runway” temp into a “what’s happening in the forest 1000 km away” temp is the broken bit. Similarly, a USFS station finding out if the wind, rain, and temps are right for a summer fire is doing a very fine job. It’s the folks who try to say that NOT collecting a temp when the station is snowed in and fire danger is NIL who do not “get it”.

    The bottom line is that the temperature data series were “designed” for a dozen distinct purposes, not one of which was related to the AGW religion and not one of which is well suited to being tortured into a “global average temperature” (whatever that might be, or mean.)

  4. pyromancer76 says:

    My complaints are not about any Forest Service workers faithfully collecting temperature data while they are overworked in many additional areas — for a small salary. I have personal experience knowing these dedicated, hard workers. I also am not complaining about thermometers at airports or other places where temperature data is vital. I am complaining — in fact deeply angry — about thermometers used for national weather/climate data that are not sited according to regulations. Those regulations exist and have been in place for many years. Those supervisors/ management heads who were supposed to see that the regulations were followed for climate purposes are the ones I want fired. Same with scientists who denigrate the scientific method — fudge their data, hide their methods, refuse discussion.

    E.M. Smith, I wonder if you can identify not only those thermometers that have been sited for many years but those that are sited correctly. This might take communication with Anthony Watts. These thermometers, it seems to me, can give some raw data, for specific regions, of warming or cooling. Also, please keep after the fiction of GAW. It needs a shellacking.

  5. Jeff Alberts says:

    The bottom line is that the temperature data series were “designed” for a dozen distinct purposes, not one of which was related to the AGW religion and not one of which is well suited to being tortured into a “global average temperature” (whatever that might be, or mean.)

    That last bit about “”global average (or mean) temperature” has always bothered me. Along with the resulting “anomaly”.

  6. E.M.Smith says:

    @pyromancer76:

    Yes, I can select for any given set of records. The way I structured this is that you hand a list of station IDs to the select records program and it filters for that set. So as soon as anyone produces a list of station IDs for vetted good station sites, I can produce the data set. The surfacestations.org list of stations by quality just needs to be sorted by the quality flag, toss out the bottom end and run the top IDs through the selection. Pretty easy, really.

    BTW, since I’ve posted the source code, anyone else can do this too. At this point, anyone with a Linux / Unix box is empowered…

    I was just doing a quick “what do the stable stations do to the data” check; not picking an optimal strategy. Also on my “to do” list is to make station groups by quartile, and maybe by geography, and see what the data do in response. It’s the fun “what if” part of characterizing a program behaviours.

    @Jeff: Yeah, I keep reminding in the postings that I don’t think an average of thermometer records means much. Once you get to an anomaly in an irrationality of a fiction, well, I’m pretty much no longer engaged…

    But an irrational number set can still make a decent benchmark and can still be used for “signals intelligence” purposes. (Take a 1 MB file and compress it. There is a characteristic compression ratio for each general data type. Take a random compressed file and uncompress it. Compare it’s expansion ratio to the “benchmark” for that type. If the expansion ratio is not “right” you may have embedded encrypted hidden data. What is the meaning of an “expansion anomaly” as a number? Not much by itself, but it can point to where there may be something “interesting” to find…)

  7. Anne Observer says:

    As a software engineer, it is absolutely astounding to me that today, at the end of 2009, this data is still being represented in flat text files, rather than in a database. Of course, I suppose if one knew how to parse the data out of the file it could be put into a database easily enough.

    But mainly what I see is people writing relatively “primitive” programs for retrieving and manipulating the data as text, rather than using an approach that is designed around a central database, which is much more flexible. Maybe I am wrong, but all I have seen or heard mention of is data and data files, never databases.

    I have the “leaked” files… is HadCRUT3 represented in there anywhere as a file or a few files? If the format is well known I would be happy to parse it and stuff it into, say, a MySQL database that could then be queried on the fly and arbitrarily.

    And of course then export the data in a standard format like comma-delimited (what is now known as CSV ever since Microsoft got its paws on it). Or any other common format. I won’t promise to do anything custom.

    REPLY: [ The architecture seems to have been frozen about 1979 or so… Everyone with DBA experience who looks at this as the same reaction. I did. The GHCN data format is in an article linked from the GIStemp tab up top. GHCN is the input to GIStemp (that I look at on this site) while the leaked emails have said the “lost” HadCRUt data are 95% the same as GHCN. It’s a fairly trivial database load, but I’d left it “as is” since I wanted to benchmark GIStemp “as is”. But it is fun to play with and you could easily duplicate some of the work I’ve done here that way, but much more quickly. The HadCRUT3 product of CRU is not on this site (and I’m not sure where you would get it) but given their admissions, I’m doubt that it has much value… -E.M.Smith ]

  8. Pingback: Climate Fast Food | Digging in the Clay

Comments are closed.