USHCN.v2 GIStemp GHCN – What will it take to fix it?

[Image: How Do We Calibrate This Global Calorimeter?]

How To Calibrate a Calorimeter With Constantly Changing Thermometers?

Earlier we saw that GIStemp was broken in 2007 because GHCN (the Global Historical Climatology Network) dropped all but 136 stations in the USA, and at the same time NOAA changed from putting temperatures in the USHCN (U.S. Historical Climatology Network) file to using a “version 2” of USHCN (which I will denote USHCN.v2).

At that point, GIStemp stopped getting U.S.A. temperature data updates from NOAA.

Dead Halt.

The USHCN file that GIStemp (one of the two major temperature histories in use for “Climate Research”) uses for US temperatures “cuts off” in early 2007. The USHCN.v2 file is not used.

This would be “No Problem” (the US data are also in GHCN) were it not for GHCN also dropping almost all US stations at the same time (and, I might speculate, for the same reasons…).

But GHCN keeps 136 stations in for the USA. I have not yet found out why these stations are in, or why no others are added, but one result is that California has 4 thermometers: one in San Francisco, the other three in Southern California near the beach. There is no way you can get a valid picture of California from those locations. It is certainly impossible to compare it with the past record that had thermometers in the snowy mountains. So we can have no idea if California is warming or cooling by looking at the USHCN data set or the GHCN data set.

So why not just use USHCN.v2?

I looked into that, and what I found is a “world of hurt”.

When doing various kinds of work on data (in databases or in fixed files, or even just on paper with a pen) you need certain fields that uniquely identify certain things. Your name, or your driver’s license number, or your Social Security Number. You get the idea. Well, in database and computer work, these are called “keys” or “key fields” (or sometimes “sort fields” or “index fields”).

USHCN.v2 appears to use the same key field as USHCN, and in STEP0 there is a hard-coded table of equivalences from USHCN to GHCN-format StationIDs. But this table must be maintained by hand.

To the extent that the GIStemp folks have just “given up” on USHCN.v2, there will have been station changes that are not reflected in this “lookup table”:

STEP0/input_files/ushcn.tbl

And guess who will get to do that maintenance…

The Files

You can take a look at the USHCN.v2 “stuff” at:

http://www1.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/

The document that describes the files is:

http://www1.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/readme.txt

Here is a bit of the ushcn-v2-stations.txt file, which is the moral equivalent of the v2.inv file from GHCN. That is, the “station inventory” file. It holds the StationNumber, the Latitude, the Longitude, the elevation in meters, the State as 2 letters, the StationName, and then a series of numbers or dashes (more on them later). Those first 6 digits are the StationID. That is the “key” to match a station to the temperature data for that station. So we see that “FAIRHOPE” is “012813”.

[chiefio@tubularbells input_files]$ more ushcn-v2-stations.txt 
011084  31.0581  -87.0547   25.9 AL BREWTON 3 SSE                  ------ ------ ------ +6
012813  30.5467  -87.8808    7.0 AL FAIRHOPE 2 NE                  ------ ------ ------ +6
013160  32.8347  -88.1342   38.1 AL GAINESVILLE LOCK               011694 ------ ------ +6
013511  32.7017  -87.5808   67.1 AL GREENSBORO                     ------ ------ ------ +6
013816  31.8700  -86.2542  132.0 AL HIGHLAND HOME                  ------ ------ ------ +6
015749  34.7442  -87.5997  164.6 AL MUSCLE SHOALS AP               ------ ------ ------ +6
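
For the Fortran inclined, here is a little sketch of mine (not NOAA’s, and not part of GIStemp) that picks those fields apart. I have inferred the column positions from the listing above, so double check them against readme.txt before trusting it:

      program rdinv
C     A sketch: parse the ushcn-v2-stations.txt fields.
C     Column layout inferred from the listing above; readme.txt
C     is the authority, so verify the positions there.
      integer      idus, iutc
      real         rlat, rlon, elev
      character*2  state
      character*30 name
      character*6  comp1, comp2, comp3
      open(1,form='formatted',file='ushcn-v2-stations.txt')
   10 read(1,99,end=20) idus,rlat,rlon,elev,state,name,
     &                  comp1,comp2,comp3,iutc
   99 format(i6,1x,f8.4,1x,f9.4,1x,f6.1,1x,a2,1x,a30,3(1x,a6),1x,i2)
      write(*,'(i6.6,2x,a30,f8.1,1x,a2)') idus, name, elev, state
      goto 10
   20 stop
      end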

If we search the data file for that station ID, we find records like:

[chiefio@tubularbells tmp]$ grep 012813 9641C_200907_F52.avg 
01281331895   507E   443E   608E   666E   723E   783E   806E   813E   819E   659E   575E   505E   659E
01281331896   500E   539E   593E   708E   788E   794E   813E   827E   790E   691E   633E   522E   683E
01281331897   496E   575E   674E   666E   721E   817E   827E   811E   776E   715E   601E   542E   685E

Fair enough: it starts with “012813”, and the last 4 digits of the first field are “1895”, the year of the record. (The “3” in between says this record is an “average of highs and lows”; other files, with MIN or MAX, would have a 1 or 2 there.) Then we get 13 repeating data items: the temperature for each of the 12 months plus the annual average, in 1/10 F, each followed by a flag to tell you if the data are:

E Estimated from no data at all
I Incomplete but they used what they had
Q Estimated from somewhere nearby because their QC algorithms didn’t like it
X Estimated from surrounding values because the month data were too short for their homogenization algorithms to be happy.

So we can see right off that every single one of the data items for these three years is an Estimate. Simply made up. We also learn, from the fact that there is an X flag, that all these data have already been “homogenized” in some way. One is left to wonder where the real data are…
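
Just to see how pervasive that is, here is a quick sketch of mine that tallies the “E” flags across a whole file. The record layout matches the FORMAT statement in the program at the end of this post:

      program cnte
C     A sketch: count how many reported monthly values carry the
C     "E" (estimated) flag.  Layout per the description above:
C     6 digit id, 1 element digit, 4 digit year, then 13
C     (value, flag) pairs.
      integer   mtemps(13)
      character flags(13)
      integer   idus, iyr, n, nest, nval
      nest=0
      nval=0
      open(1,form='formatted',file='9641C_200907_F52.avg')
   10 read(1,99,end=20) idus,iyr,(mtemps(n),flags(n),n=1,13)
   99 format(i6,1x,i4,13(i6,a1))
      do n=1,12
         if (mtemps(n).gt.-9000) nval=nval+1
         if (flags(n).eq.'E')    nest=nest+1
      end do
      goto 10
   20 write(*,*) nest,' E flagged of ',nval,' monthly values'
      stop
      end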

To their credit, the “degrees F” are now restricted to the 1/10ths place, having lost the very silly 1/100ths place of the older format. While I’d still assert that, given input data in whole degrees F, the 1/10ths place is False Precision, it actually does make sense now that some of their equipment is reporting with 1/10 F precision (and hopefully accuracy as well).
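
To make the units concrete: a stored value of 507 means 50.7 F. Run that through the same transform the program at the end of this post uses, nint((507 - 320) * 5/9) = 104, and you get 10.4 C in GHCN style tenths of a degree C.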

FWIW, the later end of the record for this site looks like this:

01281332006   573    538    624    704    742    810    816    833    768    675    577    549    684 
01281332007   523    515    628    647    739    801    808    845    789    706    593    583    681 
01281332008   495    554    592    662    740    808    817    800    769    666    572    566    670 
01281332009   521    540    626    649    752    827    810  -9999  -9999  -9999  -9999  -9999  -9999 
[chiefio@tubularbells tmp]$ 

So lately we have real data (homogenized, but real, I think…) with missing data flagged with a -9999 (where you can see that I downloaded this file just before the August data came out).

So What’s The Problem?

GIStemp has an inventory file too; it is v2.inv, and the entry for FAIRHOPE looks like this:

[chiefio@tubularbells tmp]$ inin FAIRHOPE
42572223003 FAIRHOPE 2NE                    30.55  -87.88    7   26S   12FLxxCO 3x-9WARM CONIFER    C2  27
[chiefio@tubularbells tmp]$ 

The first 3 characters say “USA” (425). Then we get the 5 digit station ID of “72223” and the substation identifier of “003”, so this station can be uniquely identified in the USA as StationID 72223003. But in USHCN.v2 it is 012813, and those two don’t match.
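
To make that id structure concrete, here is a toy sketch of mine that assembles the 11 character GHCN style id from its three parts:

      program mkid
C     A toy sketch: build the GHCN style StationID from its parts
C     with an internal write.  i3.3 / i5.5 / i3.3 zero-pad each
C     piece to its full width.
      character*11 idg
      write(idg,'(i3.3,i5.5,i3.3)') 425, 72223, 3
C     idg now holds '42572223003'
      write(*,*) idg
      stop
      end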

No Problem: the “get_USHCN2v2.f” program does the conversion. But based on that (now unmaintained) table…

So after you do that update and match, you can then move on to the next issue.

You get to figure out what to do about the fact that the URS (Urban / Rural / Suburban) flag is gone… and the Brightness Index is not in USHCN.v2. BOTH are essential to the functioning of GIStemp: they are core to how it does its UHI adjustment. So if a new station comes in, those flags must be created for it. Hope there are not too many new stations…

UPDATE: Converted USHCN.v2 to USHCN, Crashed on Station Info

Well, I got this idea to just read in the USHCN.v2 format file and write out an old USHCN format file (to feed straight into GIStemp) and see what happened. It worked up to a point: at the point where it needs to match on the v2.inv file (as noted above), it crashed. So there is some station information that needs hand entry of data… Looks like 59 of them (USHCN.delta here being a diff of the two station lists, the lines starting with “>” are stations present on only one side):

[chiefio@tubularbells tmp]$ grep "^>" USHCN.delta | wc -l
     59

Not going to happen tonight…

For your amusement, here is what the middle of the run output looks like:

Bringing Antarctic tables closer to input_files/v2.mean format
collecting surface station data
... and autom. weather stn data
... and australian data
replacing '-' by -999.9, blanks are left alone at this stage
adding extra Antarctica station data to input_files/v2.mean
created v2.meanx from v2_antarct.dat and input_files/v2.mean
GHCN data:
 removing data before year 1880.
created v2.meany from v2.meanx
replacing USHCN station data in v2.mean by USHCN_noFIL data (Tobs+maxmin adj+SHAPadj+noFIL)
  reformat USHCN to v2.mean format
extracting FILIN data
getting inventory data for v2-IDs
 id-file ended !
finding offset caused by adjustments
extracting US data from GHCN set
 removing data before year 1980.
getting USHCN data:
-rw-rw-r--    1 chiefio  chiefio    159313 Nov  7 01:48 USHCN.v2.mean_noFIL
-rw-rw-r--    1 chiefio  chiefio    145838 Nov  7 01:48 xxx
doing dump_old.exe
 removing data before year 1880.
-rw-rw-r--    1 chiefio  chiefio    145838 Nov  7 01:48 yyy
Sorting into USHCN.v2.mean_noFIL
-rw-rw-r--    1 chiefio  chiefio    145838 Nov  7 01:48 USHCN.v2.mean_noFIL
At line 10 of file ./dif.ushcn.ghcn.f
Traceback: not available, compile with -ftrace=frame or -ftrace=full
Fortran runtime error: End of file
created ushcn-ghcn_offset_noFIL 
Doing cmb2.ushcn.v2.exe
At line 14 of file ./cmb2.ushcn.v2.f (Unit 3 "ushcn-ghcn_offset_noFIL")
Traceback: not available, compile with -ftrace=frame or -ftrace=full
Fortran runtime error: End of file
created  v2.meanz
replacing Hohenspeissenberg data in v2.mean by more complete data (priv.comm.)
disregard pre-1880 data:
At Cleanup
created v2.mean_comb

Can you guess what error message tells you where the problem is?

Nope, it’s not the “Fortran runtime error”, though that helps a little. It is that tiny little, almost unreadable line ” id-file ended !”. That’s the entire error message. The code could check for a failure exit code and halt, but it doesn’t. It just keeps on trying to run bits until something else crashes as collateral damage.
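
For what it’s worth, the fix is nearly a one-liner. Here is a sketch of what I would do (my suggestion, not GIStemp code; it assumes the CALL EXIT extension that g77 and gfortran provide): hand a non-zero status back to the shell so the wrapper script can test it and halt, instead of plowing on:

      program ffast
C     A sketch of failing fast: write the message, then return a
C     non-zero exit status so the driver script can stop the run.
C     CALL EXIT(n) is a common g77 / gfortran extension, so treat
C     that call as an assumption about the compiler.
      write(*,*) ' id-file ended !'
      call exit(1)
      end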

So it looks like I get to go off searching for new station data.

But in the meantime, it looks like my data conversion worked. The file ran fine until there was no Station Information record.

And What About Those Dashes?

I’ve “wrapped” the lines so you can see the part with dashes better:

[chiefio@tubularbells input_files]$ more ushcn-v2-stations.txt 
011084  31.0581  -87.0547   25.9 AL BREWTON 3 SSE                  
------ ------ ------ +6
012813  30.5467  -87.8808    7.0 AL FAIRHOPE 2 NE                  
------ ------ ------ +6
013160  32.8347  -88.1342   38.1 AL GAINESVILLE LOCK               
011694 ------ ------ +6
013511  32.7017  -87.5808   67.1 AL GREENSBORO                     
------ ------ ------ +6
013816  31.8700  -86.2542  132.0 AL HIGHLAND HOME                  
------ ------ ------ +6
015749  34.7442  -87.5997  164.6 AL MUSCLE SHOALS AP               
------ ------ ------ +6

The “+6” is the local time offset from UTC. But notice that GAINESVILLE LOCK entry? That number looks mighty like a Station ID… And it is. It is the station used for making up missing data. From the description file:

COMPONENT 1 
is the Coop Id for the first station (in chronologic order) whose 
records were joined with those of the HCN site to form a longer time
series.  "------" indicates "not applicable".
            
COMPONENT 2 
is the Coop Id for the second station (if applicable) whose records 
were joined with those of the HCN site to form a longer time series.
	    
COMPONENT 3 
is the Coop Id for the third station (if applicable) whose records 
were joined with those of the HCN site to form a longer time series.

So NOAA have joined the “stretch, interpolate, In-fill, homogenize” data fabrication brigade. So for any record you may have up to 4 thermometers that it actually represents.

Oh Great.
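
For the curious, you can count how common this joining is. Here is a sketch of mine, reusing the column layout from the parser sketch earlier (COMPONENT 1 sits in columns 68-73, an inference you should check against readme.txt; since the components fill in chronologic order, checking the first one is enough):

      program cntcmp
C     A sketch: count stations whose COMPONENT 1 field holds a
C     station id rather than "------", i.e. records formed by
C     joining a neighbor's data onto the HCN site's record.
      integer      ncomp, ntot
      character*66 lead
      character*6  comp1
      ncomp=0
      ntot=0
      open(1,form='formatted',file='ushcn-v2-stations.txt')
   10 read(1,'(a66,1x,a6)',end=20) lead, comp1
      ntot=ntot+1
      if (comp1.ne.'------') ncomp=ncomp+1
      goto 10
   20 write(*,*) ncomp,' of ',ntot,' stations joined with a neighbor'
      stop
      end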

But Wait, There’s More!

Just for grins, I wondered how many thermometers survived into USHCN.v2:

[chiefio@tubularbells input_files]$ wc -l ushcn-v2-stations.txt 
   1218 ushcn-v2-stations.txt
[chiefio@tubularbells input_files]$ grep ^425 v2.inv | wc -l
   1921
[chiefio@tubularbells input_files]$

Somewhere along the line we lost a net of 703 thermometers. If there were any additions, the number of “lost” will be higher than that.

So even if we go through the exercise of key creation, matching, URS and Brightness additions, and data format conversion, we’re still dealing with a drop of about 1/3 of the stations (703 of 1,921, or about 37%).

Geek Corner

This is the source code for a program to replace USHCN2v2.f that will read and process the USHCN.v2 files directly. In the comments below you will find a program that converts a USHCN.v2 file into an older USHCN format file. So here we have two different ways to achieve the same goal. Both produce the same results. (I’ve ‘diffed’ the output files and they match.)

Notice that this program looks for a new copy of the latest USHCN.v2 format data in a file named v2_USHCN in the STEP0 directory. You either need to unpack the file and put it there, put a link from that name to wherever you unpacked it (probably input_files), or change the wrapper script.

[chiefio@tubularbells analysis]$ cat v2USHCN2v2.f 
C2345*7891         2         3         4         5         6         712sssssss8
C     Program:  v2USHCN2v2.f
C     Written:  November 11, 2009
C     Author:   E. M. Smith
C     Function: To convert the USHCN.v2 file as USHCN2v2.f did USHCN
C
C     Copyright (c) 2009
C
C     This program is free software; you can redistribute it and/or
C     modify it under the terms of the GNU General Public License as
C     published by the Free Software Foundation; either version 2,
C     or (at your option) any later version.
C
C     There is one exception to this free license:
C     NASA, their contractors, and any agency providing
C     work for them that would use this software or a
C     similar derivative work as part of GIStemp or related
C     projects.  For anyone in that group or category, a
C     license is granted subject to the requirement that a
C     fee equal to 1/10th the total annual compensation
C     paid to James Hansen be paid to the author, E.M.Smith.
C     If the amount paid to said Mr. Hansen should become
C     zero, then ....   BTW, said compensation includes any
C     retirement benefits or speaking fees and similar fees
C     paid either directly or indirectly via contracting agencies
C     or any other third party.
C
C     Basically, you use this to support the official copy of
C     GIStemp, it's worth at least 1/10 th of Hansen's cut.
C     Pay up, or boot him.
C
C     This program is distributed in the hope that it will be useful,
C     but WITHOUT ANY WARRANTY; without even the implied warranty of
C     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
C     GNU General Public License for more details.
C
C     You will notice this "ruler" periodically in the code.
C     FORTRAN is position sensitive, so I use this to help me keep
C     track of where the first 5 "label numbers" can go, the 6th
C     position "continuation card" character, the code positions up
C     to number 72 on your "punched card" and the "card serial number"
C     positions that let you sort your punched cards back into a proper
C     running program if you dropped the deck.  (And believe it or
C     not, I used that "feature" more than once in "The Days Before
C     Time And The Internet Began"...)

C2345*7891         2         3         4         5         6         712sssssss8

C     Declare your variables; these added -ems
C
      character*134 line
      character*11 idg
      real*4 temp
      real*4 ftoc 
      integer itemp(12)
      integer   mtemps(13)
      character flags(13)

C     This is the 5/9 of an F to C conversion as a constant to improve
C     efficiency and remove precision jitter from bit shifts.

      ftoc=5./9.

C     Open your files.  "ID_US_G" is a sorted version of ushcn.tbl
C     "v2_USHCN" is the most current USHCH.v2 data input file
C     If a record is in v2_USHCN but not in ushcn.tbl, we skip it 
C     but print an update nag into file "ushcn.tbl.updates"
C     The others are as in USHCN2v2.f in the original GIStemp.
C     Where "USHCN.v2.mean_noFIL" is the output of merged GHCN and USHCN data
C     while "USHCN.last_year" holds the 4 character value for the last year
C     with data in the USHCN input file (or in this case, the USHCN.v2 file.)

      open(1,form='formatted',file='ID_US_G')
      open(3,form='formatted',file='v2_USHCN')
      open(4,form='formatted',file='ushcn.tbl.updates')
      open(10,form='formatted',file='USHCN.v2.mean_noFIL')
      open(20,form='formatted',file='USHCN.last_year')

C     Throughout the code, you will find "diagnostic writes" like this one
C     that are commented out.  They are used during testing and validation
C     to assure what you thought is going on, is going on.

C      write(*,*) "After Open of Files"

C     Initialize variables and establish initial state for data read from files.

      iyrmax=0
      rewind 1
      rewind 3

      read(1,'(5x,i6,1x,a11,1x,i1)') idus,idg,iz

C2345*7891         2         3         4         5         6         712sssssss8

C     Start the basic processing loop.  Read records from USHCN.v2 and match
C     them to the ushcn.tbl map for the GHCN style station number.

C     Main Loop - exit when out of USHCN.v2 records to line 100
   10 Continue

      read (3,99,end=100) idus2,iyr,(mtemps(n),flags(n), n=1,13)
  99  FORMAT (i6,1x,i4,13(i6,a1))

C      write(*,*) "inside loop", idus , idus2

      do while (idus .ne. idus2)
         read(1,'(5x,i6,1x,a11,2x,i1)',end=20) idus,idg,iz
C         write(*,*) "inside loop 2", idus , idus2
C         This loop finds the ushcn.tbl entry, if not present it goes to 
C         the "skip handling" code.
      end do

C     For each month of the record, pack the missing data value in, if 
C     appropriate, or for "E" estimated fields, give them a missing 
C     value setting as well.
C     If you have real valid data, convert it from F in 1/10 degree to
C     C in 1/10 degree.

      do lc=1,12

          if      (mtemps(lc).le.-9000) then
                   itemp(lc) = -9999

          else if (flags(lc) .eq. "E")  then
                   itemp(lc) = -9999
          else 
                   itemp(lc)=nint(ftoc*(mtemps(lc)-320) ) ! F->.1C
          end if
      end do

C      write(*,*) "Done with2: ", idg, idus, idus2, iz,iyr,itemp

C     Write out the work product.  ID, the always zero iz, year and temps
      write(10,'(a11,i1,i4,12i5)') idg,iz,iyr,itemp

C     Increment the max year reached counter, if needed
      if(iyr.gt.iyrmax) iyrmax=iyr

C     So go get another record until you run out, then go to 100 for exit.
      goto 10

C2345*7891         2         3         4         5         6         712sssssss8
C     Rather than just die if a new thermometer shows up, we log it to an
C     updates needed file and continue to produce usable, if not perfect,
C     temperature files. 
  20  Continue
      write(4,*) "ushcn.tbl Update needed, Station/year : ",idus2,iyr
      rewind 1
      goto 10
C     This is an example of "resiliency programming".  Where the original
C     would just die on the first station that needed a ushcn.tbl update
C     (and you got to re-run as many times as there were missing stations),
C     this way you get a nice report and can continue with the rest of the
C     run as an 'end to end' test too.

C     This is the "normal Exit" handling.  Here we have the logging of 
C     the highest year reached into the "20" file and notification of 
C     the operator on the console log as well.
  100 continue
      write(20,*) iyrmax
      write(*,*) 'USHCN data end in ',iyrmax
      stop
      end
[chiefio@tubularbells analysis]$ 

Conclusion? Yes, I think GIStemp is Reaching a Cliff of Conclusion…

I think we now know why GIStemp has not done the “maintenance programming” to merge the new USHCN.v2 file format into GIStemp. They would rather just let it suffer the “bit rot” and let the thermometer count dwindle. Even if it were added, they would still have thermometer loss, so it is more a decision of “degree” than of “kind”; and if you are going to be hosed anyway, why not do it on the cheap?

So GIStemp has taken a flying leap off “The Cliff of Conclusion” and decided that thermometer count and location don’t really matter after all. 4 On The Beach in California is as good as one in the Mojave or 4 at Mt. Shasta, Yosemite, Weed, and Tahoe…

The alternative would be to admit that it matters, and that they just took a big enough hit to the thermometer count and locations that the whole GIStemp product is a useless hulk that can’t get spare parts and uses tires of a size they don’t make any more…

Stick A Fork In Him, Pedro, He’s Cooked.

We are trying to do a calorimetry experiment / measurement on the planet, and the thermometers keep getting changed by the undergrad students, someone keeps leaving the heater on in the room, the Janitor likes to open a window when he works, and nobody has calibrated the thermometers anyway. Oh, and the fluid flow rates and temperatures keep changing too…

This makes Cold Fusion Calorimetry look like stellar work in comparison…


17 Responses to USHCN.v2 GIStemp GHCN – What will it take to fix it?

  1. Bob D says:

    Hi EM,

    Do you have an email address I can use to send you something? I have a technical question or two I’d like to ask regarding the GHCN data. I need to ensure I’m not reading things wrong, because I plotted something and I can’t believe what I’m seeing.

    Alternatively, you could just send me an email and I’ll be able to reply to it, if you don’t mind that is.

    Cheers,
    Bob

  2. Dennis says:

    E.M.

    Just a shot in the dark (you left me, technically [or maybe technologically] a couple of weeks ago), but is it remotely possible that they ran some kind of regression to drop out thermometers? That is, at some large number of thermometers, to answer a very large but very general question, only these truly representative thermometers matter. Looking at the ones that are left where I live that doesn’t seem very likely, but I have this niggling feeling about “…not attributing to malice what can be adequately explained by stupidity.”

  3. E.M.Smith says:

    @Bob D

    Email address is in the ABOUT tab up top in a form the SPAM daemons can not parse. It is: pub4all then the at sign, followed just a bit later with Dot aol and of course, the usual Dot com.

    @Dennis

    I’m trying to “span the gap” and present enough “technical meat” for the “back room boys” to get all the detail and know I’ve done my homework right, while still trying to present some “top level” descriptions that will speak to most folks.

    You ought to be able to graze over the “tech talk”, taking comfort from knowing others ARE looking at it to assure there are no holes, and skip to the “conclusions”. If any bit is too dense, I’m happy to provide more “normal folks translations”. Feel free to point out “dense bits”.

    At least, that’s the hope. We’ll see…

    As per Hanlon’s Razor:

    It is just a test for assigning the “blame” between Malice and Stupidity; not for deciding between “good idea” and “malicious / stupid idea”.

    Your question amounts to: Was this a good idea?

    Only after that does Hanlon’s Razor come in to decide if they ought to be tried or simply sent to the Shady Pines Dodgy Manor and Rest Home…

    So, was it an intelligent act?

    BTW, I suspect that something similar to what you propose did happen, but from the ‘less smart’ side… we’ll come back to that…

    I think we can approach the “smartness” of the act from two ways. Theory and practice.

    In practice, we have an existence proof that it was pretty dumb. California. We got a news bleat about “115 Year Record Heat!” when LOTS of folks in California have been able to say it was not a record in any way and was, in fact, one of the cooler years in a long time. You just can not “measure California” from Los Angeles (at the beach or at the Airport).

    Pretty dumb.

    From the theory point of view, you would want a ‘representative sample’. Even if we ignore Nyquist and decide to just sample each major topography: there are no thermometers used from any of: Sierra Nevada Mountains, Mojave Desert, Cascade / Northern mountains, Redwood Coastal Fog Belt, Central Valley (summer oven…).

    You can not have a representative sample of topographies when all your thermometers are near / on the beach.

    So I think it fails the “smart test”…

    OK, what do I guess happened?

    They sucked their own exhaust.

    Folks in a tight group develop things they believe as articles of faith, but since the group is tight, no one questions the Articles Of Faith.

    One of those AOF for the Global Warming Thermometer Kings is that you can average together some semi-random set of thermometers and:

    a) It means something.
    b) You can use a highly variable set.
    c) A small sample is just fine.

    Now I assert that “a” is simply wrong. There is an entire discussion of that under the GIStemp tab comments up top. But for most purposes I just accept that they will keep on believing that broken belief.

    I also assert that “b” is wrong. A lot of the work I’m doing to benchmark GIStemp is to prove that. (Watch here, there will be a new posting of interest Real Soon Now…)

    And finally “c” where we’ve had one notable suggest that as low a count as 60 thermometers would be just fine.

    So I suspect that The Climate Kings and The Thermometer Kings got together and were all repeating the Articles of Faith in the daily prayers when someone asked what to do about the Real Soon Now bucket of money for a Shiny Thing new thermometer system. They decided to build this nice new Climate Research System for new reporting. Who needs all that old stuff anyway.

    And maybe someone said “But consistency of the record?”

    At which point AOF “b” says “Who cares?”.

    And maybe someone said “Nyquist? Representative areas?”

    At which point AOF “c” says “Who cares?”.

    And that, IMHO, is the moment when the bull ran through the thermometer shop stomping on all those old glass bulbs of mercury… Mindless, full of its own exhaust, and in complete Consensus with the Articles Of Faith.

    But wrong.

    And that, to me, makes it very unintelligent.

    And unfortunately, I suspect it also then lets Hanlon’s Razor say these folks ought not be taken out back and beaten with a stick, but instead ought to be sent off to re-education camp or given a job polishing the brass in the men’s room…

    The truth, as I see it, is more in keeping with a calorimeter. You need to calibrate your instrument and have as few changes as possible over the life of the observation interval.

    The more you instrument, the better, up to a point; and you don’t want to be mucking about with the thermometers or instrumentation during the run… it screws up the calibration.

    But this “lab sense” or “lab truth” is lost on the folks who didn’t like practical lab much and like playing with the computer more. As a professional computer guy, I’ve worked with lots of them. (I supported a major R&D organization for a lot of years doing computer simulations and other advanced work). You get to know who is the guy who is careful and never believes the box until it’s been checked 3 times and who is the guy who believes his code is perfect, even after it crashed on him a dozen times and told lies 20. He’s the one “sucking his own exhaust”. And sometimes whole groups do it.

    It’s fun to watch, as long as they don’t get budget or authority over you… Yet often these same folks are the most passionate and sway management the most effectively; but are still wrong. As a manager, it pains me to admit that the “maybe wrong but passionate presentation” wins funding over the “modest claims, 100% proven, detailed presentation”.

    And that is what I see here.

    Nobody in an R&D shop wants to do what I am doing. Test the code, end to end, with incredibly dull boring code reviews and benchmarks. Explore the performance envelope and write up where it fails. It’s so… so… “Negative.” “Boring.” “Not with the program.” “QUESTIONING the big guys CODE and challenging his REPUTATION!”. Heck, it’s a standard of management social circles that everyone is supposed to “Think Positively” and “Avoid negative people”. You want to talk about failure, you are a social pariah.

    You don’t get promoted up the ranks by needling folks and telling them you think maybe they are being a bit stupid.

    So we get a lot of the “Go along to get along” (as seen in Hansen’s boss who flat out said he knew he had a problem but chose to ignore it) and the “You support my grant request I’ll support yours”.

    And the end game is that after a decade or two, they start to believe their own fantasies and trust their own unfounded assumptions. They become Articles Of Faith.

    And the result is bad decisions, like deletion of 90%+ of the thermometer readings in the present is OK, you can still compare it to the readings from the past when we were measuring snow, not beach. (Yes, it is exactly that which is being done.) Because “Everyone knows it’s OK”…

  4. E.M.Smith says:

    This code translates a USHCN.v2 files into a USHCN old format.

    I have made a converted USHCN.v2 and run it through STEP0 of GIStemp “with interesting results”… To be posted “real soon now”.

    I “make assumptions” about the different “estimates” flags and just blindly stuff a couple of fields that, as near as I can tell, are not used in GIStemp (like I say all records came from a monthly tape drive rather than assigning them to books and sources…).

    I think I get a few more records flagged as “Made Up” from nothing than there ought to be (either that, or the USHCN.v2 file flags more that way), so the early thermometer counts seem to drop in the temperature reports, but to sort that out will take more testing and QA time. For now, here’s the code.

    [chiefio@tubularbells tmp]$ cat v2Utold.f 
    C2345*7891         2         3         4         5         6         712sssssss8
    C     Program:  v2Utold.f
    C     Written:  November 6, 2009
    C     Author:   E. M. Smith
    C     Function: To convert the USHCN.v2 file into a USHCN old format
    
    C     Copyright (c) 2009
    C
    C     This program is free software; you can redistribute it and/or
    C     modify it under the terms of the GNU General Public License as
    C     published by the Free Software Foundation; either version 2,
    C     or (at your option) any later version.
    C
    C     There is one exception to this free license:  
    C     NASA, their contractors, and any agency providing
    C     work for them that would use this software or a 
    C     similar derivative work as part of GIStemp or related
    C     projects.  For anyone in that group or category, a
    C     license is granted subject to the requirement that a 
    C     fee equal to 1/10th the total annual compensation
    C     paid to James Hansen be paid to the author, E.M.Smith.
    C     If the amount paid to said Mr. Hansen should become
    C     zero, then ....   BTW, said compensation includes any
    C     retirement benefits or speaking fees and similar fees
    C     paid either directly or indirectly via contracting agencies
    C     or any other third party.
    C     
    C     Basically, you use this to support the official copy of
    C     GIStemp, it's worth at least 1/10 th of Hansen's cut.
    C     Pay up, or boot him.
    C
    C     This program is distributed in the hope that it will be useful,
    C     but WITHOUT ANY WARRANTY; without even the implied warranty of
    C     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    C     GNU General Public License for more details.
    C
    C     You will notice this "ruler" periodically in the code.
    C     FORTRAN is position sensitive, so I use this to help me keep
    C     track of where the first 5 "label numbers" can go, the 6th
    C     position "continuation card" character, the code positions up
    C     to number 72 on your "punched card" and the "card serial number"
    C     positions that let you sort your punched cards back into a proper
    C     running program if you dropped the deck.  (And believe it or
    C     not, I used that "feature" more than once in "The Days Before
    C     Time And The Internet Began"...)
    
    C2345*7891         2         3         4         5         6         712sssssss8
    C      
    C     Declare your variables
    C
          integer   mtemps(13) 
          character flags(13)
          real     ftemps(13)
          character*6 adus
    
    C
    C     Define and open files
    C
    
    C     For now, use v2_USHCN as the input file (parm it later)
    C     Same thing for hcn_from_v2_USHCN
    
          open(1,form='formatted',file='v2_USHCN')
          open(2,form='formatted',file='hcn_from_v2_USHCN')
    
    C     Read in a new format record, convert it to an old format
    C     record, and write it out.  Assume all new format records
    C     are of the "3A" type (i.e. good records).
    
    C2345*7891         2         3         4         5         6         712sssssss8
    C
    C     Read in a record, convert int to float
    C
     66   Continue
    
          read (1,88,end=666) adus,iyr,(mtemps(n),flags(n), n=1,13)
    
      88  FORMAT (a6,1x,i4,13(i6,a1))
    
          do lc=1,13
              if(mtemps(lc).le.-9000) then
                     ftemps(lc)=-99.99
              else
                     ftemps(lc)=mtemps(lc)/10.
              end if
              if(flags(lc) .eq. 'E') flags(lc)="M"
              if(flags(lc) .eq. 'I') flags(lc)="E"
              if(flags(lc) .eq. 'Q') flags(lc)="E"
              if(flags(lc) .eq. 'X') flags(lc)="E"
          end do
     
    C     And write out a record.  Need to do better flag sort by column?
    C
    C     4 flag positions.  If interpolated, put I in pos 1 - leave blank
    C                        If estimated, put  "." in pos 1 - leave blank
    C                        Pack a 1 in pos 2 to say tape, monthly
    C                        Pack a Big O in pos 3 to say "corrected" TOBS
    C                        Pos 4 - E becomes M, IQX Become E.
    
          write(2,99) adus,iyr,( ftemps(n),flags(n), n=1,13)
      99  FORMAT (a6,1x,i4," 3A",13(f6.2," 1O",a1))
    
          goto 66
    C     And keep going there until you've done them all, every last record.
    
    666   continue
    C     This exit lets you return to the regular GIStemp code...
          stop "Normal Exit"
          end
    

    I need to “tune up” the mapping of old estimate type flags to new estimate type flags, but for now, it works well enough.

  5. vjones says:

    It was worth reading that code just for your exception ;-)

    REPLY: “Glad I could please ;-) Sometimes programmers like to leave ‘Easter Eggs’ for each other 8-) and frankly, I think they need to do what this code does and I’d like them to think about exactly who is getting paid how much for doing what that is of value… or in the case of folks attending demonstrations, not of value… -ems”

  6. Rob R says:

    “Folks in a tight group develop things they believe as articles of faith, but since the group is tight, no one questions the Articles Of Faith.”

    Hasn’t Gavin Schmidt of GISS been reported as saying words to the effect that he can create an index to track global mean temperature from about 60 well chosen sites? That’s for the whole world, or was it for the USA only? Either way 4 sites in California is clearly an over-representation of that fine state.

    Personally I like your cut of the 103+ year long records and the 50-100 year long records. If these could have the UHI and land-use-change issues cleaned out then we might be getting much closer to a suitable global index. Oh, but then we also need to be sure the index is stable in a latitudinal sense and in an elevation sense. Oh, then we should consider whether we should be infilling missing data from stations that are many km away from the key sites in the index. This is all assuming that “time of day” issues relating to the actual observations have been tidied up correctly. It also assumes that issues with changes in the type of temp measuring device have been tidied up correctly. No doubt there are other considerations as well.

    Well yes, we also need to be sure smallish shifts in the location of climate stations have had no impact. It’s enough to make one’s head spin.

  7. E.M.Smith says:

    @Rob R:

    That pretty much sums it up…

    BTW, my “take” on how the “60 is enough” has mutated into this “FUBAR” is pretty simple:

    60 IS enough to measure the planet as long as it is a stable 60 and does not change over time. The problem is that all of our history is one of change… So if you start now with a new stable 60, in about 200 years you will have a decent set of data to use…

    But folks don’t want that, needing a career today, so they compare “the 60” (or some other semi-random set) to a semi-random set from the past (with all the changes in them) and attempt to ‘undo time and change’ with computer code (which does not succeed). Then they believe the results…

    In short: “60” would work; if only we had started in 1709. Now, not so much…

  8. rob r says:

    Personally I would like a global temp index containing several hundred individual stations with a good global distribution. The index should clearly display realistic plus/minus error bounds. It should have a good number of stations that are more-or-less present through the entire time period included in the index, or as close to this as possible. The error could increase/decrease through time according to the number, spread and quality of climate stations available.

    Such an index could also be split into regions somewhat along the lines of some of your recent postings. Each regional index containing long-lived sites, preferably not from major airports. Regional indices might require fewer stations than in the global set. The regional index would commence only when sufficient records become available and each index would have its own error bounds.

    Short lived records would not be included in the temperature indices.

    I suspect that in the modern era the indices should be benchmarked against UAH and/or RSS lower tropospheric trends. Not that these would be included in the index, but that they would be presented as a cross validation exercise (something that does not appear to happen at present).

    Benchmarking against long Hadcrut SST records is probably pointless as the uncertainties in SST measurement techniques are huge.

    SST might best be incorporated by including a good proportion of sites from islands and by including the likes of lighthouse records.

    The process for producing each temperature index would obviously be open for continuous external review.

    I suspect it’s too much for one individual private hobbyist to maintain.

    REPLY: [ Maintain? No problem. Create? Oops, um, er…

    Pick One: Good geographic representation or over 100 year duration.

    Basically, a lot of what I’ve been doing here is discovering how various broken assumptions and bad code design try to get around that fact. You can have time, back to one thermometer around 1700. Or you can have number (and by implication geographic coverage) starting in about 1900 to 1940 depending on how much of the planet is “enough”. Not both. GIStemp tries to have ‘half a loaf’ by starting time in 1880 and then fabricating values where it does not have enough. That “is a bad idea”…

    At the other end, we have lately started replacing the old instruments with new electronic ones and now you have to do a splice. We also move stations around. Splice. We also discontinue stations. Truncated record. Someone has also started dropping records wholesale from the dataset in about 1989. Truncated. So your ‘present through the entire time period’ just fell apart. But you got to add splicing concerns…

    With all that said: One of my “someday do” things is to create a version of the GIStemp software with the broken bits fixed. As I’m going along finding bogus things, I’m making a list of “fixes” (one you’ve just seen). When I reach the far end of the code, it ought to be a more workable product, even if it isn’t an ideal one.

    Unfortunately, it looks like I need to find a way to “jump past GHCN” to the originators of the thermometer records for each country to stop the GHCN Thermometer Langoliers from eating all the thermometers… I’ve done that for the USA with the USHCN.v2 mod. But that just leaves Australia, New Zealand, Europe, Russia, China, … -ems ]

  9. rob r says:

    Maybe you can get the missing data from Phil Jones at CRU (just joking).

    Some of it may be sitting behind a pay-wall in some countries.

    I suspect a large part of the “missing data” exists well past the point of convenient truncation.

    Direct appeals to the various National meteorological and climate institutes could bring some of the data in from the cold. This is where you might want to enlist a few helpers a little like Anthony Watts. Say something like globalstations.org

  10. Mike McMillan says:

    Re: USHCN v2

    Last summer I made up USHCN/GISS temp blink comparators for IL, IA, and WI, so I had complete downloads of the graphs for those states as of July 09. I noticed this week that Grand Canyon raw looked funny on the GISS site, so I began comparing the IL July with the IL November USHCN raw charts. I downloaded in July, but the images have last modified dates in 2008.

    Of the 14 I’ve done, 12 eyeball to increased warming slope, one about the same, and one was adjusted towards cooling.

    Here’s the 14, click each for larger image

    http://www.rockyhigh66.org/stuff/USHCN_revisions.htm

  11. E.M.Smith says:

    An amazing demonstration of just how volatile the “raw data” are in the adjusting process.

    Watching that set of 14 charts blink and squirm gives a very dramatic visual of the total size of the “fudging” done up front to the input data.

    The really insane part of it is that there was, typically, a single instrument read by a single person on a single day with a single value reported. EVERYTHING Else is a fabrication based on the “processing” of the data…

    Would it be possible, now that GIStemp is going to be using USHCN.v2 data, to get a similar set of “blinkers” comparing the USHCN and the USHCN.v2 for some selected sites? It would be very interesting to see if the shift of input files is likely to change the graph slope. (We already know the individual data items bounce around a lot, just don’t know yet if it is random jitter or slope change.) This becomes ‘a material issue’ starting now, as GIStemp is using USHCN.v2 starting now… Oh, and a “now vs then” GIStemp vs GIStemp would be interesting too… to see if the USHCN.v2 swap into GIStemp has changed things other than since 2007…

    I really do need to set aside the time to make graphs like those….

  12. Pingback: American Thinker on CRU, GISS, and Climategate « Watts Up With That?

  13. Pingback: Climategate: CRU Was But the Tip of the Iceberg « Thoughts Of A Conservative Christian

  14. Pingback: ClimateGate Part II « Calvin Freiburger Online

  15. Pingback: Climategate: CRU Was But the Tip of the Iceberg | OrthodoxNet.com Blog | Blog Archive

  16. Pingback: The Roundup: IPCC Authors Now Admitting Fault – No Warming Since 1995 – Sea Levels Not Rising « The IUSB Vision Weblog

  17. Pingback: GIStemp Reloaded | Digging in the Clay
