Thermometer Zombie Walk

Zombie Thermometers – Return of The Un-Dead

In looking for which thermometers died in 2010, I discovered that there are Zombie Thermometers. They appear to be alive in some years, but at other times are unresponsive and give no data. They can be dropped from the GHCN v2.mean data file (silently) as though they had died. Gone and buried. Not even given a ‘missing data flag’ to show they still exist.

And yet….

When time passes, these, the Un-Dead, can return to the surface of the planet and mingle their data with the living thermometers…

Examples

These are just random samples from the copies of GHCN/v2.mean that I’ve saved over the months. There are many, many more (I’ve not yet measured how many, but it is pretty easy to surface this kind of record once you know to start looking).

Here is an example of a 3 month Zombie Thermometer. It shows in the “Dead List” right now, as 2010 is missing, but it may come back to life 3 months later as new data arrives. Notice that in December of 2009, the July and August 2009 data showed up. Then in February 2010 we have September of 2009 show up. (So as of right now, it’s a 4 month Zombie…)

The data format here is that the first character shows the record as it looked in the earlier copy (-) and as it looked in the later copy (+). Then comes the 12 digit StationID and 4 digit year, followed by 12 months of data, with -9999 being the ‘missing data flag’.
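
For anyone who wants to poke at these records themselves, here is a minimal sketch (in Python) of splitting one apart. The 5 character monthly field width is my assumption from the v2.mean layout seen in these samples, and the leading diff marker must be stripped first; this is an illustration, not NCDC’s code.

MISSING = -9999

def parse_v2_mean(line):
    # 12 digit StationID, 4 digit year, then twelve 5 character monthly
    # values in tenths of a degree C, -9999 where data are missing.
    station = line[0:12]
    year = int(line[12:16])
    months = [int(line[16 + 5*m : 21 + 5*m]) for m in range(12)]
    return station, year, months

station, year, months = parse_v2_mean(
    "1016040300022009  100   99  118  138  199  239-9999-9999-9999-9999-9999-9999")
dead_months = [m + 1 for m, v in enumerate(months) if v == MISSING]   # 7..12 here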

Here we have the October copy showing no data for July-Dec. Yet in the December record, we have July and August return to life. In February we have Sept alive again, but no 2010 record and a “Zombie” that died in 2009. Will it return to life in May or June? We will have to wait to see. For now, the v2.mean file shows it has died in 2010. One would expect a ‘missing data flag’ for those records that are vetted to be in the data set but have not reported. But the reality is different. The ‘vetting’ seems to be ersatz or after the fact, at best.

- 1016040300022009 100 99 118 138 199 239-9999-9999-9999-9999-9999-9999
+ 1016040300022009 100 99 118 138 199 239 285 275-9999-9999-9999-9999

Snow-Book:~/Desktop/GHCN chiefio$ grep 1016040300022009 diff_Dec_12Feb
- 1016040300022009 100 99 118 138 199 239 285 275-9999-9999-9999-9999
+ 1016040300022009 100 99 118 138 199 239 285 275 225-9999-9999-9999

The “good news” out of this is that it implies that the file matching brittleness that prevented certain types of ‘delete records’ benchmarks from being run on GIStemp may apply only to the USA data (where it matches USHCN and GHCN data). As time permits, I’ll try deleting something in the Southern Hemisphere to verify that surmise.

This next one looks like a 1 month Zombie Thermometer. In February, we have December data, but not January. Though last December we had November data, so it isn’t always a Zombie; or maybe it’s a 1/2 month zombie?

From October to December, things are fine:

- 1016042500012009 110 114 147 145 230 279 314 298 242-9999-9999-9999
+ 1016042500012009 110 114 147 145 230 279 314 298 242 217 175-9999

Then December to February 12:

Snow-Book:~/Desktop/GHCN chiefio$ grep 1016042500012009 diff_Dec_12Feb
- 1016042500012009 110 114 147 145 230 279 314 298 242 217 175-9999
+ 1016042500012009 110 114 147 145 230 279 314 298 242 217 175 135

Other Regions

Those records are from Africa (the first stuff in the file) but just so folks know this isn’t an Africa or “3rd world” thing, here are some samples from the Pacific Region and Europe:

Dec to Feb 12:

Here is one where Sept re-appears, and January, but not Nov or Dec. Yet.
So this one looks like a ‘repair’ of September, with a continued “Dropout” of November and December? Who knows… but at least 5 months after the fact, data can appear or change.

- 5359121200012009 274 277 272 281 285 285 281 279-9999 277-9999-9999
+ 5359121200012009 274 277 272 281 285 285 281 279 278 277-9999-9999
+ 5359121200012010 270-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

In this one, three months stay ‘dropped’ while Dec and Jan show:

- 5099864400032009 276 269 277 285 289 287 282 297-9999-9999-9999-9999
+ 5099864400032009 276 269 277 285 289 287 282 297-9999-9999-9999 272
+ 5099864400032010 266-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

Here we get Dec and Jan on one station in the country (what you would expect as normal):

- 6382264100002009 -105 -93 -43 -14 85 135 163 149 119 14 -15-9999
+ 6382264100002009 -105 -93 -43 -14 85 135 163 149 119 14 -15 -100
+ 6382264100002010 -157-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

But November is still left out on another (QA problem? Who knows… maybe it will show up next month…)

- 6382280200062009 -64 -74 -32 19 109 134 166 151 117 23-9999-9999
+ 6382280200062009 -64 -74 -32 19 109 134 166 151 117 23-9999 -85
+ 6382280200062010 -158-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

There does not seem to be any particular limit on how long a thermometer can be a Zombie before the undead return to wandering the earth. At a minimum, it looks like there can easily be 3 to 5 months before data stabilize significantly.

Implications

This effect is going to be most misleading in January data, since it looks like a ‘missing data’ record is not produced for ‘expected thermometer but no data’; rather, the record is simply dropped. The thermometer is dead as far as v2.mean is concerned. Even records that are reported as defective in the v2.mean.failed file do not get a missing data flag in v2.mean. They are simply treated as dropped and dead, not even missing. One presumes that “whenever” some data for a new year show up, the record for that year is created, with ‘missing data flags’ filled in for the prior unreported months… I can’t see any other way for a station presently ‘dead’ in January to be handled by this process when/if February data show up.
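
To make the ‘dead list’ notion concrete, here is a rough sketch (Python again, with a hypothetical file name standing in for a saved copy) of listing stations that have a 2009 record but no 2010 record at all. It is my illustration of the idea, not how NCDC does anything:

def stations_for_year(path, year):
    # Set of 12 digit station IDs having any record for the given year.
    wanted = str(year)
    ids = set()
    with open(path) as f:
        for line in f:
            if line[12:16] == wanted:
                ids.add(line[0:12])
    return ids

alive_2009 = stations_for_year("v2.mean.saved_12Feb2010", 2009)  # hypothetical name
alive_2010 = stations_for_year("v2.mean.saved_12Feb2010", 2010)
dead_list = sorted(alive_2009 - alive_2010)   # dead… or Zombies waiting to walk
print(len(dead_list), "stations reported in 2009 but have no 2010 record")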

Oddly enough, one of the excuses given for Bolivia being dropped was that they failed to report by the mid-month cutoff date. Clearly that is not the case. We must look somewhere other than ‘speed of data delivery’ for why some countries and some data are dropped from GHCN. There is evidence here that even reporting a few months late is Just Fine for inclusion in the set.

I need to do some numeric audits on the data file to characterize “how much” of the data is like this. For example, finding the count of 2009 records that have a ‘Dec value only’ with missing data flags for Jan-Nov ought to give an estimate of the rate of thermometers appearing after having been “dropped” all year. It would be interesting to produce a histogram of 2009 showing percent missing all months “cumulative to a given month”. It ought to start high and decay toward zero in December (since, it would seem, if you are missing all months of data including December, your record will simply not appear in 2009). This can be done as a retrospective study using the more recent data copies.
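
Something like this little sketch would do for that retrospective cut (same assumed field widths and hypothetical file name as above): find the last month of each 2009 record that has real data, and tally:

MISSING = -9999
counts = [0] * 13   # counts[m] = 2009 records whose data stop at month m (0 = all missing)

with open("v2.mean.saved_12Feb2010") as f:
    for line in f:
        if line[12:16] != "2009":
            continue
        months = [int(line[16 + 5*m : 21 + 5*m]) for m in range(12)]
        last = 0
        for i, v in enumerate(months, start=1):
            if v != MISSING:
                last = i
        counts[last] += 1

for month, n in enumerate(counts):
    print(month, n)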

More interesting would be a ‘month by month’ histogram showing missing months for each monthly report from NCDC. That, however, will require a prospective data set: each month a copy of the data is collected, then the 12 months can be compared for the rate of ‘change of the past’. Since NCDC published copies are volatile, this can not be done retroactively (unless someone has an archive of the various copies). There could also be interesting things found by looking at the daily rate of change within the month(s). While the NCDC ftp server (and everywhere I could find on the web site) does not give a stability date for the file, I would expect the changes to have a pattern. Probably a double peak matching the 2 data circulation days from the Phil Jones email (4th and 18th of the month).
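
Starting that prospective collection is trivial; a sketch (the ftp path shown is my assumption of where the file lives, so verify it before leaning on it):

import datetime
import urllib.request

URL = "ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z"   # assumed path; verify
stamp = datetime.date.today().strftime("%Y-%m-%d")
urllib.request.urlretrieve(URL, "v2.mean.Z." + stamp)        # dated copy for later diffs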

The Americas Too?

Oh, and the same things happen in South and North America:

In this one, we have a ‘dead thermometer’ for the last 4 months of 2009; then in February, the past is alive again. Except Oct and Nov. For now. So the duration of being a Zombie can wander around, at least from 3 months to none:

Dec to Feb:

- 3178890300002009 77 61 58 16 2 -1 -2 -18-9999-9999-9999-9999
+ 3178890300002009 77 61 58 16 2 -1 -2 -18 15-9999-9999 55
+ 3178890300002010 48-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

For this one, the 2009 record stays unchanged with a missing December, but January shows up:

4017895400032009 260 257 258 266 275 279 280 281 281 282 276-9999
+ 4017895400032010 269-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

While for this one we get July and September back from the dead, but it’s a 4 month Zombie, for now still missing in 2010:

- 4027858300042009 239 243 254 275 286 286-9999 282-9999-9999-9999-9999
+ 4027858300042009 239 243 254 275 286 286 285 282 287-9999-9999-9999

Other Issues

I’ve also not characterized how far back in time data values might show up or change. Does the Feb 2010 update change any 2008 or even 1990 data? Don’t know yet. While you would expect the past to stay the same, that does not seem to be the case in “climate science”…

In accounting, you have fixed “close the books” dates. That, too, seems to not be the case here. The books are always open, always being re-written. Always shifting. Thermometers dead today can be reanimated tomorrow, or even in several months.

There is no audit trail and there is no ‘cut off’ or ‘stability date’. You just get to stick your dipper in the stream and take out what happens to be floating past today. One hopes that it is good enough. The “retroactive QA process” described in the Phil Jones email, and the fact that data in these samples a few months old show up or change, imply that the last 4 to 6 months of data are not QA checked. This would imply that any GISS map newer than about 5 to 6 months old is based on ‘best guess non-QA checked data that happened to show up lately’ and will change over time. That, further, implies that anomaly maps more recent than ‘a few’ months back ought to carry a disclaimer. But I don’t expect we’ll see one.

Oh, and the fact that Zombie Thermometers with a history of electronic reporting (even to the point of having 2009 data updated in the 2010 v2.mean file) show as “Dead Today” (and perhaps live tomorrow?) does show that some of the dropping of thermometers is NOT due to lack of electronic reporting. It is due to a very flakey process, at best.

So “Bolivia dropped for late reporting” and “dropped for not reporting electronically” are insufficient excuses for explaining thermometer drops. We do have the potential for ‘dropped without a clue’, dropped after decision, and dropped because local BOM quit reporting (though for the ones in Wunderground, that’s a hard excuse to make stick). At this point, the process looks to be so ersatz, uncoordinated, and lacking in controls that anything could be done to the data. At a minimum we have a Zombie Walk of Thermometers to deal with.



47 Responses to Thermometer Zombie Walk

  1. Ruhroh says:

    My favorite part of Rocky/Bullwinkle was
    Fractured Fairytales,

    narrated by Edward Everett Horton.

    The applicable tale would seem to be

    Hair today, goon tomorrow…

    Thanks for highlighting the incredibly dynamic status of the reference database.
    RR

  2. RK says:

    The details of the temperature records are certainly ugly. The evidence as unearthed by this fine website shows the records are not kept in a professional manner. We should petition Congress to move the responsibility of temperature record keeping to some other more professional organization that would have no opinion about climate change. Its sole function is to produce a reliable record.

  3. E.M.Smith says:

    @Ruhroh

    Maybe it’s just that I come from a financial systems background, but this just makes my skin crawl. If anyone doing financial records ran things this way, they would have a half dozen government agencies and audits on their back.

     Then again, I’ve also been in the engineering R&D world most of my career. Things were somewhat looser there than in finance, but still you had formal QA, code review, data archival standards, release sign offs and release dates, etc. And once a “golden master” for a release is declared, it never changes.

    I donno, maybe it is just me… but this ‘process’ is just so wrong in so many ways and I keep running into surprises where there ought not to be one. Things like, oh, it appears that no one at all is in charge of what thermometers end up in any given “release” of GHCN. DFW was in last year, but isn’t in this year (yet) because the January value was put in a QA bucket? And no ‘missing data’ flag? Just amazing. And other countries data can be “out” for several months, then just show up to the party “whenever”? Maybe? Sometimes? “Directed Chaos” seems to be the design standard…

  4. Bruce says:

     E.M.Smith: What you have to realize is that we are dealing with academia. The end goal in their process is peer review and publishing. They do not have the discipline or skills to produce a product. I was an engineer for 40 years. Their peer-review would correspond to our design review. But after our design review came an unending sequence of process controls and quality controls. What is needed is a management structure that would manage the process, taking the input from the scientists and directing the statisticians, programmers and other skilled people in assembling and maintaining the temperature data base.

  5. P.G. Sharrow says:

     For people that claim they can determine world temperature changes to one hundredth of a degree, this is awfully sloppy science. But their work is “peer reviewed” :?)

  6. E.M.Smith says:

     @Bruce and P.G.Sharrow: OMG, I think I understand… and it is terribly disheartening to realize that they think they are done where we would think we had just gotten the first “go ahead” to begin the design / development process…

  7. hunter says:

    The AGW defense position is that the temp record is too sophisticated, and too well smoothed and managed for these problems to yield inaccurate results.
    I urge you to speak to these defenses, since these audits and evaluations are being laughed at by academics, so far.
    Here is where the Texas state climatologist hosts a blog.
    http://www.chron.com/commons/readerblogs/atmosphere.html
     Dr. Nielsen-Gammon was the first mainstream scientist to point out that the IPCC was fibbing with regard to glaciers.
     He is tough minded, and has posted a rather harsh review of the Watts et al. work on land temp inaccuracies.
     If the land temps mess is in fact corrupting the output - and I do not see how you can have garbage data and robust output - then understanding the positions of people like Dr. NG is going to be the place to start.
    If he is right, then that needs to be well understood, as well.

  8. j ferguson says:

    E.M.
    If the temperatures show up late to the party, do they stay?

  9. e.m.smith says:

    @Hunter:

    I’m not all that interested in how other people can be wrong or what broken beliefs they hold, no matter how elegantly held. Yeah, some day some one needs to work out where they have gone wrong. But I’d rather work out what is actually happening and what the data really say and do.

    @j Ferguson:

    I guess that all depends on what the various folks who touch the data do to the data… but it generally looks like it stays. Maybe.

  10. Raven says:

     E.M.
     I read the NG critique and he looked at the code and pointed out that the stations are shifted by the difference in their individual means before being combined to create the average. This effectively removes any bias introduced because some stations are colder than others.
     He also withdrew his earlier criticism and agreed that your statements about stations being combined before the anomalies are calculated are true, but misleading, since the shift before averaging has the same effect as combining after calculating anomalies.
     I think it is worth your time to address this point.

    REPLY: [ I have. He has a theoretical. I ran a benchmark. I changed the data (put in the missing USA stations from back when GIStemp had left them out from 5/2007 until 11/2009) and the Northern Hemisphere Anomaly report changed to show that leaving out the stations warmed the anomaly map. So I can waste my time bickering over a nice pointless theoretical or I can spend my time looking at what really happens. I prefer the latter.

    https://chiefio.wordpress.com/2009/11/12/gistemp-witness-this-fully-armed-and-operational-anomaly-station/

    I’ve read his stuff (his original article claimed that I was wrong because a thermometer anomaly was calculated with reference to itself. Then he had to publish a retraction / correction that in fact it was ‘basket to basket’ as I’d described it; but then went ahead and said it can’t matter anyway. Just wrong. It amounts to an assertion that GIStemp is a perfect filter. It isn’t.) Knowing that the actual code DOES have sensitivity to station selection and asymmetrical stations in the two anomaly periods (as shown by the benchmark) there isn’t much more to say. It will just end up in yet another “theoretically it doesn’t” vs “in reality it does”. -E.M.Smith ]

  11. hunter says:

    e.m. smith,
    I am a skeptic on AGW. And pretty vocal about it.
    I have met Dr. ng, and I can tell you that if any mainstream scientist is open to skeptical reviews that hold up, he is that person.
    You are proposing an important re-evaluation in the way temperature records are gathered and analyzed. I happen to agree with the point of this. I am convinced that the very well documented widespread problems of the temperature data cannot be simply smoothed away by a bit of code.
    Especially by Hansen, who I think is well documented as to confusing his messianic complex with science for a number of years.
    I see AGW in two parts- the social movement, which is imploding, and a theory, that has never held up under critical review.
    The social/political movement is dying by the second.
    But the science is what has always interested me.
    I like your take, and I think the problems you point out are real.
    But confronting the real critics, which I believe Dr. ng is, and reasoning, is the only way to win the war of getting climate science right.
     I am not suggesting reasoning with the bellicose neverwuzzers like Romm. That is a waste of air.
    Respectfully,

  12. E.M.Smith says:

     @Hunter et al.: Discussion of NG is closed. All I can hint at is that I’m pretty sure I have identified the exact mechanism by which the GIStemp method fails and I’m talking to folks about publication potentials. So no, I’m not going to give that away to someone else. Be content that the benchmark shows the GIStemp anomaly method fails to remove all bias. -E.M.Smith

  13. E.M.Smith says:

    FWIW,

    Here are the CLIMAT reports for Bolivia:

    http://www.ogimet.com/cgi-bin/gclimat?lang=en&mode=1&state=Boli&ind=&ord=REV&verb=no&year=2009&mes=01&months=

    Including 2010:

    http://www.ogimet.com/cgi-bin/gclimat?lang=en&mode=1&state=Boli&ind=&ord=REV&verb=no&year=2010&mes=01&months=

    They also have Sortavala Russia: 638228020006 above:

    http://www.ogimet.com/cgi-bin/gclimat?lang=en&mode=0&ind=22802&ord=DIR&year=2010&mes=1&months=12

    with the January data in along with the November data.

    So the notion that the CLIMAT reports are not available is clearly false. I can only presume that there is some kind of filter keeping those data out of GHCN.

    The first example, that 101604030002 station is in Guelma Algeria:

    http://www.ogimet.com/cgi-bin/gclimat?lang=en&mode=0&ind=60403&ord=DIR&year=2010&mes=1&months=12

    Has all their CLIMAT reports, and up to date too.

    Maybe NCDC needs to ask all the other folks (who DO get the data on time) how they do it?

    Oh, and oddly enough, one of the randomly chosen stations above:
    535912120001

    is the NWSO at Agana, Guam. So NOAA has to get the data from NOAA…

    These folks:

    http://weather.gladstonefamily.net/site/PGUM

    place it in the middle of the airport. The very large airport (though looking closely on the google map, I think it’s near the blue roof building down toward the lower left of the airport off the end of the runways rather than in the middle).

    One can only wonder why a NWSO site in constant use at a major international airport can’t report the data… to itself…

  14. hunter says:

    e.m. smith,
    That sounds great. My concern is with falling into the same trap as the AGW community- ignoring critiques. You are not.
    I wish you well.

  15. Raven says:

    E.M.

    Thank you for your response.
    I appreciate the work you are doing.

  16. pyromancer76 says:

    Glad you did not fall for the Dr ng and Hunter “criticism” Looking forward to your paper, chapter, book. Hope there are many more. Thanks for your openness and the thorough nature of your work.

  17. Pingback: Denialgate continues « TWAWKI

  18. Brit Borden says:

     Thank you for another great blog. Where else could one get this kind of info written in such an insightful way? I have a project that I am just now working on, and I have been on the lookout for such info.

     REPLY:[ Well, not to pry, but the link is to a Neurosurgeon site… and I’m suspicious that a neurosurgeon would be looking at thermometer lifetime projects… So this link LOOKS like a SPAM link (though the idea of a neurosurgeon using SPAM to generate clients seems a bit ‘out there’ too…). So while this ‘gratuitous compliment’ (of which I get a half dozen a day minimum in the SPAM queue) looks suspicious, I’m going to let it through for a while. And ask if you can share what the ‘project’ is that can use thermometer info like this? -E.M. Smith ]

  19. hunter says:

    pyromancer76,
    My questions were not critiques, nor am I seeking to undermine skeptics.
    You can check out what I write across the internet and if you think I am a supporter of AGW theory, then seek reading assistance.

    On a separate topic, Tamino is now using the children to sell AGW hype.
    Here is my response to him:
    I think your analogy is fascinating. Here is a way to look at it, using your analogy, that I think makes the point even more plain:
    AGW scientists are people who, from brief exams of a few children, decide that all children are growing abnormally and require extremely expensive medicine. Then it turns out they are making the diagnosis based not on historical clinical data but on computer models. Other workers, and parents, point out that those making the diagnosis are not really looking at larger pictures. Then it turns out that the ‘abnormalities’ the scientists see are actually statistically insignificant, and not really dangerous or abnormal.
    And then, on top of that, you find out that the head of the study, and his pals, have gotten rich off of promoting the study.
     But the scientists, in response, claim that the parents and doctors and scientists who point out the problems of those pushing the expensive cure are crooks, don’t love their kids, and are paid by competing drug companies to make children sick.

    What are the chances he lets it post?

  20. hunter says:

    The children update:
    And the expensive medicine those scientists are selling is not proven to change growth rates as claimed.

  21. John Slayton says:

    A bit OT, but not entirely. I asked this over at WUWT but I think it got lost in the chatter, and I really would like to know the answer.

     Goddard in ‘Global Warming in Texas’ (at WUWT) uses a chart showing temperatures for Temple, TX, through 2008. Problem is, the Temple USHCN station was closed in 2003. Presumably, the last 5 years of data were infilled from a nearby station.

    My question: Is there any way to know what station?

    REPLY:[ Yes. GIStemp makes a log of what data fills in which. If you have the StationID I can do a run and let you know. Right now I’m trying to drop out Bolivia stations and it’s dying in STEP1 in the Python bit on a ‘drop strange’ … I’d had the theory that if you could have stations be Zombies and Dropouts so much maybe it was just the USA data that was ‘brittle’ to change (due to the USHCN / GHCN merger in STEP0), but it’s not. There’s at least one more step with ‘custom values’ that is cranky to station change. But I have a saved non-changed copy of the v2.mean file, so it’s all of about 10 minutes to put it all back and re-run clean. -E.M.Smith ]

  22. boballab says:

    EM and John

    The Graph that Goddard used in his post on WUWT came from the USHCN interface site of NCDC not from NASA GISS.
    Here is the link to that graph:
    http://cdiac.ornl.gov/cgi-bin/broker?id=418910&_PROGRAM=prog.gplot_meanclim_mon_yr2008.sas&_SERVICE=default&param=TAVE&minyear=1895&maxyear=2008

    You can also get the monthly data from their interactive site and here is that link:
    http://cdiac.esd.ornl.gov/sasserv/TX418910_5194.csv

    Note that the data for station 418910, Temple Texas goes all the way up to Dec 2008.

  23. boballab says:

     Reading the information on the Temple record, according to NCDC the station was not closed and there has been no change in station:

     418910 31.0781 -97.3183 210.0 TX TEMPLE ------ ------ ------ +6

    http://cdiac.ornl.gov/ftp/ushcn_v2_monthly/ushcn-stations.txt

    here is how the record is read according to NCDC:

    STATION INFORMATION
     The format of each record in the USHCN station inventory file (ushcn-stations.txt) is as follows.
     Variable Columns Type
    COOP ID 1-6 Character
    LATITUDE 8-15 Real
    LONGITUDE 17-25 Real
    ELEVATION 27-32 Real
    STATE 34-35 Character
    NAME 37-66 Character
    COMPONENT 1 68-73 Character
    COMPONENT 2 75-80 Character
    COMPONENT 3 82-87 Character
    UTC OFFSET 89-90 Integer

    These variables have the following definitions:

    COOP ID is the U.S. Cooperative Observer Network station identification code. Note that the first two digits in the Coop ID correspond to the assigned state number (see Table 1 below).

    LATITUDE is latitude of the station (in decimal degrees).

    LONGITUDE is the longitude of the station (in decimal degrees).

    ELEVATION is the elevation of the station (in meters, missing = -999.9).

    STATE is the U.S. postal code for the state.

    NAME is the name of the station location.

     COMPONENT 1 is the Coop Id for the first station (in chronologic order) whose records were joined with those of the HCN site to form a longer time series. “------” indicates “not applicable”.

    COMPONENT 2 is the Coop Id for the second station (if applicable) whose records were joined with those of the HCN site to form a longer time series.

    COMPONENT 3 is the Coop Id for the third station (if applicable) whose records were joined with those of the HCN site to form a longer time series.

    UTC OFFSET is the time difference between Coordinated Universal Time (UTC) and local standard time at the station (i.e., the number of hours that must be added to local standard time to match UTC).

    http://cdiac.ornl.gov/epubs/ndp/ushcn/monthly_doc.html

     As you can see, no stations were added to the record.
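
     A minimal sketch (in Python; the slices just convert the 1-based columns quoted above to 0-based ones) of reading one line of that inventory file:

     def parse_ushcn_station(line):
         return {
             "coop_id":    line[0:6],
             "latitude":   float(line[7:15]),
             "longitude":  float(line[16:25]),
             "elevation":  float(line[26:32]),   # meters, -999.9 = missing
             "state":      line[33:35],
             "name":       line[36:66].strip(),
             "component1": line[67:73],          # "------" = not applicable
             "component2": line[74:80],
             "component3": line[81:87],
             "utc_offset": int(line[88:90]),
         }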

  24. boballab says:

     The Temple record is weird. Besides what I posted above, I went and checked the site where NCDC has the paper copies in PDF format for the CO-OP sites, and it ends in Sept 2003 as John said. I also saw where John did a survey for surface stations at the treatment plant and there is no thermometer there anymore, as shown in the pictures. So where is that data coming from? So I went to the NWS site since, besides the CO-OP stations, USHCN is also supposed to use NWS first order stations, and I found that there is an AWOS station in the Temple area:

    Temple, Draughon-Miller Central Texas Regional Airport, TX, United States
    (KTPL) 31-09N 097-24W

    Now the only thing I can think of is that they are using the AWOS data from the airport as the USHCN data that used to be from the CO-OP station but without adding a different station ID to the record as they are supposed to.

  25. E.M.Smith says:

    Don’t know where to put this yet… but from the NASA / GISS FOIA emails just released we have this:

    Subject: Re: Your Reply to: GISS Temperature Correction Problem?
    From: Gavin Schmidt gschmidt@giss.nasa.gov
    Date: 19 Feb 2008 14:38:47 -0500
    To: rruedy@giss.nasa.gov

    I had a look at the data, and this whole business seems to be related to the infilling of seasonal and annual means. There is no evidence for any step change in any of the individual months.

    The only anomalous point (which matches nearby deltas) is for Set 2005. Given the large amount of missing data in lampasas this gets propagated to the annual (D-N) mean – I think – with a little more weight then in the nearby stations. The other factor might be that lampasas is overall cooling, if we use climatology to infill in recent years, that might give a warm bias. But I’m not sure on how the filling-in happens.

    Gavin

     So as I read this, the folks at NASA responsible for GIStemp are saying that large data dropouts (i.e. Zombie Stations for a while, or Dropouts for longer periods) “gets propagated” to the means (and thus the map products) and that if “we use climatology” (i.e. the way GISS uses the relationships between areas’ climatology as it calculates ‘offsets’ – that’s the jargon for their process) that might “give a warm bias”.

    Gee.

    Maybe I don’t need to convince them that missing data can lead to climatology based infill giving a warming bias. Maybe I only need to get them to ADMIT it publicly… Oh, wait, the FOIA email looks like it does that…

     It comes in four parts from:

    http://www.nasa.gov/centers/goddard/business/foia/GISS.html

    Part one is:

    http://www.nasa.gov/centers/goddard/pdf/415776main_NASA%20GISS%20Temperature%20Data%20(Part%201%20of%203).pdf

    Part two:

    http://www.nasa.gov/centers/goddard/pdf/415777main_NASA%20GISS%20Temperature%20Data%20(Part%202%20of%203).pdf

    etc.

  26. John Slayton says:

    Smith, Baballab, et al

    Thanks for your attention. I guess the short answer to my question is “No.”

     Real transparency, isn’t it?

  27. boballab says:

    John I went back and dug a little deeper into the data at the USHCN site and found some flags they attached to the monthly means.

    From Apr 2003 through 2008 there is an E flag on the values. This is what they say an E-flag means:

    E = value is an estimate from surrounding values; no original value is available;

     It doesn’t say which stations are used, but unless you dig all the way down to where you turn the flags on when you get the data, you wouldn’t know that it’s “infilled”. When you just get the data it looks like one station only, and when you look that station up on the station list there is no sign that there isn’t a station there reporting.

  28. E.M.Smith says:

    boballab
    Reading the information on the temple record, according to NCDC the station was not closed and there has been no change in station:

     418910 31.0781 -97.3183 210.0 TX TEMPLE ------ ------ ------ +6

    http://cdiac.ornl.gov/ftp/ushcn_v2_monthly/ushcn-stations.txt

    here is how the record is read according to NCDC:

    The GIStemp record is:

    42572257005 TEMPLE 31.08 -97.32 193 173S 46FLxxno-9x-9WARM FIELD WOODSC2 26
     Which says 31.08 LAT, -97.32 LON, 193 m elevation, which more or less matches the 31.0781 / -97.3183 / 210.0 of the above record.

    The end of the data looks like:

    4257225700501993 86 113 141 176 219 266 286 299 259 194 116 108
    4257225700501994 95 111 166 197 224 277 295 283 251 211-9999 112
    4257225700501995 106 130 139 177-9999 251 287 287 253 202 140 104
    4257225700501996 79 124 114 184 266 274 301 282 242 192 135 99
    4257225700501997 75 96 143 151 206 237 290 279 262 197 118 83
    4257225700501998 106 103 131 174 247 286-9999 289 264 208-9999 99
    4257225700501999 104 138 137 204 225 267 275-9999-9999 194-9999 106
    4257225700502000 104 132 162 180 244 262-9999-9999 263 204 114 53
    4257225700502001 69 116 107 193 233 261 284 287 235 178 151 106
    4257225700502002 95 85 114 213 238 260 272 284 261 190-9999-9999
    4257225700502003-9999-9999 121-9999-9999-9999-9999-9999-9999-9999-9999-9999

    Where we see it end in 2003 but with the last usable data really in 2002.

    And the chart of “combined” at GISS looks like it does, in fact, end at about 2002:

    http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=425722570050&data_set=1&num_neighbors=1

    the “homogenized” ends in 2003:

    http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=425722570050&data_set=2&num_neighbors=1

    So I can only guess that “the other folks” merge two thermometers that GIStemp does not merge…

    So now we’re back in the “provenance and modification history” game again:

     WHICH copy of USHCN are you looking at? The data I used is from the “about 9 months ago” copy when I began, and is USHCN version one. NOW GIStemp uses USHCN.v2 as its data input. So the GISS web maps ought to be reflecting USHCN.v2 data (as of about Dec end). Is there another adjustment series or a different download date or?…

    Basically, what EXACT FILE are you looking at to get the later year ending data?

    Or more bluntly: I have no idea what ‘cdiac.ornl’ are doing in their particular way of “re-imagining” the data …

  29. boballab says:

     EM, the place I got it is the Web interface portal for USHCN v2 (that’s cdiac.ornl.gov). Basically, where I got the information is USHCN’s version of the NASA GISS system where you can get single station data. Here is a walk through:

    The daily data include observations of maximum and minimum temperature, precipitation amount, snowfall amount, and snow depth. Monthly data include mean, maximum, and minimum temperature and total precipitation. Records for most stations extend through 2008.

    Please refer to the daily and monthly data documentation before using these data.

    These data are available in two ways:

    FTP
    Daily Data
    Monthly Data

    Web interface: This interface allows users to query, plot, and download data for individual states, stations, and variables. CDIAC developed this system using Google MapsTM and SAS/IntrNetTM.

    http://cdiac.ornl.gov/epubs/ndp/ushcn/access.html

    From there you select Web interface which gives a screen where you have a Google US Map where you can select the State, then the station you want:

    UNITED STATES HISTORICAL CLIMATOLOGY NETWORK
    Select a state from the pulldown list and click ‘MAP SITES’ to show its stations on the map.

    Click on a map station or select a station from the state station list to navigate to the daily and monthly data and documentation

    http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn_map_interface.html

    From there you get a screen that lets you plot a graph of temp or precip and further down there is another section that lets you get a CSV file with the data for that station. You can get the data with or without flags.

    U.S. Historical Climatology Network – Monthly Data
    You have chosen site 418910, TEMPLE, Texas

    Available Plots:
    Number of months with data vs Year
    Mean temperature vs year
    Precipitation totals
    Climate variable vs month
    Climate variable box and whisker plot

    Available download files:
    Create a download file of monthly data
    Create a download file of data summarized by year (Jan 1 – Dec 31)
    Create a download file of data summarized by hydrological year (Oct 1 – Sep 30)

    http://cdiac.ornl.gov/cgi-bin/broker?_PROGRAM=prog.climsite_monthly.sas&_SERVICE=default&id=418910

     When I got the data with the flags added in (you have to tell it you want flags), it came up that from 04/2003 on there was an E flag. My guess is that GISS sees that flag when they download the data and kicks those years. There is a way to check that hypothesis, since there are other E flags prior to the 2003-2008 time frame; see how GISS reacted to those.

  30. boballab says:

    Basically in the end NCDC infilled from surrounding stations from 04/2003 to present, GISS says I don’t like your infill and drops that data.

     Funny, one Infiller doesn’t trust another Infiller, but hey, it’s good enough to play with the world’s economy.

  31. Pingback: Before Using Temperature Data Read The Fine Print « Things I Find Interesting

  32. Rod Smith says:

    Weather observations from PGUM (Agana, Guam) have been available routinely on weather circuits for decades. This is data concerning public safety, and the idea that it isn’t routinely available to NOAA is absurd. (I just checked the NOAA web-site and got the latest ob from PGUM — a METAR about 30 minutes old.)

    My two cents: Something is really rotten, and it isn’t in Denmark!

    Keep up the good work!

  33. boballab says:

     Well I just found something very interesting. It turns out that GHCN and USHCN aren’t the official climatological datasets for NCDC; it’s something called DATSAV2, which was started in 2003 and is much bigger than GHCN, with 10,000 active stations. I got a post here about it:
    http://boballab.wordpress.com/2010/02/20/now-this-is-interesting-a-different-larger-dataset-at-ncdc/

  34. MikeN says:

     Have you seen Roy Spencer’s latest, where he calculates that the missing thermometers have had no effect? It is a byproduct of trying to do a satellite based ground temperature reading.

  35. boballab says:

     Mike, I saw it, but I think I found an error in his method; it has to do with the baseline.

     First, Dr. Spencer didn’t mention how he made his anomalies for the period 1986 – 2009, nor what his base period even is.

     Second, he said that he recomputed the anomalies from CRUTem to match his base period. Since CRUTEM uses the 1961-1990 baseline, that can’t be it.

     So I’m left with just a guess, but Dr. Spencer is probably using a 20 year period like 1986 thru 2005. Now the problem is that the old baselines are based on more stations back in the 61-90 baseline than his new one, so he didn’t really lose that many from what his base period is made of to what he is later comparing to.

     Keep this in mind: most of the stations I have seen so far that survive to the present date are not the long lived record stations; those died out. So what you’ve got is a bunch of records at stations that start in the 1950’s and survive until today, plus the historic records of the late 1800’s and 1900’s (up into the 90’s) at stations that no longer “report”. I showed how this affects the New Zealand record here:
    http://boballab.wordpress.com/2010/02/09/when-station-data-goes-missing/

     For New Zealand there are only 4 stations that match the base period of 1961-90 and have over 50% data coverage in the 2000’s; all the other stations died off. I also show a simulation of how extreme losing stations can be, by cutting just 1/3 of the data from 1/4 of a data set.

  36. John Slayton says:

    Temple (discussed above) is not alone. Just looking over my visits from last year I find the following:

    Fremont OR closed 19 Apr 96
    Modena UT closed 9 Jul 04
    Corinne UT closed 1 Mar 07
    Pecos TX closed 21 Dec 01

    All these show graphed data to 2008 as though it were genuinely from these sites.

  37. boballab says:

    I looked up those stations in my copy of the USHCN raw mean file and this is what I found:

     Fremont’s last full year of data was 1994.
     Modena’s last full year of data was 1999, spotty until 2003.
     Corinne’s last full year of data was 1997, and very sparse for the next couple of years.
     Pecos’ last full year was 1998.

     So every year with data after the last full year is infill in those records, and unless you download the entire raw data file from the ftp site you can’t get raw data. The web interface is adjusted data only, and nowhere do they tell you that, unlike GISS, which is more straightforward about what you get from their web interface.

  38. E.M.Smith says:

    @Rod Smith:

     Yeah, I really did randomly pick that record, then went back after posting it to see what this flakey Pacific region station was. To find it was one of THE major airports in the whole Pacific, a place that millions of G.I.s’ flights have passed through, a place that rolls off the tongue of any GI or aviator in the progression Travis AFB, Hawaii, Guam, Tokyo… That this major crossroads was not recording the weather? Not just wrong, so wrong it makes you want to go “Space A” over there and talk to someone “up close and personal”… (I’ve had phone calls to friends on Guam in the GI housing / hoteling facilities as a hurricane approached, and THEY had weather reports…)

    @MikeN: Yet another Hypothetical Cow…

    @John Slayton: Hey, the process is based on Hypothetical Cows, so why not use Theoretical Data. Everyone knows it’s easier to get the result you predict if your process is hypothetical and your data is theoretical!

    @Boballab: go go Go GO GO! GO!!!

    He shoots, He SCORES!!! ;-)

  39. Rod Smith says:

     The “gladstonefamily” link above references a COOP observer at PGUM and is surely misleading, in that NO INTERNATIONAL AIRPORT will be staffed with a “COOP” observer. I suspect the COOP observations are on the PGUM auxiliary site somewhere well off the airfield — probably at the spot marked as CW4647 on the map. CW likely stands for Coop weather.

     And I might mention that the USAF still operates from the eastern shore of Guam at Andersen AFB, PGUA. I remember our Ops Officer at PGUA years ago briefing the crew to, “Be careful, we have an excess of water out there.” Personally, I had already noticed the Pacific Ocean!

    REPLY: [ Please note that it lists program(s) as in plural then gives a list that includes several. Yes, one is COOP, but it also lists ASOS and WSMO. That WS is for Weather Service… and the site is clearly identified as “Location: Antonio B. Won Pat International Airport, Guam”. Also the nice big airport in the picture matches the description and if you look up the Guam WSMO you will find it’s address is there. Further, comparing a picture of the NOAA WSMO offices, and that Google Sat picture, I think I’ve identified the location of the building in the lower left corner and the weather station box has a limited number of possibles visible in the picture too. So, basically, I think the link is correct and most likely they are just noting that the WSMO is also the station where COOP activity is reported. That is, they run the COOP program. (though that bit would be speculation on my part). -E.M.Smith ]

  40. Ruhroh says:

    I went over to that USHCN site pointed out by boballab,

    http://cdiac.ornl.gov/epubs/ndp/ushcn/access.html

    and
    at the bottom there is (maybe some new) text:

    “The daily data include observations of maximum and minimum temperature, precipitation amount, snowfall amount, and snow depth. Monthly data include mean, maximum, and minimum temperature and total precipitation. Records for most stations extend through 2008.
    Please refer to the daily and monthly data documentation before using these data.
    These data are available in two ways:
    * FTP
    o Daily Data
    o Monthly Data
    * Web interface: This interface allows users to query, plot, and download data for individual states, stations, and variables. CDIAC developed this system using Google MapsTM and SAS/IntrNetTM.

    Please note: for users already engaged in analysis using the previous version of these data, you may still access the previous version through the end of 2009 .
    Daily Monthly Please email Dale Kaiser with questions. ”

    Where the “Dale Kaiser” is an email link thusly;
    kaiserdp@ornl.gov

    So, this seems to be a way to get a ‘clean’ version of USHCNv1 and a version of v2 with known parameters.
    I leave it to someone more coherent than me to work with DK and snag reference copies.
    The “monthly” access to prior version link on that page seems to still go to a valid page with many files and this text;
    “These files comprise CDIAC’s version of USHCN monthly data
    through 2006.
    An updated version of the database (through 2008) is available here:
    http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html.
    However, as of August, 2009 the updated version only has “final”
    or “fully-adjusted” data (what most users will want). In the near future, “raw” and “tobs” versions of the files will be available here.
    Please visit NCDC’s USHCN Vs. 2 site:
    http://www.ncdc.noaa.gov/oa/climate/research/ushcn/
    for a full explanation of and access to these additional file types. August, 2009″

    So, the novel news (to me anyway) is the POC.
     Probably on EST, so one must inquire before the late afternoon, when space cadets like me become almost functional.

    Maybe he would release the USHCNv2 source code?
    RR

  41. E.M.Smith says:

    @Ruhroh: Nicely done. Now I just need to find another 10 hours to exploit the information ;-)

    I assume POC is Point Of Contact…

    Oh, and holler at me when the washer is fixed and you want Chinese lunch ;-)

    E.M.Smith

  42. John Slayton says:

    OK, let’s assume the Temple record is infilled from the airport AWOS station. That’ll explain the last 5 years of the graph. Now, what about the first 56 years? (the Temple station opened in 1951.) Where were the Wright brothers when we needed them? : >)

  43. MikeN says:

    You seen Tamino’s analysis? He is eager to get something into print ahead of the skeptics. I’m usually pretty good at spotting where he is fudging things, but here not yet.

    REPLY:[ I’ve looked at it, despite having no interest in it. He leaps to the conclusion that the only way bias could have an impact is if the anomalies (done his mystery sauce way) are divergent. That is an error of assumption (see the “hypothetical cow” posting here).

    https://chiefio.wordpress.com/2010/02/07/of-hypothetical-cows-and-real-program-accuracy/

    GIStemp does “Basket A” to “different post processing and adjusting and infilled and UHI changed and near the end of the process … Basket B”. In that context, there are plenty of opportunities for bias to leak through. So he has assumed something that is not true, and proceeds to waste his time after that. “Given my conclusions what assumptions can I draw” comes to mind. Kind of a pointless effort and one that I don’t feel any need to waste my time explaining. -E. M. Smith ]

  44. boballab says:

     MikeN, it will be hard for Tamino to get something in print, since he found out that RomanM showed an improvement to his “optimal” method and Tamino was forced to admit the truth. So if there is any paper in the works, it is RomanM’s. See these two links:

    So Tamino got it right? Well, not exactly…

    Plan B: A “More Optimal” Model

     The model above (along with Tamino) assumes that the difference between a pair of stations is a constant throughout the year. For geographical reasons, this may not be true much of the time, even for stations that are relatively close. Examples of this would be a station in a coastal area compared to one farther inland. The coastal station will likely vary less during the year: cooler summers and warmer winters than the one farther inland. There can also be such differences observed due to a difference in altitude. If Tamino had looked at the residuals (temperatures minus sum of combined series and offsets) of the example series in his post, he would have noticed a pronounced annual periodicity. One might argue that this would disappear (it won’t, since only the τ(t)’s would be used) if the resulting combined series is somehow anomalized, but the difficulty goes much deeper.

    The reason for the necessity of this procedure is the presence of missing values. When a single offset is used, not only will a noiseless set of periodic series (with varying monthly differences) be unable to reconstruct the mean series as values are removed, but the size of the error in the estimated value will also depend on which months are missing.

    http://statpad.wordpress.com/2010/02/19/combining-stations-plan-b/

     Then in the next link RomanM shows how missing data affects Tamino’s method compared to RomanM’s:
    http://statpad.wordpress.com/2010/02/25/comparing-single-and-monthly-offsets/

    Then check out the comment section of this thread over on tAV:
    http://noconsensus.wordpress.com/2010/02/25/roman-seasonal-anomaly-offset/#comments
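
     To make the single-offset vs monthly-offset issue concrete, here is a toy sketch (my own invented numbers, not RomanM’s or Tamino’s code): two stations whose difference has an annual cycle, with some months missing from one of them:

     import numpy as np

     months = np.arange(24)                        # two years, monthly
     a = 10 + 8 * np.sin(2 * np.pi * months / 12)  # inland-ish station
     b = 10 + 4 * np.sin(2 * np.pi * months / 12)  # coastal-ish station
     b_obs = b.copy()
     b_obs[:6] = np.nan                            # first half year missing at B

     # (a) one constant offset for B, then average the two stations
     single = np.nanmean(a - b_obs)
     comb_single = np.nanmean(np.vstack([a, b_obs + single]), axis=0)

     # (b) a separate offset for each calendar month
     monthly = np.array([np.nanmean((a - b_obs)[m::12]) for m in range(12)])
     comb_monthly = np.nanmean(np.vstack([a, b_obs + np.tile(monthly, 2)]), axis=0)

     # comb_monthly reproduces A's shape exactly; comb_single is biased in
     # the months where B is missing, by part of the seasonal difference.
     print(np.round(comb_single - comb_monthly, 2))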

  45. John Slayton says:

    I’m not sure what thread this belongs in, but an earlier exchange here involved an examination of the flags used to qualify reported data, which is essential to what I write below. I would be interested in any comments you and your readers might have to my question (4th paragraph down).

    It started when I noticed on NOAA’s metadata site that the USHCN station at Cal Poly San Luis Obispo had been moved since Anthony Watts visited in 2007. I decided to drop by and update the photography. The new site was not where NOAA said, of course, so I looked up the current curator, Dr. Stuart Styles, who directed me to the right location. He was, in fact, kind enough to drop what he was doing and spend about half an hour discussing his work to bring the station up to standards, including his efforts to straighten out the records. (Total expenditures so far, in the neighborhood of $15K) He called my attention to the fact, which I would have otherwise not known, that for about a 5 year period NOAA was recording data from the wrong station.

    When I returned home I was able to verify this. The digitized monthly reports (normally B91) on line at
    http://www7.ncdc.noaa.gov/IPS/coop/coop.html?_page=2&state=CA&foreign=false&stationID=047851&_target3=Next+%3E
    show that from around May in 2005, to September of 2011, data being reported was actually from Wunderground station KCASANLU4. Apparently this station is a project of a different department in the university.

    So my question is this: What is being done with these 5 years of data? Are these numbers now incorporated into the major datasets?

     Working from http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn_map_interface.html, I examined the _monthly_ data. I looked to see what flags appeared and what they could tell me. The monthly mean temperatures at http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn_map_interface.html are flagged with E for:
    2005 – April, June, August
    2006 – June, July, December
    2008 – January
    And that’s the only warning I saw.

    The _daily_ data from this site is hard to read, as the column presentation seems inconsistent. So I looked at the FTP daily record from http://cdiac.ornl.gov/ftp/ushcn_daily/
    This record is littered with flags. I can’t find meanings for a couple of them in http://cdiac.ornl.gov/ftp/ushcn_daily/data_format.txt. Here’s what I see:

     Source flag ‘0’ (zero) means US COOP Summary of the Day (DSI-3200). Seen in:
     before 2000 to 5/31/05
     12/1/05 to 12/31/05
     4/1/06 to 5/31/06
     7/1/06 to 8/31/06
     11/1/06 to 11/31/06
     1/1/07 to 11/30/07
     2/1/08 to 5/31/08
     8/1/08 to 12/31/10

     Source flag ‘H’ means High Plains Regional Climate Center (Real Time Data). Seen in:
     6/1/05 to 11/31/05
     1/1/06 to 3/31/06
     6/1/06 to 6/30/06
     9/1/06 to 10/31/06
     12/1/06 to 12/31/06
     12/1/07 to 1/31/08
     6/1/08 to 7/31/08

     Source flag ‘K’: 1/1/11 to 11/31/11. Meaning unknown; K is not a listed source flag.
     Source flag ‘7’: 12/1/11 to 12/31/11. Meaning unknown; no such flag listed.

  46. E.M.Smith says:

    @John Slayton:

    Well, that’s an interesting story!

     What you have discovered is just how dirty even the better data can be. That was a station at a university with a very good name. Imagine what it’s like from “Joe’s Place Near the BBQ”!

    I don’t know where I’d put it either. Perhaps T6 Tips for more current notice. I’ll put a pointer there to your comment here.

     Per Flags: They are what they are, which is pretty limited. All sorts of things happen that get no flag. Some things seem to get flags when nothing happened. The metadata are a mess. My pet peeve about them is that there is ONE set for a station. Say the station is at an airport, like the one in Germany that was used for the Berlin Airlift. It gets an A for Airstation. Then, years later, it gets turned into a shopping mall. The A goes away. So ALL of that past history of being an airport just evaporates…. They really need flags by year in GHCN.

     Frankly, IMHO, the metadata are so poor and the readings sometimes so sloppy, I’d not ascribe any truth / accuracy to readings of less than a whole degree, and similarly for any averages made from them.

Comments are closed.