GIStemp – The Curious Case Of Calcutta

Calcutta and Environs Map

Original Image

When is a Splice not a Splice? When it is a Deletion

I know, I’m supposed to provide answers, not questions, but sometimes the question is all you have, and sometimes it is the most interesting part.

In another posting, the Langoliers Lunch, I was adding an update on what STEP2 did to delete a couple of hundred thermometer records. In the process, I found that Calcutta / Dum had a deleted data set in the STEP2/short.station.list file, but inspection of the v2.mean input data showed there was another record with a different modification flag, that ought to have been merged with the discarded record to make a composite record. This was a bit puzzling, so I went to the GISS web site to look at what they thought their version had done.

At: http://data.giss.nasa.gov/gistemp/station_data/
you can choose to look at individual stations and plot their data at different points in the processing. There is a “drop down menu” with choices for the combined GHCN / USHCN data set (that they call “raw” but is really half baked ;-) in that it has some adjustments in it already along with a somewhat flawed merger of USHCN with GHCN from STEP0 ); the “after combining sources at the same location” (which they note they “have renamed the middle option (old name: prior to homogeneity adjustment)” which is the STEP1 process; and the “after homogeneity adjustment” which is the output of STEP2.

Now you would expect that what was “after combining” would stay combined when it went through “homogeneity adjustment”, but it doesn’t.

Just as the log file shows, the first half of the “combined sources” gets dropped.

So “homogeneity adjustment” really does / can mean “record deletion”.

While it is nice to have the confirmation that I was reading the tea leaves right, it is still a rather odd behaviour.

You can see the two graphs here:

First, as “combined”

http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=207428090001&data_set=1&num_neighbors=1

Then, as “homogenized” and uncombined

http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=207428090000&data_set=2&num_neighbors=1
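For the curious, the pattern of those links is simple enough to sketch in a few lines of Python. This is purely my own illustration (the `station_url` helper is my invention, not anything GISS provides): the station id plus a one-digit modification flag makes up the `id` parameter, and `data_set` picks the processing stage as described above.

```python
# Illustrative only: my own sketch of how the GISS station-plot URLs above
# are built. The station_url helper is NOT part of any GISS tool.

BASE = "http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py"

def station_url(station_id, mod_flag, data_set, num_neighbors=1):
    """Build a station plot URL: id = station id + modification flag digit;
    data_set = 1 for "after combining", 2 for "after homogeneity"."""
    return (f"{BASE}?id={station_id}{mod_flag}"
            f"&data_set={data_set}&num_neighbors={num_neighbors}")

# Calcutta / Dum, "combined" (mod flag 1) vs "homogenized" (mod flag 0):
combined_url = station_url("20742809000", "1", 1)
homogenized_url = station_url("20742809000", "0", 2)
```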

Very strange…

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present “hot buttons” are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in AGW GIStemp Specific. Bookmark the permalink.

13 Responses to GIStemp – The Curious Case Of Calcutta

  1. Bishop Hill says:

    “Half baked” LOL!

    One typo – tea “lives” ->leaves.

    I wish I had more time to study what you are doing in more detail (and the programming skills to follow it properly) but keep up the good work.

  2. E.M.Smith says:

    @Bishop Hill:

    Glad you liked it ;-)
    Typo fixed too.

    Programming is, at the same time, incredibly difficult and complex while being very straightforward. The syntax of any computer language is far simpler than any natural language. I’ve learned enough of some programming languages to be functional in just a day or two. Yet they are very “brittle” and demanding on some details. You may not swap a 0 and O, nor “,” and “;”, nor single quotes for double quotes, without grief; nor can you be vague at all. And that “simple” syntax is sometimes implemented in a very painful way…

    But it is easier to read the stuff than to write it, so I try to provide the code so folks can see it if they want, then put an English “wrapper” around it so you don’t have to actually read the code.

    At the end of the day, a “program” is just a recipe. But instead of directions like “Add 2 eggs. Whip until stiff. Fold in sugar” you get “Add 2 station records, join with a splice, smooth the splice with averages” but written in a peculiar language…
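    To make that recipe concrete, here is a toy version of “add 2 station records, join with a splice, smooth the splice with averages”. This is in Python, and purely illustrative; it is not GIStemp’s actual code (that is Fortran), and the station values are made up:

```python
# Toy illustration of the "recipe" analogy above; NOT GIStemp's actual code.
# Two hypothetical station records (year -> annual mean temperature, C),
# spliced together with any overlapping years smoothed by averaging.

record_a = {1950: 24.1, 1951: 24.3, 1952: 24.2}   # older record
record_b = {1952: 24.6, 1953: 24.4, 1954: 24.5}   # newer record

def splice(a, b):
    """Combine two records; average any years present in both."""
    merged = {}
    for year in sorted(set(a) | set(b)):
        values = [r[year] for r in (a, b) if year in r]
        merged[year] = sum(values) / len(values)
    return merged

combined = splice(record_a, record_b)
# 1952 appears in both records, so it becomes the average of the two.
```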

  3. Peter Dunford says:

    It’s even more interesting when you bring in Calcutta/Alip from nearby (18 km). That station runs from 1880, and they truncate the first 40 years even though that loses them the first 0.75 degrees of warming. At data=18 C-Dum is about 0.3 degrees cooler than C-Alip. At data=28 they are pretty closely aligned.

    The 1971 cool spike is massaged colder at Alip by about 0.3 degrees and up at Dum by about 0.1 degrees.

    The 1959 warm spike at Alip is massaged down by 0.5 degrees, which is about 0.1 degrees hotter than the truncated Dum data showed. This also makes it 0.3 degrees cooler than 1979 instead of 0.2 degrees warmer.

    The overall effect is to eliminate the cooling period from 1960-1975 which gives a rise-fall-rise pattern in the long-lived Alip record, and create a straight line trend up from 1920 to current time.

    Are they massaging to correlate with CO2 build-up?

    Looking at your list of stations dropped in your previous post, I noticed one called Calhoun Research Center and looked it up. I wanted to see if the research center had closed, as it seemed the kind of station likely to generate more reliable data, given the holes in some of the data collection. Calhoun is part of Louisiana State University, and very much ongoing. I emailed them to find out if they knew why they were no longer included in GHCN and got a prompt response:

    “We recently noticed the data has been missing out of the GHCN data set as well. The site is still taking daily temperature and rainfall data and it is archived.
    We just have to get NCDC to get the data into the additional data set. This is not an isolated case. There apparently were dozens of sites across the US that were dropped inadvertently from the GHCN data set. I have supplied the required paperwork to NCDC to get it back into the data set as of just a few days ago.”

    Inadvertently?

  4. E.M.Smith says:

    Peter Dunford

    Your Alip example is the kind of thing that is causing me to let go of “Hanlon’s Razor”, as the total bias is no longer covered by “stupidity”…

    “We recently noticed the data has been missing out of the GHCN data set as well. The site is still taking daily temperature and rainfall data and it is archived.
    We just have to get NCDC to get the data into the additional data set. This is not an isolated case. There apparently were dozens of sites across the US that were dropped inadvertently from the GHCN data set. I have supplied the required paperwork to NCDC to get it back into the data set as of just a few days ago.”

    Inadvertently?

    Well, my measure puts it at about 800 stations just from the “best station listing”, but I guess you could turn that into “dozens”…

    And isn’t it nice to know that $Trillions of dollars and the fate of the world economic system hinges on “inadvertent” acts?

    Hey, roll them dice; it’s only the world…

    So here’s an idea, we take a list of all the deleted stations, track down each one; if it is still live, we document it and have them, too, “supply the required paperwork”. Would be a nice project for the http://www.surfacestations.org/ team.

    Oh, and we notify the news…

  5. j ferguson says:

    E.M.

    Is it possible that these sites were collectively pinged and did not respond appropriately? See reference to “paperwork” at Louisiana.

    Perhaps some well-meaning but misguided soul sent out an enquiry (without a SASE) asking for a response by a particular date and, not getting one from some stations, simply dropped them out of the system, never guessing that in so doing, expectations for major global catastrophe would be much increased.

    A clerical exercise turns into forecasts of the end of the world as we know it.

    Your work is getting to me. I’m beginning to think I want to compare the proximity and record length of stations still recognized with those sites deleted.

  6. j ferguson says:

    More to the point:

    “By God, if they won’t send our forms back to us, Scr*w em, We’ll just drop them from our data service.”

  7. Tim Clark says:

    To state the obvious, they had to get rid of that pesky 1958 value. But is any data from this station current?

  8. Tim Clark says:

    Oh, and E.M., I’ve read through some comments but will repeat the thesis of some of them. You need to publish. Don’t worry about finishing or having it all done. Select a very tiny important piece, say the island nature of the ocean, or even this one. It’s what “publish or perish” scientists do, so it is accepted by the establishment. Get McIntyre to be the statistician and one of the Pielkes for credo. Or Easterbrook from Washington, or Lindzen. Any, or all of them would jump at the chance. Once you’re entirely convinced yourself that what you are seeing is robust ;~P, then send it to them. I can’t help you in panache, my degrees are in soils and physiology. But I would help edit or lit search, as I’ve taken tech journalism (this off the cuff writing is not so good!). When I read what is happening in Copenhagen and see what you’ve discovered here, well… I wish my children well.
    It would almost be the most significant pub in history ( not the Magna Carta but you get the point). Gotta go, will check back tonight.

  9. E.M.Smith says:

    @j ferguson

    The “why” is totally speculative so far. It could be anything from malice to someone typing an O instead of an 0 in some screening program. That it is wrong is not in doubt.

    But there comes a point where you have 2 dozen “accidental wrongs” and not a single “accidental right”… and you simply must suspect and adjust the investigation accordingly.

    How many times in a row can the same person arrive at the check out counter with a $20 price tag moved onto a $50 pair of shoes, and never the other way, before you start to raise the investigative level?

    @Tim Clark

    I fear you are right. I’ve been trying to keep the number of “new projects” down, but this latest week has, I think, “raised the value bar” on this one… But I think I fit best as the 2nd or 3rd guy on the author list; the technogeek who does the leg work for the glory guy.

    Per the 1958: Notice that these are two records from the exact same “Station”. Only the “modification flag” is different. That last digit in the “id” in the link. “0” for the first one, “1” for the second one. I don’t know exactly what the difference is in the “modification history”, it could be as simple as a Time of Observation change or as complex as taking out the Stevenson Screen and moving to an electronic thingy.

    In theory, the proper thing to do is to validate the calibration on the splice, and splice these two together. GIStemp tries to do that with “rules” like “if one is shorter than 20 years, toss it” that are just not valid rules. Better would be to compare the nature of the change at each site and adjust the record accordingly (what I think the GHCN “adjusted” data set tries to do). IMHO, better than the GIStemp methodology would be to just “splice the damn thing together and move on”. After all, if this were a station 100 km away, they would say it is OK to just average it into the grid/box.

    I would assert that on the face of it, a minor modification history difference will be less than the microclimate differences inside a 1 degree LAT by 1 degree LONG box (where you end up), so it makes no sense to put this level of hurdle in front of a valid thermometer record. You could easily have an old thermometer 100 km away being averaged with a new one here, so why exclude the same type of thermometer at a different point in time?

    Inspection of the logs shows that 200+ records are in this category. The same “station” is both kept, and not kept, and that means tossing a fragment of its history.
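    For anyone who wants to see the shape of that rule, here is my paraphrase of it in Python. This is NOT the actual STEP2 Fortran; the ids and years below are invented for illustration, and the 20 year threshold is the rule as I read it:

```python
# My paraphrase of the STEP2 "toss the short fragment" rule; illustrative
# only, NOT the actual GIStemp Fortran. Ids and years are made up.
# Each record: (station_id, modification_flag, first_year, last_year).

MIN_YEARS = 20  # fragments shorter than this get discarded, not spliced

records = [
    ("207428090", "0", 1946, 1963),  # 18 years: the early fragment
    ("207428090", "1", 1964, 2009),  # 46 years: the later fragment
]

kept, tossed = [], []
for sid, flag, first, last in records:
    if (last - first + 1) >= MIN_YEARS:
        kept.append((sid, flag))
    else:
        tossed.append((sid, flag))

# The 18-year early fragment is dropped rather than merged, so that part
# of the station's history vanishes from the "homogenized" output.
```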

  10. Ellie in Belfast says:

    @Tim Clark: I agree. Wholeheartedly.

    @EM Smith: don’t underestimate the importance of what you are doing. If a paper is written you are the most important author. It is your work, even if it is someone else’s fancy introduction, discussion of the literature and conclusions. You need to get it into the literature. I know you don’t care about people tossing rocks (and they will, regardless) but if it is published under peer review, the tossers can’t undermine your credibility (“only a blogger”) in the eyes of the press.

    It also needs to be written out in a series of short eyecatching pieces, with simple but memorable graphics that the lay public will appreciate and remember and that the press are likely to pick up.

    The second may be more urgent, and if carefully done would perhaps not compromise the first.

    Incidentally, what you write here is brilliant – understandable, funny, well written etc, but it requires a high degree of familiarity with the subject and the casual reader may not stay with it.

  11. E.M.Smith says:

    @Ellie

    There are parts “I don’t quite get”, like the notion that if a truth is known on a blog it can not then be published in a peer reviewed journal since it is already “out”. I “get it” in an abstract way; but it is “wrong”, and wrong things don’t stick well in my brain… Things like “truth only comes with peer review”. Again, in an abstract way I understand that some folks only trust Authority, but at my core of cores I don’t give a fig about authority. There is right and there is wrong, there is truth and there is not truth; and as often as not Authority is on the wrong side of truth, rightness, and understanding. And that does not in any way diminish the real truth and rightness of a thing.

    So what I would “expect” is that a truth could kick around in a non-peer way, and then be packaged up as a formal article. But that is not how it works. And my nature is to avoid those things where my “instinct” is wrong. And that is why I’m saying 2nd or 3rd. Not some “status” thing; my own limitations. I know that in the “peer reviewed publishing” world, I’m a novice. The guy who just got off the boat and asks the nice gent on the dock where he can get $100 bill changed and what does it turn into in local money? “Oh, you can do it for me? Great!”…

    So I’m just recognizing that I need a guide. And guides get paid. In status, or royalties, or whatever; they get their cut for knowing the ropes. (Why inventors sign away patent rights to the guy who can turn it into money.)

    With that said, I am very open to the notion of publishing. I just don’t “know the ropes” enough to point myself in the right direction. Like the grad student who is “2nd” on a paper by his advisor showcasing his work. Heck, I’d love to be lead author; but I’m not going to make that an issue (and I suspect that impact would be greater with a “name” lead). And impact is more important to me than ‘status’. And a good lead would be able to give guidance like “The islands things, do that. Code review, not so much…” or “They need 3 pages, not 12, and you need a mathematically rigorous measurement of warming in the final data pre and post change.” Or not…

    Thanks for the endorsement of my “style”. I owe it all to the Gift ‘O Gab from my Irish side ;-) coupled with the English love of the language. One of my earliest memories is of my mother telling me about the language inside the language. About Latin and Greek roots, prefixes, suffixes, and how a word could tell a story inside of itself… “posthaste” beyond haste, “predate” before the date… and the way words could play one with another… Very pun-ny at times 8-)

    I have deliberately stayed focused on a small audience for the material here. Folks from WUWT primarily. This actually started as a place where I could just put things I’d written 10 times and just wanted a link to the same ‘ol same ‘ol. Somewhere along the line I got frustrated that the deconstruction of GIStemp was not happening and someone ought to do something… and realized I was “someone” with all the needed skills.

    So between those two things, yes, it’s a technically focused narrow audience style for the GIStemp stuff. I can, and have, done the “broad audience style”. (See the “no shortage” postings, for example.) I often had to ‘earn my chops’ with technical folks who thought I couldn’t have the tech level they had due to my speaking plainly to management in a ‘no jargon’ way. It was fun to show them I was a ‘sleeper’… But this is the flip side. What it all comes down to is “what audience”, then you translate the thought accordingly.

    I’d actually hoped to gather a few “programmer types” by having a more “programmer style” with more of the code and jargon in the postings; and share out the work, but that didn’t happen. So I’m slowly drifting toward less of a technobabble ‘style’ here. But my main goal remains finding out the truth about what GIStemp does. And that will be a narrow audience and likely require some technical understanding for the “in depth” articles. (Things like “Islands in the Sun” are much more approachable. Guts of zones, well…)

    At this point, I’ve rambled on a bit and the family is looking at me asking what that wonderful smell is from the oven and when I’m going to give them dinner, so I think I need to be AFK for a while. (It’s slow rubbed BBQ with my own spice mix on it; been in the oven for hours on Real Slow… 8-)

  12. Rick Beikoff says:

    Amazing work!

    You need to get published so you can give credible evidence in the upcoming federal court action by the Chamber of Commerce against the EPA’s intention to bring down an endangerment finding against CO2 – essentially AGW on trial. They will win this and then it will all be over.

    You should contact Bill Kovacs at the CoC. I’m sure they will help you with resources and money. His position is described in the following article:

    http://www.thenewamerican.com/index.php/tech-mainmenu-30/environment/1781

  13. Ellie in Belfast says:

    I’ve now had time to give this some more thought.

    Bishop Hill has a relevant post:

    http://bishophill.squarespace.com/blog/2009/10/2/peer-review.html

    A comment from “DocBud” says

    Reviewers are asked to assess whether or not the content is worthy of publication, primarily in terms of adding something new to the field, and to comment on the clarity and presentation of the arguments.

    and later

    Peer review has attained an unwarranted status in the eyes of some who, presumably, are not able to assess the merits of a paper themselves and, therefore, use peer review as a guarantor of the correctness of the content of said paper.

    When I read a paper in my field, I do not ask if it has been peer reviewed, I critically read it and come to my own conclusions as to its worth. The starting point should be that the paper is wrong and then allow the authors to attempt to persuade you otherwise.

    A friend made a very valid point that the process of writing a paper for submission to a journal is one of rigour and self-criticism. It can be tedious to write down every detail of what you did (and why) and it is easy to leave out important details you take for granted, but which are not obvious to anyone to whom your work is new.

    But academic papers are dry and cold; often it takes a lot of staying power to read them to their conclusion and the (academic) community so abhors hyperbole that conclusions are often understated anyway. They move their field forward incrementally in tiny steps, and are revered only by those who need to create some value in them to reinforce their own notions of ‘being something special’.

    You don’t need that – and you have pitched this site perfectly.

    I have also been immensely frustrated by the whole AGW issue since I started to understand the science. From a place of great ignorance, I have learned more than I care to quantify. Unlike you I am not “someone” with all the needed skills. All we here can do is assist in some small way, and learn a lot in the process.

Comments are closed.