GHCN – Does “homogenized” mean cooked?

Central Park Raw vs GISS

Over on Icecap ( http://icecap.us/ ) for January 12th, Joseph D’Aleo has a very interesting article looking at Central Park. I’m going to focus in on just one little graph from his posting, the one above.

UPDATE: (18 Jan 2010) In talking with Joseph, he has confirmed using GISS data, not the actual “unadjusted” data from NOAA / NCDC. I’ve sent him the “unadjusted” GHCN data and he reports that it is not significantly divergent from the actual “raw” data he got from NY directly. This article has been re-edited to reflect that change in the interpretation of the graph.

Joseph did the leg work to find the real raw data and compare it to the NOAA / NCDC GHCN “unadjusted” data as merged with the NOAA / NCDC USHCN “corrected” data in GIStemp. What he finds is that the “unadjusted + corrected” data are very much adjusted (and some of us would say very much “maladjusted” ;-) as they flow part way through GIStemp.

Exactly which step is used is something I’m still looking into. It is at least the “as combined” step and perhaps the “homogenized” step that includes an Urban Heat Island Effect correction. The original version of this posting called it the “unadjusted” data based on the name used on the chart (the GISS web site labels it “Raw” in the first selection of its dropdown menu).

The data he used are not the GHCN Unadjusted data directly; the data set used is the result of processing by GIStemp. (The link in the paper at Icecap connects to the GISS web site, not to NOAA / NCDC. The option to download the STEP0 data is labeled “Raw GHCN + USHCN corrections” at GISS.) If that was, in fact, the data set used, then the graph will reflect the merger process in GIStemp STEP0.

That process looks for the existence of both sets of data (GHCN “unadjusted” and USHCN – version one prior to November 15th 2009, and version 2 with added “adjustments” thereafter). If only one exists, that one is used. If both exist, then they are averaged, in an odd sort of way. To the extent the heading on this graph ought to have been “GHCN Unadjusted AND USHCN” there will be some USHCN derived adjustments making up part of that “unadjusted” line. To the extent that the “as combined” data were used, the chart does not change much (it is mostly an ‘in fill’ process). And to the extent that the “homogenized” data were used, then this chart shows what the “homogenization” process does to the data. (And potentially, for all cases, what “adjustments” are in the USHCN version 2 set.)
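
As a rough illustration of the selection logic just described, here is a minimal Python sketch. The function name `merge_step0` and the use of `None` for a missing value are hypothetical, and the plain arithmetic mean is only a stand-in: the actual STEP0 combination in GIStemp is not spelled out in this post and is described above only as averaging “in an odd sort of way”.

```python
from typing import Optional


def merge_step0(ghcn: Optional[float], ushcn: Optional[float]) -> Optional[float]:
    """Pick or combine one monthly value from two sources.

    Hypothetical helper: None marks a missing value. The simple mean
    used when both values exist is an assumption, NOT the GIStemp rule.
    """
    if ghcn is None and ushcn is None:
        return None          # neither source has the month
    if ghcn is None:
        return ushcn         # only USHCN exists, so use it
    if ushcn is None:
        return ghcn          # only GHCN exists, so use it
    return (ghcn + ushcn) / 2.0  # both exist: stand-in average


if __name__ == "__main__":
    print(merge_step0(11.5, None))   # GHCN only
    print(merge_step0(None, 12.0))   # USHCN only
    print(merge_step0(10.0, 12.0))   # both present
```

The point of the sketch is only that a line labeled “unadjusted” can carry values from the “corrected” set whenever both sources exist for a month.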

In either case, that 3 F increase in the warming trend is present at some point in the GIStemp process of STEP0 to STEP2 (step zero is the very first ‘glue data sets together’ part of GIStemp. GIStemp is the program from NASA / GISS that creates those “anomaly maps” showing we have warmed by some fractions of a degree and so, ought to panic). One way or another, that 3 F warming trend “bump” is in the making of that product…

Just look at that. Up to 3 whole degrees of F (over 3 in a couple of places) of added “warming trend” via the NOAA / NCDC “corrected” adjustments and GIStemp processing. Heck, even the language you must use to describe what is going on is painful to the ear. But what else to call it? The labels NCDC applies are “unadjusted” and “corrected” so that has to be used to know which data sets I’m talking about. The data are clearly changed, so it is adjusted. And we are left with lumpy terms in quotations like – “Unadjusted + corrected” data and “raw + corrected” that isn’t raw and is adjusted; and “homogenized” that is both truncated and UHI corrected as well.

You can take nothing for granted when reading the NAME of a data set used in climate “research”.

In the first version of this posting, I had written:

It looks to me like we will need to go all the way back to “first sources” to have any hope of finding out what is really going on in the temperature history of the planet. GHCN “Unadjusted” clearly is too adjusted to be suitable to the task.

Given the new comparison of NOAA / NCDC GHCN “unadjusted” without the GIStemp processing to the actual raw data from NY, I now say instead:

It looks to me like we will need to go ahead of GIStemp and use GHCN “unadjusted” to find out what is going on in the temperature history of the planet. We don’t need to go all the way back to “first sources” to have any hope of finding out what is really going on in the temperature history of the planet, but ought to do so for some QA checks along the way. USHCN Version 2 is too “corrected” and GIStemp “homogenized” is clearly too adjusted to be suitable to the task.

This is very good news to me, since it means that all the analysis I’ve done using GHCN “unadjusted” is using valid data and I don’t have to do it all over again! Be advised that NOAA / NCDC is rumored to be making a new version of GHCN that uses the same adjustment method as USHCN Version 2; so we may be right back at this “know your adjustments” issue again in a month. One hopes they continue to make available an “unadjusted” version…

I will be digging through the various “versions” of the data made available by NOAA / NCDC (GHCN – the Global Historical Climatology Network both “unadjusted” and “adjusted”; USHCN – the U. S. Historical Climatology Network both “version one” and “version two”. USHCN claims to be “corrected” but it is unclear which “corrections” are adjustments… some documents describe the USHCN as unadjusted, but corrected.) and the data from GISS in GIStemp (both STEP0 “Raw GHCN + USHCN corrected” and the newer version that uses USHCN Version2 that is known to have some adjustments in it. GIStemp only began using USHCN V2 a couple of months ago.) I’ll make a follow up posting here when I’ve got something to show.

If all this talk about 4 different versions of the same data for the same location (Central Park) has your head swimming, just think on this: They are all held out as valid and correct by NOAA / NCDC. The same organization produces all of: GHCN “unadjusted”, GHCN “adjusted”, USHCN “corrected”, and USHCN Version2. They all are available for download now.

GISS, from GIStemp, makes available a further 3 variations plus anomaly maps, taking the NOAA / NCDC data and reworking it into yet more variations.

So exactly what “input data” are the right ones? You get to choose based on what ‘adjustments’ and ‘corrections’ you would like to have. And they are different from each other, often by several degrees. From this we are supposed to be excited about fractional degrees of change? There is much more than that in the adjustments…

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in AGW GIStemp Specific, NCDC - GHCN Issues.

36 Responses to GHCN – Does “homogenized” mean cooked?

  1. Tony Hansen says:

    Perhaps it was de-unadjusted.
    With an inverse UHI pre-non-post adjustment.

  2. E.M.Smith says:

    @Tony: Don’t DO that! Now I have to go make a fresh cup of tea ;-) and find a towel…

  3. boballab says:

    It looks to me like we will need to go all the way back to “first sources” to have any hope of finding out what is really going on in the temperature history of the planet. GHCN “Unadjusted” clearly is too adjusted to be suitable to the task.

    Ouch. For US stations I have been working on the State College PA. station from the old Paper copies on file. Trust me, it’s slow going and after a while your eyes want to bleed.
    Nothing like a maladjusted scanner that makes a very dark scan of an old paper copy and then it gets thrown into a PDF file.

    http://www1.ncdc.noaa.gov/pub/orders/D749BC24-D175-9C48-0682-4C1565A93356.PDF

    That link will give ya an idea

  4. Tonyb says:

    Hi EM

    I did a guest post over at Air Vent on this very subject a few months ago.

    http://noconsensus.wordpress.com/2009/11/25/triplets-on-the-hudson-river/#comments

    Tonyb

  5. genezeien says:

    Er.. um.. I posted a graph of NYC area stations a while back, looking at UHI. I’d started from the GHCN “raw” data and the plot for Central Park (aka USC00305801) looks a lot like the plot of Joseph’s “raw”. Granted, my plot is in Celsius, but 12C ~ 53.6F http://justdata.files.wordpress.com/2009/12/nyc_aat_qc.jpg Please note, the plot starts in 1900 a bit above 12C. The GHCN data I’m using is from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd_all.tar.gz which unpacks into a lot of fixed-width text files.

    I’ll drop by IceCap to see which GHCN data set he’s using.

  6. Harold Vance says:

    I would like to see only the difference graphed starting with the three degrees difference and going forward.

  7. genezeien says:

    Yep, he used http://data.giss.nasa.gov/gistemp/station_data/ That website does not state which version of the data is being presented, although it does have “GISS Surface Temperature Analysis” on the page before the one offering a text version of the data.

  8. papertiger says:

    NCDC claims that a weather station in central park was artificially warmed by 3 degrees in 1900 by its urban location, but isn’t effected today.

    How is that supposed to work?
    Over at Watts they showed Antarctica stations warmed from being buried in snowdrifts. Doesn’t apply here.

    There is no explanation for this. Probably why NCDC is keeping it a super secret adjustment which must not be named.

  9. Jeff Alberts says:

    but isn’t effected today.

    That would be “affected”, actually. ;)

  10. papertiger says:

    Jeff. Could you expand on your point? The fine details of the proper use of “effect” and “affect” excape me, and it’s bound to come up again. ;)

    Any controversy between “cooked” and “crooked”?

  11. Chuckles says:

    It’s been clear for some time that ‘unadjusted’ means ‘Whatever came in the front door, as it came in the front door’

    E.M. and others have done a superb job investigating the processing and corrections being done to the data for its final presentation, but I keep remembering past warnings and observations-

    When looking at census and similar govt. statistical data, remember that it was compiled by a grey haired lady in a village post office, who simply wrote down what she thought ‘head office’ wanted it to say.

    Measure with a micrometer, mark with chalk, and cut with an axe.

    The accuracy of the maximum-minimum temperature system (MMTS) is +/- 0.5 degrees C, and the temperature is displayed to the nearest 0.1 degree F. The observer records the values to the nearest whole degree F.

    Too many people are way too certain of too many things.

  12. Jeff Alberts says:

    An effect can have an affect on something else. One can effect change, or one can have an affect on someone else. I’d suggest you look up both words in that amazing invention called a dikshunaree. I think there’s even one on that other amazing invention called the interwebztube.

  13. Pingback: Even “unadjusted” is adjusted! at Heliogenic Climate Change

  14. rbateman says:

    I’m off once again to drudge through the newspapers to gather the reported data of the times for my locale.
    So far, I concur with the above graph: They engineered a slope by jacking down the lower end.

  15. Steve McIntyre says:

    I did a post on Central Park in 2007 with some interesting results:

    http://climateaudit.org/2007/07/05/central-park-will-the-real-slim-shady-please-stand-up/

    REPLY: [ Thanks for the link! Very nice write up that also finds 3 degrees of variation (though 3 C this time) between variations in the different adjustment series... Well worth a read. -E.M.Smith ]

  16. Steve McIntyre says:

    Note that the version discussed in my post (the one discussed by D’Aleo back then) was a different version again (the one used in CLIMVIS) – the provenance of which was not easy to establish.

    REPLY: [ Keeping straight the provenance of things is one of the most daunting parts of this whole thing. Data sets mutate, release levels change, code updates and methods change. Next month GHCN is supposed to get a new "adjustment" method, so who knows what happens to all the past of the "old" GHCN. Etc. I met the past upon the stair, I met a past that wasn't there, it wasn't there again today, I wish to God it'd go away... (with apologies to real poets everywhere ;-) but it really does seem like that some times. So, with the GIStemp "update" of November 15, 2009 everything in GIStemp changed the past, again. Next month (if that is when GHCN updates) we will have a whole new past yet again. Who knows what Spring will bring... -E.M.Smith ]

  17. Josh Cryer says:

    Guys D’Aleo’s “raw” data set from Central Park matches homogenized GHCN / GISS data, so this “cooking” of data is actually someone completely messing up the analysis. I explain it here: http://www.talk-polywell.org/bb/viewtopic.php?p=32329#32329

    REPLY: [ I'll take a look at the various data sources that D'Aleo might have used and see if I can figure out which ones are on his chart. While I doubt that it is the "homogenized" data, it is always possible that the wrong data were downloaded. I'll also drop a note to D'Aleo and ask him to QA check his work. So, if we are lucky, this is not evidence of "cooking the raw data" only of "cooking the homogenized data". I, for one, would be most happy to find out the "raw" data were still usable; and to have a pointer at that part of the process that puts in that 3 F of warming slope. -E.M.Smith ]

  18. M. Simon says:

    Chief,

    Josh Cryer (a warmist) claims to have found an error in your method.

    Look here:

    http://www.talk-polywell.org/bb/viewtopic.php?p=32329#32329

    I am not deep enough into the methodology to know if he is correct. It looks sound. Or at least interesting.

  19. M. Simon says:

    My original take on the subject based on the labeling of the graph:

    http://www.talk-polywell.org/bb/viewtopic.php?p=32311#32311

    My current take:

    http://www.talk-polywell.org/bb/viewtopic.php?p=32345#32345

    [commentary and quotes snipped]

    So my take at this point: the graph is mislabeled. It still proves Chief’s point if not mine.

  20. Josh Cryer says:

    E.M.Smith, he claims this link is raw data: http://www.erh.noaa.gov/okx/climate/records/monthannualtemp.html

    He makes the claim on page 4 of this: http://icecap.us/images/uploads/CENTRAL_PARK.pdf

    This “raw” data matches perfectly with station 725030010 “after homogeneity adjustment”: http://data.giss.nasa.gov/gistemp/station_data/

    It is not raw data. He is comparing homogenized data with raw (but corrected) data.

    I am not going to argue about whether or not the homogenization process is correct, as while I believe it is sound, I have had little luck convincing people that the methods are sound. But here is a case of the wrong data being used, most certainly. Without an inkling of a doubt.

    D’Aleo’s “Central Park raw annual mean data” is not raw, it is GHCN / GISS homogenized.

  21. Josh Cryer says:

    I take it chiefio hasn’t felt the need to retract this allegation?

    REPLY: [ That is correct. I have pointed at a graph from Joe, and said it points out issues. Questions were raised, so I've sent off a query to Joe. I'm awaiting his input. In the mean time, I've stated what folks think is the issue (mislabeled graph, in that while GISS calls it "Raw GHCN + USHCN corrected" {note: NO "adjusted"} there are some "changes" in USHCN that some of us would call "adjustments") and while we wait for that input from Joe, I've changed the posting to reflect what this graph would mean if it were a mix of the GHCN unadjusted (as NCDC calls it, or "raw" as GISS calls it) with the USHCN (of whatever provenance). That is, pointed out that the 3 F is still an 'issue' but would be assigned to "homogenizing" rather than 'adjusting'... FWIW USHCN Version 2 was incorporated into GIStemp about November 15, 2009. So there are only 2 months worth of time to hit that window. So I could indulge in rampant speculation, or I can just patiently await clarification from the creator of the graph. It being a weekend, this seems reasonable. (BTW, I typically don't demand response from folks over the weekend for the simple reason that some religions demand folks not do labor on their sabbath days. Basically, being demanding is very rude on weekends.) -E.M.Smith ]

  22. Josh Cryer says:

    E.M.Smith, you’re correct, it doesn’t say ‘adjusted,’ my apologies. I didn’t see your edit in the main post, btw, you should let people know of that sort of thing. :)

    USHCN comes in three flavors, raw, adjusted for TOD, adjusted for everything else (including UHI). I do not think that GISS would possibly use raw non-adjusted, because TOD can seriously destroy the temperature trend.

    So I take it that “USHCN corrected” is USHCN raw + TOD. I will know soon if that is the case.

  23. M. Simon says:

    Josh,

    Without the TOD adjustment you can still use the method of differences to get trends as long as you note the step changes and the reason.

    I think that method has more validity than TOD adjustments.
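
One common reading of the “method of differences” mentioned in this comment is to average year-to-year changes while skipping the intervals that span a documented step change. The sketch below assumes that reading; the function name `mean_first_difference`, the `break_indices` argument, and the sample numbers are all hypothetical and are not taken from any station record.

```python
from typing import Iterable, Sequence


def mean_first_difference(values: Sequence[float],
                          break_indices: Iterable[int] = ()) -> float:
    """Average change per step, ignoring differences that cross a known break.

    break_indices holds each index i where a documented step change
    (station move, time-of-observation change, ...) happened between
    values[i - 1] and values[i]. That single difference is dropped.
    """
    breaks = set(break_indices)
    diffs = [values[i] - values[i - 1]
             for i in range(1, len(values))
             if i not in breaks]
    if not diffs:
        raise ValueError("no usable differences")
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Synthetic series: +0.1 per year, plus an artificial +2.0 jump at index 3.
    series = [10.0, 10.1, 10.2, 12.3, 12.4]
    print(mean_first_difference(series))                     # jump contaminates the mean
    print(mean_first_difference(series, break_indices=[3]))  # jump excluded
```

The design choice is that nothing is adjusted: the step is simply left out of the average, which is why the reason for each excluded interval has to be recorded.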

  24. Josh Cryer says:

    M. Simon, I believe they essentially do that with TOD. Daytime temperature curves can be derived from the temperature data itself, so adjusting for TOD is quite reasonable.

    More scary are station moves, and other station history stuff. mad_derek from Talk Polywell showed a station move with Darwin Zero (note: large): http://i45.tinypic.com/o50ox2.png

    But those are actually easier to explain because they produce larger anomalies. Hot box on hot roof moved to a cool grassy area.

  25. boballab says:

    Found a gridded Anomaly map from NOAA for the year JAN – DEC 2008 based on the 1961-1990 baseline. What is interesting about it, is they didn’t infill. You can see it here:

    http://www.ncdc.noaa.gov/oa/climate/research/ghcn/ghcngrid.html

    When you look at it the first thing I noticed was: No readings from Greenland and, how sparse Canada is. After that it seems that Central Africa doesn’t exist and neither does the central part of South America. Central China is out and there are huge gaps in Russia.

  26. alantrer says:

    I’ve taken a similar look at CANTON 4 SE, NY (301185). That station and a couple in NYC have the longest and most complete records in NY (1895-present).

    I took an interest in that station because I am from that area. I know exactly where the station is and know the area very well. There are good pictures over at surfacestations.org.

    I can vouch for the fact this is a rural station. The area has changed very little in the period the station has been in existence. The general area is so rural and remote that a group of Amish, sick of being tourist attractions in Penn. decided to move there. Half moved back to Penn. because it was too rural there!

    The area has a large temperature range. I can remember days of -40F with no wind chill and a few summer days over 100F

    Anyway I paid my 70 bucks to NCDC to get what was claimed the raw data. It likely had some level of QC applied but I’m fairly satisfied it is basically the original recordings. I also retrieved the USHCNv2 raw and USHCNv2 adjusted. Doing a few spot checks shows the USHCNv2 raw is essentially the same values as the NCDC raw. (There are a few interesting differences but not important for this post).

    In plotting the annual means I find basically what others seem to be finding. The adjusted values are cooler in the early years than the raw values and the adjusted slightly warmer in the later years than the raw. A trend line of the adjusted shows a little more than 1.0C warming over the recorded period. A trend line of raw shows a warming of a little less than 0.5C.

    I’ve been meaning to put together a writeup somewhere with source data links and a procedure write up but wanted to understand the adjustment procedure first so I could postulate some meaningful conclusions. I’ve just started to slog through the referenced papers.

    If anyone wishes to reproduce the results it’s fairly straightforward using USHCNv2 raw and adjusted.
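
The trend comparison described in this comment can be sketched with an ordinary least-squares fit of annual means against year. That is an assumption: the comment does not say how its trend lines were computed. The function names `linear_trend` and `total_change` are hypothetical, and the numbers are synthetic, not USHCNv2 values.

```python
from typing import Sequence


def linear_trend(years: Sequence[float], temps: Sequence[float]) -> float:
    """Least-squares slope of temps against years (degrees per year)."""
    n = len(years)
    if n != len(temps) or n < 2:
        raise ValueError("need two or more matching points")
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    return sxy / sxx


def total_change(years: Sequence[float], temps: Sequence[float]) -> float:
    """Change implied by the fitted line over the whole record."""
    return linear_trend(years, temps) * (max(years) - min(years))


if __name__ == "__main__":
    years = list(range(1895, 2010))
    # Synthetic stand-ins for a "raw" and an "adjusted" annual-mean series.
    raw = [5.0 + 0.004 * (y - 1895) for y in years]
    adjusted = [4.7 + 0.009 * (y - 1895) for y in years]
    print(total_change(years, raw))
    print(total_change(years, adjusted))
```

Running the same two functions over the real raw and adjusted annual means for a station gives the pair of numbers being compared in the comment.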

  27. Pingback: NASA GISS shows 2009 as tied for 2nd warmest year on record - Page 5 - PriusChat Forums

  28. Josh Cryer says:

    alantrer, I’ve found the differences between NCDC and USHCN raw are due to how the USHCN rounds min-max values, they round up in all cases, which I’m told should make no substantive difference, but once I have the software written I will check. I was taught to “round to even” so that the rounding is symmetrical, and if it turns out to actually matter I will shoot off an email to the USHCN.

    You got ripped off with that raw data, though, you should have gone to a community college library and asked for a guest account, or asked someone you know in a university to allow you to have VPN access. It’s free for .edu addresses.

    Hell you could have asked me for it.

    (Note, I don’t have a huge problem with having to pay, because the NCDC data is used by commercial sources, and it should save taxpayer money. It’s just that their free access is weird, they could just make you register with the site somehow, and log your IP so you don’t use the data commercially.)

    I’ve found that the adjusted values are slightly warmer, too, for rural stations, but for city stations those values are in fact much cooler. It all depends on which station you look at. And you have to consider Time of Observation biases, in which many stations went through a time change that affects how the temperatures are read out.

  29. alantrer says:

    Josh, thanks for the offer of the data. I made a stab at getting at the data via computers at my local library but was foiled by not having a .edu or .gov email address. For now though I think I will focus on understanding the process of producing USHCNv2 adjusted from the USHCNv2 raw. For that I don’t think I need any special access.

    I guess I don’t understand the need to round at all. When debating 0.1C changes over decades it would seem one would want to propagate precision as far as possible.

  30. Josh Cryer says:

    You round because you want to maintain a CSV data field of so many digits. I agree that rounding isn’t necessary, but when you are going over many many many months of data the effect is insubstantial. But you can see for yourself in Excel (if you formatted the data correctly) how they derive their raw value. I should have said, though, they don’t “always” round up, they “always round halves up.” ie, 10.05 goes to 10.1. You should always “round to even.” So, 12.25 = 12.2, and 12.35 = 13.4, it’s symmetrical. It was one thing I immediately noticed in the data.

    I too am going over USHCN raw to USHCN F52. This is going to take some time to deduce, and even much longer to produce the software to actually do it (based on the papers).

    You should get menne-etal2009.pdf, menne-etal2010.pdf, and menne-williams2009.pdf at the bare minimum (from the USHCN ftp). And of course all of the relevant citations / references (possibly even going back to USHCN v1). I may upload them to my site so others can get easy access to this material.

    BTW, I reproduced your rural station, it is as you said. The F52 adjustments do change the trendline significantly. Whether that is supported by the methodology, I won’t know until I understand more.

  31. Josh Cryer says:

    Erm, 12.35 = 12.4, ya’ll knew what I meant. :P
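
The two rounding rules discussed in comments 30 and 31 can be compared directly with Python's standard `decimal` module. The helper name `round_tenths` is hypothetical; the sample values are the ones quoted in the comments. Values are passed as strings so that binary floating point does not disturb the exact halves.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_tenths(value: str, mode: str) -> Decimal:
    """Round a decimal string to one place using the given rounding mode."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=mode)


if __name__ == "__main__":
    for text in ("10.05", "12.25", "12.35"):
        half_up = round_tenths(text, ROUND_HALF_UP)
        half_even = round_tenths(text, ROUND_HALF_EVEN)
        print(text, "half-up:", half_up, "half-even:", half_even)
```

The two rules disagree only on exact halves whose preceding digit is even (10.05 and 12.25 here), which is where “always round halves up” introduces its small upward bias.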

  32. M. Simon says:

    This is going to take some time to deduce, and even much longer to produce the software to actually do it (based on the papers).

    Why should you have to reverse engineer the method? In REAL science isn’t that supposed to be disclosed?

  33. Josh Cryer says:

    It is disclosed. It’s the parsing software (to go from NCDC to USHCN/GHCN) that isn’t. Anthony Watts doesn’t disclose his parsing software, either. I checked. I have to write it myself (though I may shoot him an email).

  34. Josh Cryer says:

    I’d like to point out that while D’Aleo made a good faith effort to correct his mistake, he is still failing to recognize that US stations (which Central Park is) use USHCN v2 F52, which is fully homogenized (though USHCN v2 homogenization is very subtle as the graph shows). So calling this data 1) GHCN and 2) unadjusted, is incorrect.

    He writes, “implying they start with GHCN ‘unadjusted’ before they work their own homogenization and other magical wonders.” No, they start with whatever dataset has the station record. Central Park is not in GHCN as the numbers for Central Park in the GHCN are randomized, and it fails the quality control check. (Why they did it this way is anyone’s guess, probably some quick hack to avoid having to write a separate parser.)

    And to make matters worse, the graph has GISS data plotted as “GISS GHCN before homogenization.” This would be true if the station in question was not in the USA.

    One need only plot USHCN F52 with Central Park from GISS and it is identical.

  35. Pingback: Reykjavik Iceland: Temperature Manipulation
