Over the years I’ve had several postings with a ‘source of the data’ link for one thing or another. After a while, you get tired of digging them up again and trying to remember what was in each one. So what I’m going to do here is pretty simple: put up a set of site links (since the link to the specific data sometimes goes stale when they delete it and replace it with a ‘new historical’ set of data-food-product…) along with links to the current detailed data, and a statement or two for some of them saying “what is there”. (Adjusted, un-adjusted, adjusted silently via “QA”, etc.)
I don’t expect that I have everything, nor that I’ve got a description for all of it, so I’ll be putting this posting up “medium rare” and adding to it for a few days. If you have other ideas or sources, please post a comment. I’ll collect the sources into the head posting over time.
FWIW, I’ve been a bit of a packrat over the years. I have a GHCN v1 copy (or two or three…) along with a few GHCN v2 copies. Sometimes I’d download just the ‘adjusted monthly average’, sometimes I’d grab more. It depended on what I was doing, how much disk was free, and what I was thinking at the time (such as “gee, NOAA would never delete the old data, I don’t need a copy of V1 Daily”…) So my personal collection is a bit eclectic. It is also scattered over 1/2 dozen computers on both sides of the country and a few dozen CD / DVD backup disk sets…. But, someday, I hope to collect it all into some kind of a valid “history of the changing history”. Once I have a decent format and layout, I’ll be looking for an archive site where I can put up a few gigabytes for anyone else to use.
Why keep old data copies? For postings / discovery like these comparing version 1 with version 3 data:
As it stands now, I’m pretty sure I’ve got the GHCN semi-raw daily data (if the description can be believed – it looks like QA flagged, but datum still in place) for some large set of stations along with several sets of USHCN. I don’t have much at all from Hadley, but will likely packrat that too. With SD cards at $1 / GB and DVDs for backups at about 20 ¢ / 4 GB or so, it just isn’t all that expensive to “make a set”. (The hard part for me has been keeping them organized and ‘near me’ ;-) So pointers to other datasets in need of protection / archiving would be appreciated, along with any old archival copies an individual might have that they would like to see reach a broader audience.
With all that said, here’s the “Draft Alpha Posting” on temperature data sets. Do remember, I’m actively updating this list over the next few days in dribs and drabs, so don’t expect it to be done; expect it to be a construction project.
NOAA / NCDC
The National Climatic Data Center – supposed to be the great guardians of the data, and they do have a nice archive, but it is a bit slim on “original raw only” and on version control. More than some others though (like GISTemp that is ‘never the same way twice and no version history kept’). In software development there are dozens of “version control” bits of software that let you roll forward and back to any particular revision while storing things efficiently. (From CVS to RCS to GIT to…) It would seem that folks in “Climate Science” are unfamiliar with these tools, not even having a source code archive to display changes. Oh Well. It is what it is.
Top Page: http://www.ncdc.noaa.gov/cdo-web/
Lists several products and has a nice interface to the data.
Climate Data Online (CDO) provides free access to NCDC’s archive of historical weather and climate data in addition to station history information. These data include quality controlled daily, monthly, seasonal, and yearly measurements of temperature, precipitation, wind, and degree days as well as radar data and 30-year Climate Normals. Customers can also order most of these data as certified hard copies for legal use.
I note in passing the lack of a statement that the original source data ‘unadorned’ (i.e. raw) is also included. It looks as though the QA status is a flag on the daily items and that the reading is still there; but I’ve not proven that via actually looking at the data archive. (It’s a bit large ;-) though you can download individual years if desired and they are smaller). IFF that is true, this is a very good starting point. It would show the actual reading and the QA assessment, and you can decide. Then since it is Daily Min / Max you can calculate your own trends through either, or various daily, weekly, monthly, ‘whatever’, averages. It is what I think needs more exploration sooner. I’ve downloaded a copy of daily data (many GB and many hours) but not yet unpacked it as I’d already filled up my disk with ‘other stuff’… So it will be a while before I can give it a look and assure it’s what I think it is (and described just above – yes ‘trust but verify’ applies to my own work too, and especially to my speculations).
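To make the “actual reading plus QA flag, and you decide” idea concrete, here is a minimal sketch in Python. It is not NCDC code. The record layout (date as YYYYMMDD, element name, value in tenths of a degree C, and a quality flag that is empty when the reading passed QA) is my assumption modeled on the GHCN-Daily documentation, and the sample readings are made up. Check readme.txt before trusting any of it.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(records, keep_flagged=False):
    """Average daily readings into per-month means, per element.

    records: iterable of (yyyymmdd, element, value_tenths_c, qflag) tuples.
    qflag is '' when the reading passed QA (assumed convention).
    Returns {(yyyymm, element): mean in degrees C}.
    """
    buckets = defaultdict(list)
    for yyyymmdd, element, value, qflag in records:
        if element not in ("TMAX", "TMIN"):
            continue
        if qflag and not keep_flagged:
            continue  # the reading is still in the file; we choose to skip it
        buckets[(yyyymmdd[:6], element)].append(value / 10.0)
    return {key: mean(vals) for key, vals in buckets.items()}


# Made-up readings for illustration only
sample = [
    ("19500101", "TMAX", 50, ""),
    ("19500102", "TMAX", 70, ""),
    ("19500103", "TMAX", 300, "S"),  # hypothetical flagged outlier
    ("19500101", "TMIN", -10, ""),
]

print(monthly_means(sample))                     # flagged reading left out
print(monthly_means(sample, keep_flagged=True))  # flagged reading kept
```

Running it both ways shows how much one flagged reading can move a monthly average, which is the whole point of keeping the reading and the flag side by side.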
One clicks on ‘datasets’ to get to the data, and that takes you here:
Climate Data Online
Nexrad Level II
Nexrad Level III
Precipitation 15 Minute
COOP Daily / Summary of Day
Global Climate Station Summaries
Global Hourly Data
Global Marine Data
Global Summary of the Day
National Solar Radiation Database
Quality Controlled Local Climatological Data
Regional Snowfall Index
Snow Monitoring Daily
Snow Monitoring Monthly
I found the “Daily Summaries” most useful:
File:COOPDaily_announcement_042011.doc 34 KB 4/20/2011 12:00:00 AM
File:COOPDaily_announcement_042011.pdf 123 KB 4/20/2011 12:00:00 AM
File:COOPDaily_announcement_042011.rtf 67 KB 4/20/2011 12:00:00 AM
all 7/20/2014 7:28:00 AM
by_year 7/19/2014 7:05:00 PM
figures 2/6/2013 12:00:00 AM
File:ghcnd-countries.txt 3 KB 9/20/2013 12:00:00 AM
File:ghcnd-inventory.txt 23798 KB 7/15/2014 8:40:00 AM
File:ghcnd-states.txt 2 KB 5/16/2011 12:00:00 AM
File:ghcnd-stations.txt 7709 KB 7/15/2014 8:40:00 AM
File:ghcnd-version.txt 1 KB 7/20/2014 8:27:00 AM
File:ghcnd_all.tar.gz 2521828 KB 7/20/2014 8:27:00 AM
File:ghcnd_gsn.tar.gz 100441 KB 7/20/2014 8:27:00 AM
File:ghcnd_hcn.tar.gz 278461 KB 7/20/2014 8:27:00 AM
grid 7/19/2014 8:34:00 PM
gsn 7/20/2014 5:34:00 AM
hcn 7/20/2014 5:35:00 AM
papers 10/2/2012 12:00:00 AM
File:readme.txt 23 KB 3/18/2014 5:02:00 PM
File:status.txt 28 KB 1/10/2014 12:00:00 AM
The ghcnd_all.tar.gz is a ‘tar’ tape-archive file in gzip format. It helps to know / use Linux or Unix to unpack and untar it. As you can see, it is large at 2.5 GB. It is larger once uncompressed. (That is why I’ve not unpacked and inspected it yet…) Under the by_year directory you can get any individual year as a smaller and easier to swallow chunk. Just looking at a listing of it lets you see just how little data there is in the early years compared to recent years. Long term trends are really strongly biased by the very few early readings. We can’t really get quality long term trends out of the recent 50 years data, since there are 60 ish year cycles in weather (climate-change).
File:1763.csv.gz 4 KB 7/19/2014 7:02:00 PM
File:1764.csv.gz 4 KB 7/19/2014 7:02:00 PM
File:1765.csv.gz 4 KB 7/19/2014 7:04:00 PM
File:1766.csv.gz 4 KB 7/19/2014 7:03:00 PM
File:1767.csv.gz 4 KB 7/19/2014 7:00:00 PM
File:1768.csv.gz 4 KB 7/19/2014 7:02:00 PM
File:1769.csv.gz 4 KB 7/19/2014 7:03:00 PM
File:1770.csv.gz 4 KB 7/19/2014 7:02:00 PM
File:1771.csv.gz 4 KB 7/19/2014 7:00:00 PM
File:1910.csv.gz 40216 KB 7/19/2014 7:05:00 PM
File:1911.csv.gz 41844 KB 7/19/2014 7:02:00 PM
File:1912.csv.gz 43604 KB 7/19/2014 7:02:00 PM
File:1913.csv.gz 44876 KB 7/19/2014 7:02:00 PM
File:1914.csv.gz 46453 KB 7/19/2014 7:00:00 PM
File:1915.csv.gz 47889 KB 7/19/2014 7:00:00 PM
File:1916.csv.gz 49533 KB 7/19/2014 7:05:00 PM
File:1917.csv.gz 49795 KB 7/19/2014 7:01:00 PM
File:1942.csv.gz 77718 KB 7/19/2014 7:01:00 PM
File:1943.csv.gz 78514 KB 7/19/2014 7:03:00 PM
File:1944.csv.gz 80328 KB 7/19/2014 7:03:00 PM
File:1945.csv.gz 82367 KB 7/19/2014 7:03:00 PM
File:1946.csv.gz 82934 KB 7/19/2014 7:02:00 PM
File:1947.csv.gz 84043 KB 7/19/2014 7:05:00 PM
File:1948.csv.gz 100697 KB 7/19/2014 7:03:00 PM
File:1949.csv.gz 114757 KB 7/19/2014 7:01:00 PM
File:1950.csv.gz 117901 KB 7/19/2014 7:01:00 PM
File:2008.csv.gz 179740 KB 7/19/2014 7:03:00 PM
File:2009.csv.gz 184554 KB 7/19/2014 7:03:00 PM
File:2010.csv.gz 186549 KB 7/19/2014 7:02:00 PM
File:2011.csv.gz 173750 KB 7/19/2014 7:04:00 PM
File:2012.csv.gz 169154 KB 7/19/2014 7:00:00 PM
File:2013.csv.gz 166776 KB 7/19/2014 7:03:00 PM
File:2014.csv.gz 83161 KB 7/19/2014 7:04:00 PM
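If disk space is the problem, one of the yearly files can be read straight from the compressed form without unpacking it. The sketch below is a rough Python example under assumptions: I am assuming each row starts with station ID, date, and element name (as the by_year description suggests), and the three rows are a made-up stand-in, not real data.

```python
import csv
import gzip
import io
from collections import Counter


def summarize_year(gz_source):
    """Stream one by_year file; return (distinct station count, rows per element).

    gz_source may be a path such as "1763.csv.gz" or an open binary file object.
    Assumed column order: station ID, date, element, value, then flags.
    """
    elements = Counter()
    stations = set()
    with gzip.open(gz_source, mode="rt", newline="") as text:
        for row in csv.reader(text):
            if len(row) < 4:
                continue  # skip anything that does not look like a data row
            stations.add(row[0])
            elements[row[2]] += 1
    return len(stations), elements


# Tiny made-up stand-in for a real file such as 1763.csv.gz
fake = io.BytesIO(gzip.compress(
    b"STN00000001,17630101,TMAX,12,,,E,\n"
    b"STN00000001,17630101,TMIN,-3,,,E,\n"
    b"STN00000002,17630101,TMAX,20,,S,E,\n"
))
print(summarize_year(fake))
```

Pointing the same function at an early year and a recent year would put numbers on the station count difference that the file sizes above only hint at.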
The degree of ‘instrument change’ over time is just incredible. Trying to do global calorimetry with that is really rather silly. But that is what ‘climate scientists’ do…
The ReadMe file has the interesting notes:
GHCN-D is a dataset that contains daily observations over global land areas.
Like its monthly counterpart, GHCN-Daily is a composite of climate records from
numerous sources that were merged together and subjected to a common suite of quality
It is unclear without more digging just what ‘merging’ and ‘quality assurance’ has done to the readings, but a top text reading implies flagging, not replacement or deletion. A “Dig Here!” is to dig into the referenced links and papers to assure / deny that assessment.
This by_year directory contains an alternate form of the GHCN Daily dataset. In this
directory, the period of record station files are parsed into
yearly files that contain all available GHCN Daily station data for that year
plus a time of observation field (where available--primarily for U.S. Cooperative
Observers). The obsertation times for U.S. Cooperative Observer data
come from the station histories archived in NCDC's Multinetwork Metadata System (MMS).
The by_year files are updated daily to be in sync with updates to the GHCN Daily dataset.
Just why 1770 needs daily updating is unclear… but the above listing quote does show daily update changes…
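A changed date stamp does not by itself prove the contents changed. One way to check, sketched here in Python, is to keep a checksum for each file on each download date and compare the lists later. The file contents below are made up, and the function names are mine. One caveat I am sure of: a gzip file carries a timestamp in its header, so hashing the decompressed contents is the safer test of whether the data itself moved.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def diff_manifests(old, new):
    """Compare two {filename: digest} dicts taken on different download dates."""
    return {
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
    }


# Made-up contents standing in for two downloads of the same directory
monday = {
    "1770.csv.gz": sha256_of(io.BytesIO(b"same bytes")),
    "2014.csv.gz": sha256_of(io.BytesIO(b"partial year")),
}
tuesday = {
    "1770.csv.gz": sha256_of(io.BytesIO(b"same bytes")),
    "2014.csv.gz": sha256_of(io.BytesIO(b"partial year plus a day")),
}
print(diff_manifests(monday, tuesday))
```

For real files, open each one with open(path, "rb") (or gzip.open(path, "rb") to hash the decompressed contents) and store the resulting dict alongside the archive copy.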
There are also pointers to other links and other data archives that need some kind of cross check to figure out if they are different or the same, change over time or not, and just what IS the real historical data…
Further documentation details are provided in the text file ghcn-daily_format.rtf in this
Users may find data files located on our ftp server at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/all/.
There is no observation time contained in period of record station files.
GHCN Daily data are currently available to ALL users at no charge.
All users will continue to have access to directories for ftp/ghcn/ and ftp3/3200 & 3210/ data at no charge.
For detailed information on this dataset visit the GHCN Daily web page at http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/
You would think there would be ONE historical set of original RAW data, but no… not found it yet… there look to be many copies, all with slightly different processing, mixes, histories, updates, “quality assurance”, etc…
Down in the footer of the top page is a comment that they have a ‘legacy’ site with data sets not yet migrated to the new site. Probably worth some archival time…
Data Here: http://www7.ncdc.noaa.gov/CDO/cdo
There is a ‘data set’ option in a drop down on the right. Has some satellite as well as ground data. Could likely spend a month on that just sorting it out, and archiving the bits that are valuable. I’ve not explored it yet, but the implication is that this ‘older’ site / version is going to go away once brought up to date. No idea if that means “copy over intact” to the new site; or “convert and expunge the original past”.
They also have a USHCN set of data, but they describe it as a subset of the GHCN. More details here:
It comes in versions too… they have Version 2.5 up now.
Has the various versions listed (v1, v2, v2.5) as directories. Also has a ‘daily’ directory.
File:ushcn_01.tar 130275 KB 11/7/2002 12:00:00 AM
File:ushcn_98.tar 122824 KB 3/23/2000 12:00:00 AM
At a few hundred MB, not that large to snag a copy. Though one wonders what happened since 2002 as the newest date stamp.
V1 has only a ‘metadata’ directory and v2 has only ‘monthly’. No idea where to get the old v1 and v2 data now.
Hadley CRU Climate Research Unit
The English archival site. Infamously having said that they “lost the original data”, but that post processing “improved” versions were available, and besides, it was mostly like GHCN that was at NCDC. (And would that be GHCN version 1, 2, or 3? Hmmm??? As the data Langoliers are busy rewriting it…)
At any rate, they may have something of value; but I’d likely use the NCDC daily data instead. (Oh, NCDC also has a long list of ‘data food products’ that they have post processed in various ways. As, IMHO, those are NOT data but are processing products, I’ve not listed them here. From the same top link you can get to their processed, adjusted, homogenized, etc. stuff, I’ve left that out of this posting). For Hadley, since they lost the actual data, all you get is their post-processing data-food-products.
Top Link: http://hadobs.metoffice.com/
It claims that data are available.
Interesting graph of it with a Very Nice plunge at the end in the last decade or so…
Hadley graph of Central England Temperature
Gridded Monthly combined land / sea data-food-product: http://hadobs.metoffice.com/hadcrut4/
Has a data download link on the page, but mostly just looks like you can get the post-processed data-food-product as numbers instead of as scary pictures. Don’t see the point, really.
They may have something else interesting there, but I’ve not put the time in to find it. There is a link to “CRUTEM4” that claims to be actual data (for some degree of ‘data’):
This page describes updates in CRUTEM4 version CRUTEM.184.108.40.206. Previous versions of CRUTEM4 can be found here. Data for CRUTEM.220.127.116.11 can be found here.
Additions to the CRUTEM4 archive in version CRUTEM.18.104.22.168
The changes listed below refer mainly to additions of mostly national collections of digitized and/or homogenized monthly station series. Several national meteorological agencies now produce/maintain significant subsets of climate series that are homogenized for the purposes of climate studies. In addition, data-rescue types of activities continue and this frequently involves the digitization of paper records which then become publicly available.
The principal subsets of station series processed and merged with CRUTEM (chronological order) are:
Norwegian – homogenized series
Australian (ACORN) – homogenized subset
Brazilian – non-homogenized
Australian remote islands – homogenized
Antarctic (greater) – some QC and infilling
St. Helena – some homogenization adjustment
Bolivian subset – non-homogenized
Southeast Asian Climate Assessment (SACA) – infilling /some new additions
German/Polish – a number of German and a few Polish series – non-homogenized
Ugandan – non-homogenized
USA (USHCNv2.5) – homogenized
Canada – homogenized
In addition, there have been some corrections of errors. These are mostly of a random nature and the corrections have generally been done by manual edits. For a listing of new source codes in use, see below (end).
Largely homogenized and fermented to make data-food-product… something vaguely cheesy, but not real. (In the USA various artificial cheese like products must be labeled ‘cheese food product’ so you will not confuse them with real cheese. I’ve adopted the phrase ‘data food product’ to similarly identify things that are vaguely data like, but not really source data…)
At any rate, it is unclear to me just why I want such a data food product from Hadley.
They claim older versions are here: http://hadobs.metoffice.com/crutem4/data/versions.html
They are all version 4. It is unclear to me what has happened to versions 1, 2, and 3 and where they might be found, though a quick search turned up this link for 3: http://www.metoffice.gov.uk/hadobs/crutem3/data/download.html
Similar searches on Crutem2 give:
So Steve McIntyre has likely got a set saved somewhere. The article has links in it that currently give 404 not found errors, so the official Version 2 data sets look to have hit the bit bucket. (Maybe I can talk Steve into sending me a set of V2 to archive… or a link to an archive. It would be amusing to do 2 vs 3 vs 4 compares someday…)
Interesting to note that CRU have the v2 links still up. One hopes it is the actual data, but it is what it is. (I’ve downloaded the data, whatever it might be)
Data for Downloading
ERRATUM: before 21st May 2003, the NetCDF versions
erroneously used a time dimension with units "months
since (startyear)-1-1" that started from 1. It should
(and now does) start from 0.
gzipped NetCDF Last updated
crutem2.nc.gz 1.9mb 2006-01-18
crutem2v.nc.gz 1.8mb 2006-01-18
hadcrut2.nc.gz 3.5mb 2006-01-18
hadcrut2v.nc.gz 3.3mb 2006-01-18
absolute.nc.gz 40kb 1999-07-13
The CRU crew might have other stuff of interest, but frankly, I'm not very comfortable that it is accurate:
The one link for surface data I did follow did a circular run to Hadley / MetOffice / and on…
Someone else can explore all that further, if needed.
NASA GISS GISTemp
IMHO, not really a source of “data”. GISS (Goddard Institute for Space Studies) just takes in the NCDC data, munges it around a little, and calls it data. It isn’t. It takes in an already ‘adjusted’ data-food-product and further manipulates it according to a fixed algorithmic process that is a bit dodgy. They fill in missing bits from other stations up to 1200 km away (doing this three times in successive sections) so any given ‘data item’ might be a complete fabrication partially based on data up to 3600 km away. They also do a somewhat backward Urban Heat Island “correction” that doesn’t correct for urban heat. In the end, they are the data outlier in most results; but for some reason many folks like to look at their stuff.
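To get a feel for how far 1200 km is, here is a small Python sketch that tests whether a station falls within that reach of a target point. This is not GIStemp code. It is just a great-circle (haversine) distance under the assumption of a spherical Earth with a mean radius of 6371 km, and the coordinates are made-up examples.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical Earth is an assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_reach(target, station, reach_km=1200.0):
    """True if station (lat, lon) lies within reach_km of target (lat, lon)."""
    return great_circle_km(*target, *station) <= reach_km


# Hypothetical points along latitude 40 N
print(within_reach((40.0, -100.0), (40.0, -90.0)))  # about 850 km apart
print(within_reach((40.0, -100.0), (40.0, -80.0)))  # about 1700 km apart
print(3 * 1200)  # worst case reach if the 1200 km step is chained three times
```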
Data Input From: GHCN, USHCN, and Antarctica. (Links to be added a bit later after I unpack the latest GIStemp code to see if it has changed). It merges and homogenizes this batch and then makes up missing data and prints the results. Not really useful for anything as far as I can see.
Top link: http://data.giss.nasa.gov/gistemp/
Includes links to the data-food-products that it produces.
Data Output Here:
Has a link to GHCN version 2 data on that page, and lets you see individual station data after GIStemp is done with it.
CDIAC
The Carbon Dioxide Information Analysis Center. Guess where their bias lies…
Little referenced, they have their own set of data archives. I’ll be wandering through them to see what I can find. Sometimes it has interesting stuff.
USHCN intro page: http://cdiac.ornl.gov/epubs/ndp/ushcn/daily_doc.html
Home page: http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html
I squirreled away a copy of the daily-by-state data some time ago, but who knows where it is now.
http://cdiac.ornl.gov/ftp/ushcn_daily/ (by State)
http://cdiac.ornl.gov/ftp/ushcn_v2_monthly/ (in one wad for all).
There are likely other sources for USHCN daily, but I’ve not spent the time to track them down. If you know of any, put a link in comments and I’ll add it.
Berkeley Earth Surface Temperature.
Claims to be the best, but isn’t. Not particularly the worst either. One of the developers claims to have used the method that skeptics wanted; but it isn’t the method I’ve seen asked for much. It has a slice and dice data splicer at the core of it. Since data splicing is considered a sin in many technical disciplines, how doing more of it, more finely, is a feature; well, that’s beyond me. They also cooked up their own way to store data with their own date format (reasonable since they needed to bring divergent data together) but in a way that is sort of painful to use and a bit of work just to understand. (For example, a date, instead of being 30 July 2012 or 300712 or any other, is a floating point number: X.YYY where the granularity of the part after the decimal will resolve it to a particular day. I’ll get an accurate description and put it here.) Just realize that you can’t take the B.E.S.T. copy of GHCN data and do a straight difference against the other GHCN copy as it has been, um, ‘converted’ in format.
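Until I have the accurate description, here is only a sketch of how a decimal-year date can be turned back into something readable. It ASSUMES a mid-month convention, year + (month - 0.5) / 12, for monthly values. I have not confirmed that this is the convention in the B.E.S.T. files, and it does not cover daily resolution, so check the header of the actual data file first. The function names are mine.

```python
def year_month_to_decimal(year, month):
    """Encode a month as a decimal year using the assumed mid-month convention."""
    return year + (month - 0.5) / 12


def decimal_to_year_month(decimal_date):
    """Decode a decimal year back to (year, month) under the same assumption."""
    year = int(decimal_date)
    month = int((decimal_date - year) * 12) + 1
    return year, month


july_2012 = year_month_to_decimal(2012, 7)
print(round(july_2012, 3))
print(decimal_to_year_month(round(july_2012, 3)))
```

Because the assumed encoding sits in the middle of each month, rounding the number to three decimals still decodes to the right month.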
So B.E.S.T. takes in much of the same data as the others, chops, dices, and splices it up a lot. Does more homogenizing and infilling things, then claims it is “data”. Yet another data-food-product, IMHO.
They do have an online archive of their sources (that are largely the same as the above: GHCN / USHCN /… )
The Berkeley Earth Surface Temperature Study has created a preliminary merged data set by combining 1.6 billion temperature reports from 16 preexisting data archives. Whenever possible, we have used raw data rather than previously homogenized or edited data. After eliminating duplicate records, the current archive contains over 39,000 unique stations. This is roughly five times the 7,280 stations found in the Global Historical Climatology Network Monthly data set (GHCN-M) that has served as the focus of many climate studies. The GHCN-M is limited by strict requirements for record length, completeness, and the need for nearly complete reference intervals used to define baselines. We have developed new algorithms that reduce the need to impose these requirements (see methodology), and as such we have intentionally created a more expansive data set.
We performed a series of tests to identify dubious data and merge identical data coming from multiple archives. In general, our process was to flag dubious data rather than simply eliminating it. Flagged values were generally excluded from further analysis, but their content is preserved for future consideration.
So far so good. Start from raw (though some is stated as slightly cooked…) and then combine and clean. It’s then the splice and dice homogenize that gets them, IMHO. “Methodology”.
Data Here: http://berkeleyearth.org/data
Breakpoint Adjusted Monthly Station data
During the Berkeley Earth averaging process we compare each station to other stations in its local neighborhood, which allows us to identify discontinuities and other heterogeneities in the time series from individual weather stations. The averaging process is then designed to automatically compensate for various biases that appear to be present. After the average field is constructed, it is possible to create a set of estimated bias corrections that suggest what the weather station might have reported had apparent biasing events not occurred. This breakpoint-adjusted data set provides a collection of adjusted, homogeneous station data that is recommended for users who want to avoid heterogeneities in station temperature data.
What part of “thou shalt not splice data and have no error” is unclear to them? Sigh.
So it’s just a much more homogenized and much more sliced, diced, and spliced data-food-product.
But the good thing is that they put their source data on line, so you can back up ahead of their processing and start over:
There are many individual links there that I’ve not fully explored. Some of them are already above. Some are not (like the Colonial Era data and the Coop stations). I’d like to archive the lot of them, but time does not allow that at the moment. Perhaps folks could split the job up and each grab a chunk? Assemble again later?
Wood For Trees
Need to put in a good description of these folks. They have nice graphing facilities, and have the data sets behind it. Not yet found out if you can download the whole set of data direct from them.
Top Link: http://woodfortrees.org/
Lists their data sources on the side bar. Does include the UAH satellite data. (At some point I’ll add links to the satellite data, but since they don’t seem to have ‘revisions’ to their history quite so much I’ve not seen it as urgent).
Interactive graph here: http://woodfortrees.org/plot/
More a calculation site than an archive, yet they clearly have some kind of temperature data archive to be able to compute graphs for folks. Such as this example:
Looking at trends through it a couple of years back, it was clear they did no ‘clean up’ for things like large gaps and odd outlier data; so likely it is (or was) raw not QA checked data. (The data does need some kind of cleaning to be usable, but IMHO Hadley and NCDC go way too far).
Misc and Smaller Sites
There are a lot of ‘bits’ all over. I’ll be expanding this section “for a while”. From individual nations, to specific archives at schools and others. I don’t know much about them. It would be interesting to audit a few of these and compare them with the data-food-products above. If the little guys say “we recorded this data” and the above say “this is it!” and they are different, well… Some examples:
This site contains files of daily average temperatures for 157 U.S. and 167 international cities. The files are updated on a regular basis and contain data from January 1, 1995 to present.
Source data for this site are from the National Climatic Data Center. The data is available for research and non-commercial purposes only.
So it might be possible for some such sites to find older copies held by some other ‘packrat’.
Has a nice top page that says you can select various kinds of data:
If folks post enough links to various national sites, I’ll make a “by nation” section to collect them.
For now, I’m slowly working down the list of sites found by web searches like this one:
which also found:
DOWNLOAD a .csv spreadsheet file compatible with programs such as Excel, Access or the Free Open Office Calc, or view in your browser
Exclusive Station Finder Tool helps you find the best data for your needs
Complete National Weather Service archive – over 10,000 stations (far more than most resources on the web)
Unmatched data accuracy
Archived stations some daily data as far back as 1902, most hourly data back to 1972
Meteorologists on hand to assist
Instant access – Get the information you need immediately via your web browser with backup links sent via email
It looks to want to charge you for larger amounts, but might be free for individual station ‘samples’. They claim to have a lot of sources:
Weather Source meteorologists have created perhaps the world’s most comprehensive weather database by unifying multiple governmental and other weather databases together and applying advanced data quality control and correction methods. The resulting “super database” of weather information contains over 4 Billion rows of high quality weather observations. The Weather Warehouse provides users with direct and immediate access to this database. On the Weather Warehouse users have access to the following weather information:
I’ve not dug into it to figure out “what ultimate data source and what processing”, but it looks like all the Excel users out there can get a nice comma separated value CSV spreadsheet for easier processing for some stations.
OK, that’s it for the moment. I think the GHCN daily is likely the most unmolested of the lot. Coop data and CET are likely pretty useable too. USHCN daily is also likely clean of distortions. Any of the monthly average ‘data’ is not really data. It has been QA checked, filtered, selected, processed; potentially homogenized, adjusted and more. I’d rather start with daily data and work up from there.
If you know of any good archives of old musty data, please add a link!