Data Sources – A List

Over the years I’ve had several postings with a ‘source of the data’ link for one thing or another. After a while, you get tired of digging them up again and trying to remember what was in each one. So what I’m going to do here is pretty simple: Put up a set of site links (as sometimes the link to the specific data goes stale when they delete it and replace it with a ‘new historical’ set of data-food-product…) along with links to the current detail data, and a statement or two for some of them saying “what is there”. (Adjusted, un-adjusted, adjusted silently via “QA”, etc.)

I don’t expect that I have everything, nor that I’ve got a description for all of it, so I’ll be putting this posting up “medium rare” and adding to it for a few days. If you have other ideas or sources, please post a comment. I’ll collect the sources into the head posting over time.

FWIW, I’ve been a bit of a packrat over the years. I have a GHCN v1 copy (or two or three…) along with a few GHCN v2 copies. Sometimes I’d download just the ‘adjusted monthly average’, sometimes I’d grab more. It depended on what I was doing, how much disk was free, and what I was thinking at the time (such as “gee, NOAA would never delete the old data, I don’t need a copy of V1 Daily”…) So my personal collection is a bit eclectic. It is also scattered over 1/2 dozen computers on both sides of the country and a few dozen CD / DVD backup disk sets…. But, someday, I hope to collect it all into some kind of a valid “history of the changing history”. Once I have a decent format and layout, I’ll be looking for an archive site where I can put up a few gigabytes for anyone else to use.

Why keep old data copies? For postings / discovery like these comparing version 1 with version 3 data:

http://chiefio.wordpress.com/?s=GHCN+v1

As it stands now, I’m pretty sure I’ve got the GHCN semi-raw daily data (if the description can be believed – it looks like QA flagged, but datum still in place) for some large set of stations along with several sets of USHCN. I don’t have much at all from Hadley, but will likely packrat that too. With SD cards at $1 / GB and DVDs for backups at about 20 ¢ / 4 GB or so, it just isn’t all that expensive to “make a set”. (The hard part for me has been keeping them organized and ‘near me’ ;-) So pointers to other datasets in need of protection / archiving would be appreciated, along with any old archival copies an individual might have that they would like to see a broader audience.

With all that said, here’s the “Draft Alpha Posting” on temperature data sets. Do remember, I’m actively updating this list over the next few days in dribs and drabs, so don’t expect it to be done; expect it to be a construction project.

NOAA / NCDC

The National Climatic Data Center – supposed to be the great guardians of the data, and they do have a nice archive, but it is a bit slim on “original raw only” and on version control. More than some others though (like GISTemp that is ‘never the same way twice and no version history kept’). In software development there are dozens of “version control” bits of software that let you roll forward and back to any particular revision while storing things efficiently. (From RCS to CVS to Git to…) It would seem that folks in “Climate Science” are unfamiliar with these tools, not even having a source code archive to display changes. Oh Well. It is what it is.

Top Page: http://www.ncdc.noaa.gov/cdo-web/

Lists several products and has a nice interface to the data.

Climate Data Online (CDO) provides free access to NCDC’s archive of historical weather and climate data in addition to station history information. These data include quality controlled daily, monthly, seasonal, and yearly measurements of temperature, precipitation, wind, and degree days as well as radar data and 30-year Climate Normals. Customers can also order most of these data as certified hard copies for legal use.

I note in passing the lack of a statement that the original source data ‘unadorned’ (i.e. raw) is also included. It looks as though the QA status is a flag on the daily items and that the reading is still there; but I’ve not proven that via actually looking at the data archive. (It’s a bit large ;-) though you can download individual years if desired and they are smaller). IFF that is true, this is a very good starting point. It would show the actual reading and the QA assessment, and you can decide. Then since it is Daily Min / Max you can calculate your own trends through either, or through various daily, weekly, monthly, ‘whatever’, averages. It is what I think needs more exploration sooner. I’ve downloaded a copy of the daily data (many GB and many hours) but not yet unpacked it as I’d already filled up my disk with ‘other stuff’… So it will be a while before I can give it a look and assure it’s what I think it is (and described just above – yes ‘trust but verify’ applies to my own work too, and especially to my speculations).
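To make the “calculate your own averages” idea concrete, here is a minimal sketch in Python. It assumes each daily record has already been reduced to a (date, element, value, quality flag) tuple, with temperatures in tenths of a degree C. The tuple layout, the function name, and the sample readings are my own inventions for illustration, not anything NCDC ships.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(records, skip_flagged=True):
    """Average daily values per (YYYYMM, element).

    records: iterable of (date 'YYYYMMDD', element, value in tenths of a
    degree C, quality flag) tuples. This layout is an assumption made
    for the sketch.
    """
    buckets = defaultdict(list)
    for date, element, value, qflag in records:
        # A non-empty quality flag means QA objected to the reading.
        # The reading itself is still present, so the choice is ours.
        if skip_flagged and qflag:
            continue
        buckets[(date[:6], element)].append(value / 10.0)
    return {key: round(mean(vals), 2) for key, vals in sorted(buckets.items())}


# Made-up sample readings, including one flagged value.
sample = [
    ("20140701", "TMAX", 311, ""),
    ("20140701", "TMIN", 172, ""),
    ("20140702", "TMAX", 305, ""),
    ("20140702", "TMIN", 166, ""),
    ("20140703", "TMAX", 999, "X"),
]

print(monthly_means(sample))
print(monthly_means(sample, skip_flagged=False))
```

Running it both ways shows how much a single flagged reading can move a monthly figure, which is the whole point of having the reading and the flag side by side.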

One clicks on ‘datasets’ to get to the data, and that takes you here:

http://www.ncdc.noaa.gov/cdo-web/datasets

Climate Data Online

Annual Summaries
Daily Summaries
Monthly Summaries
Nexrad Level II
Nexrad Level III
Normals Annual/Seasonal
Normals Daily
Normals Hourly
Normals Monthly
Precipitation 15 Minute
Precipitation Hourly

Legacy Applications

COOP Daily / Summary of Day
Climate Indices
Extremes Monthly
Global Climate Station Summaries
Global Hourly Data
Global Marine Data
Global Summary of the Day
National Solar Radiation Database
Quality Controlled Local Climatological Data
Regional Snowfall Index
Snow Monitoring Daily
Snow Monitoring Monthly

I found the “Daily Summaries” most useful:

ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/

File:COOPDaily_announcement_042011.doc 	34 KB 	4/20/2011 	12:00:00 AM
File:COOPDaily_announcement_042011.pdf 	123 KB 	4/20/2011 	12:00:00 AM
File:COOPDaily_announcement_042011.rtf 	67 KB 	4/20/2011 	12:00:00 AM
all 		7/20/2014 	7:28:00 AM
by_year 		7/19/2014 	7:05:00 PM
figures 		2/6/2013 	12:00:00 AM
File:ghcnd-countries.txt 	3 KB 	9/20/2013 	12:00:00 AM
File:ghcnd-inventory.txt 	23798 KB 	7/15/2014 	8:40:00 AM
File:ghcnd-states.txt 	2 KB 	5/16/2011 	12:00:00 AM
File:ghcnd-stations.txt 	7709 KB 	7/15/2014 	8:40:00 AM
File:ghcnd-version.txt 	1 KB 	7/20/2014 	8:27:00 AM
File:ghcnd_all.tar.gz 	2521828 KB 	7/20/2014 	8:27:00 AM
File:ghcnd_gsn.tar.gz 	100441 KB 	7/20/2014 	8:27:00 AM
File:ghcnd_hcn.tar.gz 	278461 KB 	7/20/2014 	8:27:00 AM
grid 		7/19/2014 	8:34:00 PM
gsn 		7/20/2014 	5:34:00 AM
hcn 		7/20/2014 	5:35:00 AM
papers 		10/2/2012 	12:00:00 AM
File:readme.txt 	23 KB 	3/18/2014 	5:02:00 PM
File:status.txt 	28 KB 	1/10/2014 	12:00:00 AM

The ghcnd_all.tar.gz is a ‘tar’ tape-archive file in gzip format. It helps to know / use Linux or Unix to unpack and untar it. As you can see, it is large at 2.5 GB. It is larger once uncompressed. (That is why I’ve not unpacked and inspected it yet…) Under the by_year directory you can get any individual year as a smaller and easier to swallow chunk. Just looking at a listing of it lets you see just how little data there is in the early years compared to recent years. Long term trends are really strongly biased by the very few early readings. We can’t really get quality long term trends out of the recent 50 years of data, since there are 60 ish year cycles in weather (climate-change).
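If disk space is the problem, the archive can be inspected without unpacking it. A minimal sketch using the Python standard library tarfile module follows. The function name is mine, and it only reads the archive as a stream and adds up the member sizes, so nothing gets written to disk.

```python
import tarfile


def summarize_tar(path, limit=5):
    """Return (file count, total uncompressed bytes, first few names).

    Reads the gzip'd tar as a forward-only stream ("r|gz"), so the
    archive is never extracted.
    """
    count, total, names = 0, 0, []
    with tarfile.open(path, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directory entries and the like
            count += 1
            total += member.size
            if len(names) < limit:
                names.append(member.name)
    return count, total, names


# Example use (assumes the file has already been downloaded):
#   count, total, names = summarize_tar("ghcnd_all.tar.gz")
#   print(count, "files,", total / 1e9, "GB uncompressed")
```

It still has to read through the whole compressed file, so expect it to take a while on 2.5 GB, but it tells you how much disk you need before you commit to the full untar.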

File:1763.csv.gz 	4 KB 	7/19/2014 	7:02:00 PM
File:1764.csv.gz 	4 KB 	7/19/2014 	7:02:00 PM
File:1765.csv.gz 	4 KB 	7/19/2014 	7:04:00 PM
File:1766.csv.gz 	4 KB 	7/19/2014 	7:03:00 PM
File:1767.csv.gz 	4 KB 	7/19/2014 	7:00:00 PM
File:1768.csv.gz 	4 KB 	7/19/2014 	7:02:00 PM
File:1769.csv.gz 	4 KB 	7/19/2014 	7:03:00 PM
File:1770.csv.gz 	4 KB 	7/19/2014 	7:02:00 PM
File:1771.csv.gz 	4 KB 	7/19/2014 	7:00:00 PM
...
File:1910.csv.gz 	40216 KB 	7/19/2014 	7:05:00 PM
File:1911.csv.gz 	41844 KB 	7/19/2014 	7:02:00 PM
File:1912.csv.gz 	43604 KB 	7/19/2014 	7:02:00 PM
File:1913.csv.gz 	44876 KB 	7/19/2014 	7:02:00 PM
File:1914.csv.gz 	46453 KB 	7/19/2014 	7:00:00 PM
File:1915.csv.gz 	47889 KB 	7/19/2014 	7:00:00 PM
File:1916.csv.gz 	49533 KB 	7/19/2014 	7:05:00 PM
File:1917.csv.gz 	49795 KB 	7/19/2014 	7:01:00 PM
...
File:1942.csv.gz 	77718 KB 	7/19/2014 	7:01:00 PM
File:1943.csv.gz 	78514 KB 	7/19/2014 	7:03:00 PM
File:1944.csv.gz 	80328 KB 	7/19/2014 	7:03:00 PM
File:1945.csv.gz 	82367 KB 	7/19/2014 	7:03:00 PM
File:1946.csv.gz 	82934 KB 	7/19/2014 	7:02:00 PM
File:1947.csv.gz 	84043 KB 	7/19/2014 	7:05:00 PM
File:1948.csv.gz 	100697 KB 	7/19/2014 	7:03:00 PM
File:1949.csv.gz 	114757 KB 	7/19/2014 	7:01:00 PM
File:1950.csv.gz 	117901 KB 	7/19/2014 	7:01:00 PM
...
File:2008.csv.gz 	179740 KB 	7/19/2014 	7:03:00 PM
File:2009.csv.gz 	184554 KB 	7/19/2014 	7:03:00 PM
File:2010.csv.gz 	186549 KB 	7/19/2014 	7:02:00 PM
File:2011.csv.gz 	173750 KB 	7/19/2014 	7:04:00 PM
File:2012.csv.gz 	169154 KB 	7/19/2014 	7:00:00 PM
File:2013.csv.gz 	166776 KB 	7/19/2014 	7:03:00 PM
File:2014.csv.gz 	83161 KB 	7/19/2014 	7:04:00 PM

The degree of ‘instrument change’ over time is just incredible. Trying to do global calorimetry with that is really rather silly. But that is what ‘climate scientists’ do…

The ReadMe file has the interesting notes:

GHCN-D is a dataset that contains daily observations over global land areas. 
Like its monthly counterpart, GHCN-Daily is a composite of climate records from 
numerous sources that were merged together and subjected to a common suite of quality 
assurance reviews.

It is unclear without more digging just what ‘merging’ and ‘quality assurance’ has done to the readings, but a first read of the text implies flagging, not replacement or deletion. A “Dig Here!” is to dig into the referenced links and papers to assure / deny that assessment.

This by_year directory contains an alternate form of the GHCN Daily dataset.  In this
directory, the period of record station files are parsed into  
yearly files that contain all available GHCN Daily station data for that year 
plus a time of observation field (where available--primarily for U.S. Cooperative 
Observers).  The obsertation times for U.S. Cooperative Observer data 
come from the station histories archived in NCDC's Multinetwork Metadata System (MMS).  
The by_year files are updated daily to be in sync with updates to the GHCN Daily dataset. 

Just why 1770 needs daily updating is unclear… but the above listing quote does show daily update changes…
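One way to check the “flagged, not replaced” reading is to tally the quality flags in a single by_year file. The sketch below assumes each row is comma separated in the order station, date, element, value, measurement flag, quality flag, source flag, observation time. That column order is my reading of the by_year documentation and ought to be verified against the format file before trusting any result.

```python
import csv
import gzip
from collections import Counter

# Assumed column order for by_year rows; verify against the format notes.
FIELDS = ("station", "date", "element", "value",
          "mflag", "qflag", "sflag", "obs_time")


def count_flags(path, element="TMAX"):
    """Tally quality flags for one element in a by_year .csv.gz file."""
    tally = Counter()
    # gzip.open in text mode streams the file; nothing is unpacked to disk.
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.reader(handle):
            record = dict(zip(FIELDS, row))
            if record.get("element") != element:
                continue
            # An empty quality flag means the reading passed QA.
            tally[record.get("qflag") or "unflagged"] += 1
    return tally


# Example use (assumes 1770.csv.gz has been downloaded):
#   print(count_flags("1770.csv.gz"))
```

If flagged rows still carry a value, then the reading survived QA and only the flag was added. If the flagged rows were missing entirely, that would argue the other way.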

There are also pointers to other links and other data archives that need some kind of cross check to figure out if they are different or the same, change over time or not, and just what IS the real historical data…

Further documentation details are provided in the text file ghcn-daily_format.rtf in this 
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/ directory.

Users may find data files located on our ftp server at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/all/. 
NOTE: 
There is no observation time contained in period of record station files. 

GHCN Daily data are currently available to ALL users at no charge. 
All users will continue to have access to directories for ftp/ghcn/ and ftp3/3200 & 3210/ data at no charge.

For detailed information on this dataset visit the GHCN Daily web page at http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/
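The period of record station files in the all/ directory are fixed width rather than CSV. Here is a minimal sketch of a parser, assuming the layout given in the GHCN-Daily readme as I understand it: an 11 character station ID, 4 digit year, 2 digit month, 4 character element, then 31 repeats of a 5 character value plus three 1 character flags, with -9999 marking a missing day. Check the readme before relying on the column positions.

```python
def parse_dly_line(line):
    """Split one fixed-width station record into its parts.

    Returns (station, year, month, element, days) where days is a list
    of (day, value, mflag, qflag, sflag) tuples for the days present.
    Column positions are an assumption taken from the readme.
    """
    station = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    days = []
    for day in range(31):
        start = 21 + day * 8
        chunk = line[start:start + 8]
        if len(chunk) < 8:
            break  # short line; stop rather than guess
        value = int(chunk[0:5])
        if value == -9999:
            continue  # missing-value marker
        days.append((day + 1, value,
                     chunk[5].strip(), chunk[6].strip(), chunk[7].strip()))
    return station, year, month, element, days


# A made-up record: one reading on day 1, the other 30 days missing.
sample_line = ("XX" + "0" * 8 + "1") + "2014" + "07" + "TMAX" \
    + "  311  H" + "-9999   " * 30
print(parse_dly_line(sample_line))
```

Note that the station ID and readings in the sample are invented. As the readme says, there is no observation time in these files, so the parser does not look for one.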

You would think there would be ONE historical set of original RAW data, but no… not found it yet… there looks to be many copies, all with slightly different processing, mixes, histories, updates, “quality assurance”, etc…

Down in the footer of the top page is a comment that they have a ‘legacy’ site with data sets not yet migrated to the new site. Probably worth some archival time…

Data Here: http://www7.ncdc.noaa.gov/CDO/cdo

There is a ‘data set’ option in a drop down on the right. Has some satellite as well as ground data. Could likely spend a month on that just sorting it out, and archiving the bits that are valuable. I’ve not explored it yet, but the implication is that this ‘older’ site / version is going to go away once brought up to date. No idea if that means “copy over intact” to the new site; or “convert and expunge the original past”.

They also have a USHCN set of data, but they describe it as a subset of the GHCN. More details here:

http://www.ncdc.noaa.gov/oa/climate/research/ushcn/

It comes in versions too… they have Version 2.5 up now.

ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/

Has the various versions listed (v1, v2, v2.5) as directories. Also has a ‘daily’ directory.

File:ushcn_01.tar 	130275 KB 	11/7/2002 	12:00:00 AM
File:ushcn_98.tar 	122824 KB 	3/23/2000 	12:00:00 AM

At a few hundred MB, not that large to snag a copy. Though one wonders what happened since 2002 as the newest date stamp.

V1 has only a ‘metadata’ directory and v2 has only ‘monthly’. No idea where to get the old v1 and v2 data now.

Hadley CRU Climate Research Unit

The English archival site. Infamously having said that they “lost the original data”, but that post processing “improved” versions were available, and besides, it was mostly like GHCN that was at NCDC. (And would that be GHCN version 1, 2, or 3? Hmmm??? As the data Langoliers are busy rewriting it…)

At any rate, they may have something of value; but I’d likely use the NCDC daily data instead. (Oh, NCDC also has a long list of ‘data food products’ that they have post processed in various ways. As, IMHO, those are NOT data but are processing products, I’ve not listed them here. From the same top link you can get to their processed, adjusted, homogenized, etc. stuff; I’ve left that out of this posting.) For Hadley, since they lost the actual data, all you get is their post-processing data-food-products.

Top Link: http://hadobs.metoffice.com/

It claims that data are available.

Data Here:

CET: http://hadobs.metoffice.com/hadcet/

Interesting graph of it with a Very Nice plunge at the end in the last decade or so…

Hadley graph of Central England Temperature

Gridded Monthly combined land / sea data-food-product: http://hadobs.metoffice.com/hadcrut4/

Has a data download link on the page, but mostly just looks like you can get the post-processed data-food-product as numbers instead of as scary pictures. Don’t see the point, really.

They may have something else interesting there, but I’ve not put the time in to find it. There is a link to “CRUTEM4” that claims to be actual data (for some degree of ‘data’):

http://hadobs.metoffice.com/crutem4/data/CRUTEM.4.2.0.0_release_notes.html

This page describes updates in CRUTEM4 version CRUTEM.4.2.0.0. Previous versions of CRUTEM4 can be found here. Data for CRUTEM.4.2.0.0 can be found here.
Additions to the CRUTEM4 archive in version CRUTEM.4.2.0.0

The changes listed below refer mainly to additions of mostly national collections of digitized and/or homogenized monthly station series. Several national meteorological agencies now produce/maintain significant subsets of climate series that are homogenized for the purposes of climate studies. In addition, data-rescue types of activities continue and this frequently involves the digitization of paper records which then become publicly available.

The principal subsets of station series processed and merged with CRUTEM (chronological order) are:

Norwegian – homogenized series
Australian (ACORN) – homogenized subset
Brazilian – non-homogenized
Australian remote islands – homogenized
Antarctic (greater) – some QC and infilling
St. Helena – some homogenization adjustment
Bolivian subset – non-homogenized
Southeast Asian Climate Assessment (SACA) – infilling /some new additions
German/Polish – a number of German and a few Polish series – non-homogenized
Ugandan – non-homogenized
USA (USHCNv2.5) – homogenized
Canada – homogenized

In addition, there have been some corrections of errors. These are mostly of a random nature and the corrections have generally been done by manual edits. For a listing of new source codes in use, see below (end).

Largely homogenized and fermented to make data-food-product… something vaguely cheesy, but not real. (In the USA various artificial cheese like products must be labeled ‘cheese food product’ so you will not confuse them with real cheese. I’ve adopted the phrase ‘data food product’ to similarly identify things that are vaguely data like, but not really source data…)

At any rate, it is unclear to me just why I want such a data food product from Hadley.

They claim older versions are here: http://hadobs.metoffice.com/crutem4/data/versions.html

They are all version 4. It is unclear to me what has happened to versions 1, 2, and 3 and where they might be found, though a quick search turned up this link for 3: http://www.metoffice.gov.uk/hadobs/crutem3/data/download.html

Similar searches on Crutem2 give:

http://climateaudit.org/2005/08/17/downloading-cru-data/

So Steve McIntyre has likely got a set saved somewhere. The article has links in it that currently give 404 not found errors, so the official Version 2 data sets look to have hit the bit bucket. (Maybe I can talk Steve into sending me a set of V2 to archive… or a link to an archive. It would be amusing to do 2 vs 3 vs 4 compares someday…)

Interesting to note that CRU have the v2 links still up. One hopes it is the actual data, but it is what it is. (I’ve downloaded the data, whatever it might be)

http://www.cru.uea.ac.uk/cru/data/tem2/#datdow

Data for Downloading
ERRATUM: before 21st May 2003, the NetCDF versions 
erroneously used a time dimension with units "months 
since (startyear)-1-1" that started from 1. It should
 (and now does) start from 0.

Dataset    gzipped ASCII              zipped ASCII            NetCDF                 gzipped NetCDF            Last updated
CRUTEM2    crutem2.dat.gz (3.2 MB)    crutem2.zip (3.2 MB)    crutem2.nc (19.2 MB)   crutem2.nc.gz (1.9 MB)    2006-01-18
CRUTEM2v   crutem2v.dat.gz (2.5 MB)   crutem2v.zip (2.5 MB)   crutem2v.nc (9.6 MB)   crutem2v.nc.gz (1.8 MB)   2006-01-18
HadCRUT2   hadcrut2.dat.gz (4.5 MB)   hadcrut2.zip (4.5 MB)   hadcrut2.nc (9.3 MB)   hadcrut2.nc.gz (3.5 MB)   2006-01-18
HadCRUT2v  hadcrut2v.dat.gz (4.2 MB)  hadcrut2v.zip (4.2 MB)  hadcrut2v.nc (8.4 MB)  hadcrut2v.nc.gz (3.3 MB)  2006-01-18
Absolute   absolute.dat.gz (47 KB)    absolute.zip (47 KB)    absolute.nc (63 KB)    absolute.nc.gz (40 KB)    1999-07-13
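The erratum above matters if you use the NetCDF copies: the time axis counts months from January of the start year, beginning at 0. A minimal sketch of turning such an index back into a calendar month follows. The start year is deliberately left as a parameter, since the real value has to be read from the units attribute of the file, and the 1900 used in the usage lines is just a placeholder.

```python
def month_index_to_date(index, start_year):
    """Convert a zero-based 'months since start_year-1-1' index
    to a (year, month) pair.

    start_year must come from the file's own time units; it is not
    hard-coded here because I have not checked it.
    """
    if index < 0:
        raise ValueError("month index must be zero or positive")
    years, months = divmod(index, 12)
    return start_year + years, months + 1


# Placeholder start year of 1900, for illustration only.
print(month_index_to_date(0, 1900))
print(month_index_to_date(13, 1900))
```

A copy downloaded before 21st May 2003 would have used an index starting from 1, so any old archived NetCDF file needs its download date checked before the months are trusted.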

The CRU crew might have other stuff of interest, but frankly, I'm not very comfortable that it is accurate:

http://www.cru.uea.ac.uk/cru/data/temperature/

The one link for surface data I did follow did a circular run to Hadley / MetOffice / and on…

Someone else can explore all that further, if needed.

NASA GISS GISTemp

IMHO, not really a source of “data”. GISS (Goddard Institute for Space Studies) just takes in the NCDC data, munges it around a little, and calls it data. It isn’t. It takes in an already ‘adjusted’ data-food-product and further manipulates it according to a fixed algorithmic process that is a bit dodgy. They fill in missing bits from other stations up to 1200 km away (doing this three times in successive sections) so any given ‘data item’ might be a complete fabrication partially based on data up to 3600 km away. They also do a somewhat backward Urban Heat Island “correction” that doesn’t correct for urban heat. In the end, they are the data outlier in most results; but for some reason many folks like to look at their stuff.
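To get a feel for how far 1200 km reaches, here is a minimal sketch of a great-circle distance check using the haversine formula. To be clear, this is NOT the GISTemp code, just an illustration of the radius. The station names and coordinates are invented, and the 6371 km mean Earth radius is the usual round figure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # commonly used mean radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stations_within(target, stations, radius_km=1200.0):
    """Names of stations within radius_km of target (lat, lon)."""
    lat0, lon0 = target
    return [name for name, (lat, lon) in stations.items()
            if great_circle_km(lat0, lon0, lat, lon) <= radius_km]


# Invented stations on the equator, 10 and 11 degrees of longitude away.
stations = {"near": (0.0, 10.0), "far": (0.0, 11.0)}
print(stations_within((0.0, 0.0), stations))
```

On the equator a degree of longitude is a bit over 111 km, so a 1200 km radius covers nearly 11 degrees in each direction, and three such hops in a row is where the 3600 km figure comes from.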

Data Input From: GHCN, USHCN, and Antarctica. (Links to be added a bit later after I unpack the latest GIStemp code to see if it has changed). It merges and homogenizes this batch and then makes up missing data and prints the results. Not really useful for anything as far as I can see.

Top link: http://data.giss.nasa.gov/gistemp/

Includes links to the data-food-products that it produces.

Data Output Here:

http://data.giss.nasa.gov/gistemp/station_data/

Has a link to GHCN version 2 data on that page, and lets you see individual station data after GIStemp is done with it.

CDIAC

The Carbon Dioxide Information Analysis Center. Guess where their bias lies…

Little referenced, they have their own set of data archives. I’ll be wandering through them to see what I can find. Sometimes it has interesting stuff.

Data Here:

USHCN intro page: http://cdiac.ornl.gov/epubs/ndp/ushcn/daily_doc.html

Home page: http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html

I squirreled away a copy of the daily-by-state data some time ago, but who knows where it is now.

USHCN data:

http://cdiac.ornl.gov/ftp/ushcn_daily/ (by State)

http://cdiac.ornl.gov/ftp/ushcn_v2_monthly/ (in one wad for all).

There are likely other sources for USHCN daily, but I’ve not spent the time to track them down. If you know of any, put a link in comments and I’ll add it.

B.E.S.T. Berkeley

Berkeley Earth Surface Temperature.

Claims to be the best, but isn’t. Not particularly the worst either. One of the developers claims to have used the method that skeptics wanted; but it isn’t the method I’ve seen asked for much. It has a slice and dice data splicer at the core of it. Since data splicing is considered a sin in many technical disciplines, how doing more of it, more finely, is a feature; well, that’s beyond me. They also cooked up their own way to store data with their own date format (reasonable since they needed to bring divergent data together) but in a way that is sort of painful to use and a bit of work just to understand. (For example, a date, instead of being 30 July 2012 or 300712 or any other form, is a floating point number: X.YYY, where the granularity of the part after the decimal will resolve it to a particular day. I’ll get an accurate description and put it here.) Just realize that you can’t take the B.E.S.T. copy of GHCN data and do a straight difference against the other GHCN copy as it has been, um, ‘converted’ in format.
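Until I have the accurate description, here is a minimal sketch of what the conversion might look like IF the floating point date is a plain fractional year (the year, plus the fraction of that year elapsed). That is my assumption, not the documented B.E.S.T. convention, so check their format notes before using it on real files. Both function names are made up.

```python
from datetime import date, timedelta


def decimal_year_to_date(value):
    """Turn a fractional year like 2012.578 into a calendar date.

    Assumes the fraction is the share of that year already elapsed.
    """
    year = int(value)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=int((value - year) * days_in_year))


def date_to_decimal_year(day):
    """Inverse of the above, using the middle of the day so that a
    round trip lands back on the same date."""
    start = date(day.year, 1, 1)
    days_in_year = (date(day.year + 1, 1, 1) - start).days
    return day.year + ((day - start).days + 0.5) / days_in_year


stamp = date_to_decimal_year(date(2012, 7, 30))
print(stamp, decimal_year_to_date(stamp))
```

The leap year handling matters: the same fraction lands on a different day in a 366 day year than in a 365 day one, which is one more reason a straight difference against another GHCN copy needs the dates converted first.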

So B.E.S.T. takes in much of the same data as the others, chops, dices, and splices it up a lot. Does more homogenizing and infilling things, then claims it is “data”. Yet another data-food-product, IMHO.

They do have an online archive of their sources (that are largely the same as the above: GHCN / USHCN /… )

http://berkeleyearth.org/about-data-set?/dataset/

The Berkeley Earth Surface Temperature Study has created a preliminary merged data set by combining 1.6 billion temperature reports from 16 preexisting data archives. Whenever possible, we have used raw data rather than previously homogenized or edited data. After eliminating duplicate records, the current archive contains over 39,000 unique stations. This is roughly five times the 7,280 stations found in the Global Historical Climatology Network Monthly data set (GHCN-M) that has served as the focus of many climate studies. The GHCN-M is limited by strict requirements for record length, completeness, and the need for nearly complete reference intervals used to define baselines. We have developed new algorithms that reduce the need to impose these requirements (see methodology), and as such we have intentionally created a more expansive data set.

We performed a series of tests to identify dubious data and merge identical data coming from multiple archives. In general, our process was to flag dubious data rather than simply eliminating it. Flagged values were generally excluded from further analysis, but their content is preserved for future consideration.

So far so good. Start from raw (though some is stated as slightly cooked…) and then combine and clean. It’s then the splice and dice homogenize that gets them, IMHO. “Methodology”.

Data Here: http://berkeleyearth.org/data

Breakpoint Adjusted Monthly Station data

During the Berkeley Earth averaging process we compare each station to other stations in its local neighborhood, which allows us to identify discontinuities and other heterogeneities in the time series from individual weather stations. The averaging process is then designed to automatically compensate for various biases that appear to be present. After the average field is constructed, it is possible to create a set of estimated bias corrections that suggest what the weather station might have reported had apparent biasing events not occurred. This breakpoint-adjusted data set provides a collection of adjusted, homogeneous station data that is recommended for users who want to avoid heterogeneities in station temperature data.

What part of “thou shalt not splice data and have no error” is unclear to them? Sigh.

So it’s just a much more homogenized and much more sliced, diced, and spliced data-food-product.

But the good thing is that they put their source data on line, so you can back up ahead of their processing and start over:

http://berkeleyearth.org/source-files

There are many individual links there that I’ve not fully explored. Some of them are already above. Some are not (like the Colonial Era data and the Coop stations). I’d like to archive the lot of them, but time does not allow that at the moment. Perhaps folks could split the job up and each grab a chunk? Assemble again later?

Wood For Trees

Need to put in a good description of these folks. They have nice graphing facilities, and have the data sets behind it. Not yet found out if you can download the whole set of data direct from them.

Top Link: http://woodfortrees.org/

Lists their data sources on the side bar. Does include the UAH satellite data. (At some point I’ll add links to the satellite data, but since they don’t seem to have ‘revisions’ to their history quite so much I’ve not seen it as urgent).

Interactive graph here: http://woodfortrees.org/plot/

Wolfram Alpha

More a calculation site than an archive, yet they clearly have some kind of temperature data archive to be able to compute graphs for folks. Such as this example:

http://www.wolframalpha.com/input/?i=average+temperature+history+new+york

Looking at trends through it a couple of years back, it was clear they did no ‘clean up’ for things like large gaps and odd outlier data; so likely it is (or was) raw not QA checked data. (The data does need some kind of cleaning to be usable, but IMHO Hadley and NCDC go way too far).

Misc and Smaller Sites

There are a lot of ‘bits’ all over. I’ll be expanding this section “for a while”. From individual nations, to specific archives at schools and others. I don’t know much about them. It would be interesting to audit a few of these and compare them with the data-food-products above. If the little guys say “we recorded this data” and the above say “this is it!” and they are different, well… Some examples:

http://academic.udayton.edu/kissock/http/Weather/default.htm

States:

This site contains files of daily average temperatures for 157 U.S. and 167 international cities. The files are updated on a regular basis and contain data from January 1, 1995 to present.

Source data for this site are from the National Climatic Data Center. The data is available for research and non-commercial purposes only.

So it might be possible for some such sites to find older copies held by some other ‘packrat’.

Environment Canada

Has a nice top page that says you can select various kinds of data:

http://climate.weather.gc.ca/index_e.html

If folks post enough links to various national sites, I’ll make a “by nation” section to collect them.

For now, I’m slowly working down the list of sites found by web searches like this one:

https://duckduckgo.com/?q=Temperature+Data+archives

which also found:

http://weather-warehouse.com/

DOWNLOAD a .csv spreadsheet file compatible with programs such as Excel, Access or the Free Open Office Calc, or view in your browser
Exclusive Station Finder Tool helps you find the best data for your needs
Complete National Weather Service archive – over 10,000 stations (far more than most resources on the web)
Unmatched data accuracy
Archived stations some daily data as far back as 1902, most hourly data back to 1972
Meteorologists on hand to assist
Instant access – Get the information you need immediately via your web browser with backup links sent via email

It looks to want to charge you for larger amounts, but might be free for individual station ‘samples’. They claim to have a lot of sources:

Weather Source meteorologists have created perhaps the world’s most comprehensive weather database by unifying multiple governmental and other weather databases together and applying advanced data quality control and correction methods. The resulting “super database” of weather information contains over 4 Billion rows of high quality weather observations. The Weather Warehouse provides users with direct and immediate access to this database. On the Weather Warehouse users have access to the following weather information:

I’ve not dug into it to figure out “what ultimate data source and what processing”, but it looks like all the Excel users out there can get a nice comma separated value CSV spreadsheet for easier processing for some stations.

In Conclusion

OK, that’s it for the moment. I think the GHCN daily is likely the most unmolested of the lot. Coop data and CET are likely pretty useable too. USHCN daily is also likely clean of distortions. Any of the monthly average ‘data’ is not really data. It has been QA checked, filtered, selected, processed; potentially homogenized, adjusted and more. I’d rather start with daily data and work up from there.

If you know of any good archives of old musty data, please add a link!

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in AGW and GIStemp Issues, CRUt, NCDC - GHCN Issues.

13 Responses to Data Sources – A List

  1. E.M.Smith says:

    Probably need to add some SST data sources as well. Like this one:

    http://icoads.noaa.gov/products.html

  2. Steve C says:

    Re CET, Philip Eden at http://www.climate-uk.com/ has been keeping an eye on it. From his background page: “Since Professor Manley’s death, the Meteorological Office seems to have become the self-appointed guardian of the CET series, although one wonders whether it is a guardianship of which Manley would have approved. Their continuation of the series from 1974 onwards uses observations from a variety of stations in the English Midlands (including the southeast Midlands); neither Oxford nor stations on the Lancashire Plain have been utilised, and for 30 years one coastal site was included. It is therefore manifestly not the same series, and large inhomogeneities are apparent.”

    The good news is that he’s working on producing a continuation of the original series, without the assorted modifications. (The bad news, of course, is that sometime in the future, some poor sods are going to have to sort through God only knows how many “corrections”, “homogenisations”, etc., for every climate series out there, if we are ever to see any usable data again.)

  3. tom0mason says:

    Again it’s a big thank-you for all the links and opinions on their worth, sure will save time when I finally get-round-to-it.

  4. tom0mason says:

    I have been reading tchannon’s blog about the construction of the Met Office CET sliced up records that run 1660 through to about 2012. Called ‘Part 1, Central England Temperature timeseries and Manley papers’, he promises me that some interesting revelations should be in Part 2 (when he gets-round-to-it). It’s at –

    http://daedalearth.wordpress.com/2014/05/11/part-1-central-england-temperature-timeseries-and-manley-papers/

    PS the link to the historical records of ‘Royal Meteorological Society’ has some fascinating information.

  5. Larry Ledwick says:

    Perhaps your data collection of old data versions should be made available to the internet archive for them to store and preserve?

    https://en.wikipedia.org/wiki/Internet_Archive

    I believe you can also request the wayback machine to crawl your forum and if you create a local archive on the forum it would be picked up as they crawl the site. You would not have to leave the data there forever, only long enough for the web crawler to archive it.
    Once it had been captured in a crawl, then you could remove your local file and just add an entry to a reference index for when the data set was present on your archive so it could be found using the wayback search. Then you could pull down a copy of any data at any time and any place you or some visitor happened to be.
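A minimal sketch of the "reference index" lookup described above, assuming the Internet Archive's public Wayback availability endpoint and its usual snapshot URL layout. The host and file name are hypothetical placeholders, and the code only builds URLs; it does not contact the archive.

```python
from urllib.parse import urlencode


def availability_query(url, timestamp=None):
    """Build a Wayback availability query for a page or file.

    timestamp is an optional YYYYMMDD string; the archive answers with
    the capture closest to that date, if it has one.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def snapshot_url(url, timestamp):
    """Build the address of an archived capture for a given date."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Hypothetical entry in the reference index: what was posted, and when.
entry = {"url": "example.com/data/ghcn_v1.tar.gz", "seen": "20140801"}

query = availability_query(entry["url"], entry["seen"])
capture = snapshot_url(entry["url"], entry["seen"])
print(query)
print(capture)
```

Keeping only the original address and the date it was online is enough to find the capture again later, which is the point of the index.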

  6. E.M.Smith says:

    @Larry Ledwick:

    An interesting idea. I’m more inclined to just find a place to post up a tar file or csv.gz file. It’s more a matter of time on my part. Time to find a place, set up an account, package up the data, etc.

    In general, yes, my intent is to batch them up at some point. Maybe the first step ought to just be gathering them together in one place, documenting what / when each set covers (since it isn’t always an absolutely full set…) and then put them on a DVD. Anyone who wants it sends in a few bucks and I mail it. My original intent had been to put it on a bittorrent feed, but so far all the services I use here block bittorrent. Too bad, since it is a great way to bypass the server model of the world using peer-to-peer.

    In the longer run, I’ve pondered doing a ‘splice’ of the sets. So, for example, v1 until it runs out, then v2 until it runs out, then v3. That way the oldest form is preserved (and the cooling of the past thwarted).
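A minimal sketch of that splice, assuming each version has already been parsed into a dictionary keyed by (year, month) for a single station. The values below are made up for illustration and are not real GHCN data; the function name and layout are hypothetical.

```python
def splice_versions(versions):
    """Splice versions oldest first.

    versions is a list of (label, series) pairs, where series maps
    (year, month) to a temperature, or to None when the value is missing.
    Each version is used until it runs out; a later version only
    contributes months after the last month already covered.
    """
    spliced = {}
    cutoff = None  # last (year, month) covered by the older versions
    for label, series in versions:
        present = {k: v for k, v in series.items() if v is not None}
        for key in sorted(present):
            if cutoff is None or key > cutoff:
                spliced[key] = (present[key], label)
        if spliced:
            cutoff = max(spliced)
    return spliced


# Made-up values: each later version "cools" the overlapping months.
v1 = {(1990, 1): 1.0, (1990, 2): 2.0}
v2 = {(1990, 1): 0.5, (1990, 2): 1.5, (1990, 3): 3.0}
v3 = {(1990, 2): 1.4, (1990, 3): 2.9, (1990, 4): 4.0}

result = splice_versions([("v1", v1), ("v2", v2), ("v3", v3)])
for key in sorted(result):
    print(key, result[key])
```

Each spliced value keeps the label of the version it came from, so the provenance of every month stays visible. Note that a gap inside an older version is left as a gap here rather than filled from a newer one; that is a design choice, not the only way to do it.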

    The most important part is just snagging a copy, and that’s more or less done.

    @Tomomason:

    You are welcome.

    @Tomomason & Steve C:

    CET is a very widely watched data set with lots of old copies lying around. It ought to be a bit difficult to fool around with it very much. Won’t stop anyone from trying, though…

    Also ought to be interesting to watch what folks find ;-)

  7. omanuel says:

    E.M. Smith,

    As I recall, Climategate emails came from a Russian server. A bit of trivia that may help explain the puzzle.

    The USSR was the only surviving tyrannical power after WWII:

    1. Hitler’s Nazi Power was gone
    2. Japan’s Imperial Ruler subdued
    3. Stalin’s Communist Government

    Tyrannical, lock-step opinions had paraded as consensus science in the old USSR under Stalin for decades.

    A member of the community of Russian “skeptics” probably gave us “unskeptical” Americans our first hint of the real winner of WWII.

  8. hillbilly33 says:

    Hi Chiefio. Long time no post, but this old Scarlet Runner bean-growing Tassie boy (Tasmania, Australia) is still alive and kicking at 81! Glad to see you’re still keeping tabs on the data-cookers.

    O/T but thought you’d be interested in this story of the state of New South Wales, under a Liberal (conservative) government, wanting to be Australia’s renewable energy answer to your own beloved state of California. Can you believe it? Others here can’t, as is shown by the article linked below.

    We thought that with the change of national government we’d be done with most of this CO2 driven CAGW nonsense, but although some progress has been made in getting rid of the worst of the unaccountable waste of money and resources, it’s being fought all the way by Labor and the Greens. It hasn’t been helped by this NSW decision.

    http://www.theaustralian.com.au/opinion/california-dreaming-is-nuts-in-nsw/story-e6frg6zo-1227006323188

    Trust you and the family are well.

    Cheers mate.

    BTW. It’s freezing in Hobart today with a big wind chill factor coming into play from Mt.Wellington!

  9. hillbilly33 says:

    Me again Chiefio. Just clicked on one of the late great Tassie John Daly’s favourite Stations, the Valentia Observatory, Ireland.

    Ho Hum! Nothing to worry about there and certainly no cause at all for the CAGW Cult to celebrate..

    http://data.giss.nasa.gov/cgi-bin/gistemp/show_station.cgi?id=621039530000&dt=1&ds=14

  10. hillbilly33 says:

    Interesting to see what the Launceston Airport graph showed on John Daly’s site up to 2002 compared to what it currently looks like. It consistently refused to show warming and was one of only four Tasmanian stations that survived the ‘Great Dying of Thermometers’ in 1992, when we lost at least 20 including all our colder ones.

    Despite some obvious adjustments since John’s time, it now appears to have been dropped in 2010 for its continuing sins against the CAGW hypothesis, and we now have no stations anywhere near the mountains or the colder West Coast. Hobart Airport, Cape Bruny not far south of there in the middle of the Derwent estuary, and Eddystone Point on our much warmer East Coast are the only three survivors!

    http://data.giss.nasa.gov/cgi-bin/gistemp/show_station.cgi?id=501949680000&dt=1&ds=14

  11. omanuel says:

    The conclusion to Climategate:

    The combined forces of nations cannot hide the Creator, Destroyer and Sustainer of every atom, life and world in the Solar System (p. 13):

    https://dl.dropboxusercontent.com/u/10640850/Preprint_Solar_Energy.pdf

  12. omanuel says:

    Comments, corrections or criticisms would be appreciated on the above paper under review.

  13. E.M.Smith says:

    @OManuel:

    The reason to use a Russian server is just that “it is there”. The Russian hackers tend to make such resources available. (My personal opinion is that this might be ‘by design’ as it gives special information to the government of Russia – or perhaps Russia is just a bit more lawless… But in any case “it just is”. Nothing more. No deep meaning. The most stable and open servers, furthest from western legal inspection and warrants, tend to be the Russian ones.)

    @Hillbilly33:

    Sigh. When your goal is to be more “Land of fruits and nuts” than California, something is way off base… ;-)

    It will take a long time for the broken ideology to die. Ideologies are like that. Long after the head is cut off, and longer after all reason is gone, there is a core of idiots and those who make a living from it that keep pushing it. Whatever “it” is.

    FWIW, it’s warm and pleasant in Florida… Come on up!

    I’d look at the unadjusted temps in GHCN (if they can still be found un-adjusted) and not the molested data-food-product that is GIStemp, IMHO…
