Mr McGuire Would NOT Approve

Lt. Colonel Insignia


UPDATE: Since comments have become a repetitive cycle of “Does so!” “Does not!” “This has been covered already above, read it,” and then folks don’t read it, I’ve turned comments off. Perhaps for a while, perhaps for a long while, perhaps forever. Frankly, I have much more valuable and interesting things to do than tell people they are repeating points that have already been repeated too many times. I don’t see that it removes anything from the issue; both sides are already well represented below.

On WhatsUpWithThat, one of the commenters asked, roughly:

Regarding the GISS output over time, is there anyone here who has any confidence in the second decimal place in either reported temperatures or reported temperature anomalies?

There can be no confidence in anything smaller than whole degrees F.

For some reason, it seems that many folks stop here and jump to the “law of large numbers” shiny thing. I’m going to add a note here so that those who feel frustrated by not getting to do that will, with luck, understand why. This is well covered in comments, but folks don’t seem to be bothering to find it there.

One common error is to presume that GIStemp takes thousands of temperature data points and averages them together. Then folks launch into a grand description of how this average can very well have greater precision than whole degrees F. One Small Problem: that is not what is done, and that is not my complaint. The “central limit theorem” is just not applicable. GIStemp takes in monthly averages of daily averages of the daily MIN and MAX. When you are averaging temperatures, you are averaging exactly 2 of them: the daily MIN and MAX for one place for one day. Everything after THAT is just averaging averages (and sometimes averaging averages of averages of averages…). So the central limit theorem says you can get much more precision about that AVERAGE, but not about the temperatures that made it up.

The next common error is to jump to “but it’s anomalies so it doesn’t matter”. It isn’t anomalies, not until the very end. The anomaly map is produced at the last stages of GIStemp. These averages of averages of averages are smeared around in all sorts of ways before the anomaly map step is done. They are used to “in fill” missing data. They are used to adjust during “homogenizing” of data. And they are used in the Urban Heat Island Effect adjustments. They clearly can spread their influence up to 1200 km in some steps, and 1000 km in others. It is even possible for a 1000 km “in-fill” to be used for a further 1000 km “UHI” adjustment and then to be used for a further 1200 km step. While I think this is unlikely, it is possible.

So long before any “large numbers” or “anomalies”, the averages of daily MIN/MAX averages have been spread all over.

The Raw Data

The historic data were measured to 1/10th degree F, then rounded to whole degrees F for reporting. Each day had three samples (min / max / Time-of-Observation, I believe). If a datum was missing, it was fine to “guess” and fill in what you thought it ought to have been on the form.

“Temperature is measured electronically. High or low temperatures for the day may be estimated when necessary. Temperatures are measured in degrees and tenths fahrenheit, and reported as whole degrees, rounding down from .4 and up from .5 “
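For the curious, that reporting rule is just ordinary “round half up” to the whole degree. A minimal sketch of it (my own illustration in Python, not NOAA code; I’m assuming readings above zero, where the quoted “down from .4 and up from .5” language is unambiguous):

```python
import math

def report_whole_f(tenths_f):
    """Round a tenths-of-a-degree F reading to the whole degree F reported.
    "Rounding down from .4 and up from .5" is round-half-up."""
    return math.floor(tenths_f + 0.5)

print(report_whole_f(68.4))  # 68: the .4 rounds down
print(report_whole_f(68.5))  # 69: the .5 rounds up
```

Which means every reported whole degree actually stands for a band of true values about 0.9 F wide, and that band is what all the later averaging inherits.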

Averaging is not Over Sampling

Now you can “oversample” a single thing and get a synthetic accuracy that exceeds your actual accuracy; that requires measuring the same thing repeatedly. We don’t measure each day/location repeatedly. We measure it once. Then NOAA averages those data points (min / max) to get a daily average. Exactly two items are averaged.

Then they take those daily averages for the month and average them to get a monthly mean of the daily means. At most 31 items are averaged.

This is what is used by GIStemp. At this point we are already 2 “averaging steps” away from “temperatures”. While we can gather interesting meta statistics about those averages of averages, such as a highly precise “mean of the mean of means”, that does not tell us things about the mean of the temperatures. The order of averaging things does matter.
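A toy example of why the order of averaging matters (made-up numbers, sketched in Python rather than FORTRAN): when the groups being averaged are different sizes, a mean of means is not the mean of the underlying values, and neither path recovers any precision lost before the averaging began.

```python
from statistics import mean

# Hypothetical daily means for a 28-day month and a 31-day month
feb = [40.0] * 28
mar = [50.0] * 31

mean_of_means = mean([mean(feb), mean(mar)])  # the averages-of-averages path
grand_mean = mean(feb + mar)                  # averaging every daily value at once

print(mean_of_means)  # 45.0
print(grand_mean)     # ~45.25 -- a different answer from the same data
```

Same data, two different answers, simply from the order in which the averaging was done.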

In my opinion, NOAA does a better job than GIStemp, but they do manage to start the ball rolling on the false precision front by creating this data with 1/100 F precision when the raw data only have 1 F accuracy. (And remembering that they average at most 2 temperature items then average at most 31 of those averages. There is no ‘large number of items’ from which to have a “central limit theorem” improvement in precision.)

I have also found a great simple example and explanation of how the math ought to be done at: http://mathforum.org/library/drmath/view/58335.html

GIStemp – Where data manipulation begins

UPDATE: As of the end of 2009, GIStemp is using USHCN.v2 for US data and that is now in 1/10 F (one presumes NOAA / NCDC realized some of the folly in the 1/100 F place.) I do not yet know if they use the 1/10 F or 1/100 F as they calculate the GHCN data. They DID use a new “adjustment” on the USHCN.v2 vs the USHCN original method that causes the 1/10 F place to jump all over in comparison, so we now have about 1/2 F of “jitter” from “adjustments” that completely swamps the “precision” of the averaging process.

GIStemp then takes those 1/100 F precision, whole degree F accuracy temperatures (or 1/10 F in the new version, with 1/2 F variations from the prior versions) and starts mathematically manipulating them (adding, dividing) to make many averages of averages, typically in fairly small groups. These become the basic input to further GISS processes. NOAA provides a table of monthly averages of daily means in F with false precision into the 1/100 F place. GISS then converts these to C. The relevant bit of code is in USHCN2v2.f:

```
   10 read(2,'(a)',end=100) line
      [...]
        if(temp.gt.-99.00) itemp(m)=nint( 50.*(temp-32.)/9 ) ! F->.1C
      end do

      write(nout,'(a11,i1,i4,12i5)') idg,iz,iyear,itemp
```

There is a bit of sloppiness here in that “9” is an integer while “32.” and “50.” are floating point numbers, and then they cast the result into an integer type with nint. I’m not sure what this mixing of data types will do to the precision in the low end bits (probably nothing, since FORTRAN promotes the integer to floating point in mixed-mode arithmetic), but a FORTRAN expert ought to pass judgement.

(In a later posting I found a compiler dependent bug in the way GIStemp does this step that “warms” 1/10 of the data by 1/10 C. Yes, 1/10 C is dependent on what computer compiler you run. It’s an easy fix, documented in the “F to C Conversion Issues” posting.)
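For readers who don’t speak FORTRAN, here is what that one line does, sketched in Python. The fnint helper mimics FORTRAN’s NINT convention (round to nearest, halves away from zero); the sample inputs are my own illustration, not GIStemp data:

```python
def fnint(x):
    """FORTRAN NINT: round to nearest integer, halves away from zero.
    (Python's built-in round() sends halves to the even neighbor instead.)"""
    return int(x + 0.5) if x >= 0 else int(x - 0.5)

def f_to_tenths_c(temp_f):
    """Mimic itemp(m) = nint( 50.*(temp-32.)/9 ): degrees F -> tenths of a degree C."""
    return fnint(50.0 * (temp_f - 32.0) / 9.0)

print(f_to_tenths_c(32.0))  # 0    (0.0 C)
print(f_to_tenths_c(50.0))  # 100  (10.0 C)
print(f_to_tenths_c(33.0))  # 6    (0.6 C, already a rounded tenth)
```

Any input that lands exactly on a half tenth is where the rounding convention bites: half-away-from-zero and half-to-even differ by one in the stored tenth, which is exactly the kind of compiler- and library-dependent 1/10 C shift described above.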

A cleaner approach would have been to leave everything in F and avoid false precision, but they probably decided C was more trendy… They do have to at some point face the fact that the USHCN set is in F and the rest of the world is in C; so doing it this way might make sense IFF you watch the false precision properly (which they don’t do).

Some folks want to think this can be treated like an over sample of the month, but it can’t. There is no monthly average of daily min/max average temperatures to repeatedly sample. There are only individual days with their own precision and accuracy being sampled. The monthly average of MIN/MAX averages is only a computed artifact, not a real thing to over sample. (IF you were to take all the daily temperatures and average them in one go, then you could begin to apply the central limit theorem; but then you would have to confront the Nyquist limit…)

This is where the “fun” begins.

The C number already has some false precision in it; but you can almost forgive it since they have the choice of giving a bit of false precision (but preserving all the information that was in the original full degree F number) or giving a full degree C precision (and having no false precision, but losing the difference in range that a degree F vs C has). IMHO it would be easier and better to put everything in F and have avoided the issue, then convert to C “at the end”. But folks for some reason hold the politics of being International and using SI units as more important than having an easier time tracking the actual accuracy and precision. (It is easier to go to the unit of measure with the “finer grain”; but given the 1/2 F “jitter” from choices about “adjustment method” it is at most a nit, and not a very relevant one.)

And these folks at Colorado.edu in discussing Geographical Information Systems (that they abbreviate GIS) do suggest that the custom in some cases is to carry forward 1 digit of false precision so that the user can decide what to do with it.

So I can see that as a reasonable choice. That GIStemp, in the end, comes up with 1/100 C anomaly maps leads me to believe they do not do the right thing with it. And for those folks just chomping at the bit to say “but with thousands of temperatures we can get that precision in the anomaly”, let me just point out that many anomaly boxes have all of ONE location reporting. Many more are in the 2 to a dozen range. You just don’t get to apply the “central limit theorem” to things counted in ones and twos…

“Too often GIS projects fall prey to the problem of False Precision, that is, reporting results to a level of accuracy and precision unsupported by the intrinsic quality of the underlying data. Just because a system can store numeric solutions down to four, six, or eight decimal places does not mean that all of these are significant. Common practice allows users to round down one decimal place below the level of measurement. Below one decimal place the remaining digits are meaningless.”

So they recognize that the false precision digits are meaningless, but have common practice bringing forward one of them.

Now GISS takes the input numbers and starts doing strange and wondrous things with them. see:

gistemp-step0-the-process

if you want to start touring the actual computer code and process.

Mr. McGuire Would Not Approve!

My high school Chemistry & Physics Teacher was one Mr. McGuire. He was a retired Lt. Colonel from the U.S. Air Force (WWII with combat) and a retired chemist from U.S. Steel.

He ran a very precise, very accurate, and very disciplined class. One did not cut corners nor stretch the truth one single decimal point in his class. Any problem in any homework, lab, test, you name it (even if otherwise perfectly done) that had a digit of precision beyond its accuracy would lose points. Sometimes all of them. By his rules, GIStemp would get an F (and that’s not for Fahrenheit).

The GIStemp process includes a great deal of math and averaging.

Now what I learned in school (“Never let your precision exceed your accuracy!” – Mr. McGuire) was that any time you did a calculation, the result of that calculation could only have the accuracy (and thus ought only to be presented with the precision of) your least accurate term. Average 10, 12.111111111, and 8.0004 and you can only say “10”, not 10.0 and certainly not 10.1111 or 10.04, as that is false precision.
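Mr. McGuire’s rule is easy to mechanize. A rough sketch (my own helper, not anything from GIStemp; it treats “precision” as the decimal places a value is written with, which is cruder than full significant-figures rules but enough for this example):

```python
from statistics import mean

def decimals(s):
    """Decimal places in a value as written, e.g. "8.0004" -> 4, "10" -> 0."""
    return len(s.split(".")[1]) if "." in s else 0

def guarded_mean(readings):
    """Average, then report only to the precision of the least precise term."""
    places = min(decimals(r) for r in readings)
    return f"{mean(float(r) for r in readings):.{places}f}"

print(guarded_mean(["10", "12.111111111", "8.0004"]))  # "10", per Mr. McGuire
```

Nothing stops a program from carrying more digits internally; the rule is about what you are entitled to *report*.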

(In fact, it’s slightly worse than this, due to accumulation of errors in long strings of calculations and the repeated conversion that GISS does from decimal in intermediate files to binary at execution and back to decimal in the next file… but that’s a rather abstruse topic and most people’s eyes glaze over, so I’ll skip it here. But just keep in mind that repeated type changing corrupts the purity of the low order bits.)

So what gets trumpeted and ballyhooed?

Things (Not temperatures! Calculated anomalies based on averages of interpolations of averages of averages of averages of two temperatures – no, that is not an exaggeration! In fact I’m leaving out a few averaging and interpolating and extrapolating steps!) measured as X.yz C! Not only is the “z” a complete fabrication, but any residual value in “y” from the greater precision of F over C in the raw data has long ago been lost in the extended calculations and type conversions. (And, as seen in the changes from USHCN to USHCN Version 2, swamped by various adjustments and manipulations.)

IF you are lucky the X has some accuracy to it.

(Though GISS even manages to corrupt this via “the reference station method” that lets them rewrite the whole thing based on other temperatures or anomalies up to 1200 km away… In the “Slice of Pisa” thread we see that the past of Pisa is made 1.4C COLDER in a broken UHI adjustment that goes the wrong way.)

Under the Italy thread on WUWT there was a blink chart of Pisa that showed about 1.4C “adjustment” to one of the data points. So if we are creating 1.4 of “fantasy” how much truth is left in the 0.01 C place?

GISS data are thoroughly cooked and, IMHO, only useful for fairy tales and monster stories…

To the inevitable assertion that it’s only the US data so the global number is still fine, see:

So Many Termometers So little Time

The fact is that the only long term records we have are dominated severely by the U.S.A. and Europe. GISS makes up much of the rest by various sorts of in-filling of average boxes and in-filling over time.

This entry was posted in AGW GIStemp Specific, Favorites, Metrology.

50 Responses to Mr McGuire Would NOT Approve

1. climatebeagle says:

So GISTemp starts with the historic monthly averages. Do you know if the raw daily max/min are still available from the ground stations?

2. Jeff Alberts says:

In my opinion, code such as this should be required to be published in journals, the same way research papers are, if they are to be taken seriously. Something this important needs to be peer-reviewed AND replicated for validity.

Somehow GISS is not required to answer to their peers.

3. E.M.Smith says:

It looks like daily data are available from here:

http://lwf.ncdc.noaa.gov/oa/climate/climatedata.html#daily

though I haven’t done a download to find out what you actually get (i.e. usability).

And yes, GIStemp ought to be put through the peer review wringer but it is being left to a volunteer who “will program for beer” 8-) Kind of makes you wonder where formal science has gone off to… Then again, I like to think of myself as just doing what science was back in the 1600 – 1800 range where all it took was a bit of brains and self appointment. So I’ve appointed me. Wonder if that makes me a “Peer”? I’d like to think I’m better than that 8-}

4. climatebeagle says:

Thanks for the pointer to the daily data. One can obtain the data in a variety of formats, the two I’ve looked at are an ftp site with a text column based file for each year at a station. You can also get pdf files that are scans of the original forms filled out by the observer, though this is through a web-interface and not ftp.

5. Jeff Alberts says:

A peer, to me, would be anyone with sufficient knowledge to intelligently critique the subject matter.

6. Ross says:

Again, you are to be commended for your efforts to decipher what is going on.

I will not pretend to understand anything but the gist of what you are saying and doing, and I do not expect further explanation for a lay person.

7. fred says:

So any significant reduction in the GISTEMP trend over the past 30 years would result in it showing less warming than the satellite records and the other surface records.

For example if we decide GISTEMP has 0.1C/decade too much warming and we correct for it we get:

That correction has made the situation worse, not better. Now the records are all inconsistent. The similarity with other records is a good reason why GISTEMP probably isn’t significantly incorrect.

Perhaps a few hundredths of a degree can be squeezed out of it, but nothing particularly noteworthy.

8. E.M.Smith says:

Fred,

You seem to have entirely ignored my statement that the tenths place is entirely fictional. You can make no statement about relative ‘correctness’ in a fictional number.

Please take a look at the issue of False Precision. The original data are recorded in full degrees F; there can be no meaning in any number more precise than that. There is no ‘made better’ or ‘made worse’ from any changes of any sort to the right of the decimal point; there is only a different error inside the error bands of the calculations. See:

http://chiefio.wordpress.com/2009/03/05/mr-mcguire-would-not-approve/

It is simply irrelevant to talk about any significance of any kind from any temperature average to the right of the decimal point that depends on those original full degree F data (at least for the whole U.S. land record. I’m not sure how the rest of the world was recorded, but assume it is similar. Though even if it were recorded in 1/10 ths, once averaged with a less accurate full degree F number, the result has the lesser accuracy so you still can not use the 1/10 ths place).

9. fred says:

Fractions of a degrees come into play because of something in math called averaging.

For example, the average of the two whole numbers 19 and 17 is 17.5

The average of a station reading 18C and one reading 17C is 17.5C

10. E.M.Smith says:

Fred, you missed the point. Yes, I can average two numbers and come up with a fractional part, that does not mean that the fractional part has “worth” or “means anything” due to a property called False Precision. Taking your example:

I measure 19 F with my thermometer. Now because my thermometer is only reporting whole degrees, that could in reality be 18.5 (which rounds up to 19) or 19.4 (which rounds down to 19). Same thing with your “17”, which could really be anywhere from 16.5 to 17.4 F. Now averaging those ranges together gives anywhere from 18.5 + 16.5 = 35 / 2 = 17.5 on the low end up to 19.4 + 17.4 = 36.8 / 2 = 18.4 on the high end.

ANY value in that range might be the actual value, but we can’t know what it really is. That is, the “precision” of 17.5 is false, because it might be 17.9 or even 18.3 and due to this fact, the only real precision that can be stated is whole number precision: 18 F (with no fractional part).

BTW, in your example 19 + 17 = 36 / 2 = 18, so you did a bit of math wrong… if you had started with 18 and 17 the average would have been 17.5 (with False Precision). The actual values could be anywhere from 17.5 to 18.4 for the first term and 16.5 to 17.4 for the second. Thus the ranges could be 17.5 + 16.5 = 34 / 2 = 17 on the low end and 18.4 + 17.4 = 35.8 / 2 = 17.9 on the high end, which again gives the range of what the value might actually be in reality, but we can’t know what it is due to the original numbers being in full degrees. So we can only say 17 degrees, with no statement as to what the fractional part might be.
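That interval bookkeeping can be written out once and reused. A sketch (my own code; it assumes the round-half-up reporting convention, under which a reported whole degree x stands for the band x-0.5 to x+0.4):

```python
def reading_interval(whole_f):
    """The range of true values behind a whole-degree report under round-half-up."""
    return (whole_f - 0.5, whole_f + 0.4)

def average_interval(readings):
    """Average the low ends and the high ends of the reading intervals separately."""
    lows, highs = zip(*(reading_interval(r) for r in readings))
    n = len(readings)
    return (sum(lows) / n, sum(highs) / n)

lo, hi = average_interval([19, 17])
print(lo, hi)  # roughly 17.5 to 18.4: nearly a full degree of spread
```

The average of two whole-degree readings comes with almost a whole degree of uncertainty, so quoting its tenths place is false precision.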

PLEASE take the time to learn what False Precision is and why it is absolutely wrong to find some merit in the values to the right of the decimal point from calculations starting with whole digit numbers.

Until you do that you will make no progress on this issue and you will just continue to post the nonsense that the fractional degrees have some meaning. They simply don’t.

One Last Time: Go read “Mr. McGuire Would Not Approve” (and now I’m going to add the requirement that you understand it too). If you are unwilling to do that, your posting here will simply make you look more and more foolish. I have no desire to have anyone “look bad”, but there is only so much I can do to protect you from yourself…

I’ll give you one more example, then I’m done. This one will use a reductio ad absurdum – taking it to an extreme so obvious it’s hard to miss.

Let’s say I have a thermometer that measures in 10 F increments only, but when I read it, I write down what I guess is the rest of the value. One number I write down is 10.000001 and another is 10.00000000001; what is my average? The correct answer is “10”, because that is all I really know. All those zeros and the “1” in the “way out there” place have no meaning. My thermometer could only tell me “0” or “10” or “20”; there are not really any other choices. Even if I wrote down 10 and 11, the average is still “10”, because my thermometer could not tell the difference between 9, 10, or 11, so that 11 has False Precision.

Hopefully that helps you understand. If not, then please go talk to a math teacher or engineer and get some tutoring on precision and False Precision.

11. peter_dtm says:

Excellent !

I had not thought this through; but I am reminded of being TOLD to produce calibration curves for 10-bit based instruments, where the accountants demanded I submit data in the range 0.000 mA to 20.000 mA. Needless to say I refused (accuracy is only 1/1024, +/- 1/1024, of a range of 0 to 20 mA nominal). Much argument ensued; if accountants, who are supposed to be numerate, don’t get this, what chance a Gore?
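peter_dtm’s point checks out with simple arithmetic. A sketch (assuming a 10-bit converter spanning 0 to 20 mA in 1023 steps, which is how I read his description):

```python
FULL_SCALE_MA = 20.0
STEPS = 2 ** 10 - 1           # a 10-bit instrument resolves 1023 steps
lsb = FULL_SCALE_MA / STEPS   # smallest current the instrument can distinguish

print(round(lsb, 4))  # ~0.0196 mA: reporting "0.000 mA" claims roughly 20x
                      # more precision than one hardware step
```

The instrument simply cannot tell apart two currents closer together than one step, so three decimal places of milliamps is false precision.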

12. E.M.Smith says:

@Peter_dtm:

I think it helps to have had some experience with binary conversions and precision issues. I first got this insight from Mr. McGuire, but it was fortified a lot in my first programming class… in FORTRAN IV … We spend a lot of time on precision issues and even had several problem sets with questions about how to know what precision was appropriate, what the precision of input data would be, intermediate forms, and output formats (and when any part of the chain had False Precision …)

It was always fairly clear to me, but frankly, the folks that I’ve run into that have advanced technical degrees and still “don’t get it” just astounds me…

13. fred says:

By this argument the statistic of “2.5 children per family” would be judged meaningless “because you can’t have half a child” or “measurements of each family is based on a whole number of children so fractions are meaningless”

But obviously a fraction can be meaningful as an average. I don’t know how more simply to put it really.

The global temperature record is a fine example of this, because look at it – year to year and even month to month it doesn’t change by anything but a few tenths of a degree. This indicates that tenth-of-a-degree precision is meaningful – even required – to follow such a metric. And this goes for every record, GISTEMP, UAH.

14. E.M.Smith says:

Fred,
I will state the answer to your puzzlement here, but I fully expect that you will not understand it and will continue to indulge in public displays of your failure to understand. I’ve tried to talk you into “not going there” but you persist. OK. Like I said before, I can’t protect you from yourself.

Stating something simply does not improve its truth; it may well simply state the error… That is what you have done. Precision, and especially False Precision, is not simple. It involves some rather specific and not always obvious properties of numbers.

If you phrased your postings as a quest for understanding, it would be better. Phrasing it as you do, as absolute statements of belief in an error, does not encourage me to spend time trying to teach you.

At any rate, here is where and why you are badly wrong:

A child is an INTEGER function AND a discrete event, not a floating point – a range. The precision of an INTEGER for discrete events is functionally infinite. The conversion into a floating point number can be given as many decimal digits of precision as you like, because the integer is infinitely precise.

A single child is a discrete event and an integer 1 or a floating point number 1.0000000000000000 for as many zeros as you like. So this can then be used to calculate an average child / family with similarly infinite digits of precision.

This is a far different situation than a quantity that does not start as an integer, but is truncated to an integer value. (i.e. it is really a floating point number with zero digits of precision to the right of the decimal point.) So, for example, I can say I am 6. feet tall. That does not mean my height was measured with a laser beam as 6.00000000000000 ft. It means that my height is somewhere between 5.5 and 6.4 AND YOU DON’T KNOW WHICH IT IS.

Your error band is slightly less than one whole digit.

In the case of 6 children, you know it is exactly 6.000000000000000000000000000000000000000000000000000000000000000000000… (the 0 repeating forever) children. Get it? Your error band is 0. The fractional part is defined as exactly zero to infinite precision by the definition of an integer discrete event. You KNOW that I don’t have 5.9 kids or 6.1 kids. I have exactly 6 kids.

So you can average a bunch of integer data points and come up with a value that has fractional precision, provided the original data were in fact discrete integer events; but you can not average a bunch of data items that were measured as ranges (i.e. floating point items – non discrete integer items), truncate the precision to an integer, and then claim they are now exactly precise as an integer.

In particular, the “average height” of every adult in my family by your method (measure, round to whole feet truncating precision, average) would be 6.000000 feet. Despite the fact that not one of us is 6 feet tall or taller (we are all less than 6 and more than 5.5). To say our average height is 6. with no following 0 is accurate, since that tells a mathematician or computer programmer that you mean “somewhere between 5.5 and 6.4 but you don’t know where”, and that accurately describes my family. If done in floating point numbers, you would find that our average height is roughly 5.8 feet (significantly less than 6.000000). Now to get that 5.8, you MUST HAVE THE DATA in at least FOOT.x form; but if you measured as FOOT.x and turned it into 6 Foot, 6 Foot, 6 Foot, 6 Foot, the method you advocate would find an average height of 6.00000 feet AND IT WOULD BE A LIE, and demonstrably so.

You either MUST keep the precision to the right of the decimal point, or you can NOT state anything other than 6 feet PLUS OR MINUS ABOUT A HALF. Where GIStemp (and I suspect the others too) goes wrong is they effectively state “6.00000 FEET!” and rising at 0.0001 / time!!!!

Now, to your assertion that ~“GIStemp et al. having stability in the tenths place shows the tenths place has accuracy”: it is simply demonstrated as false by my above example. Every Single Time you do the calculation you will find my family to have an average height of 6.00000 feet, with great stability. And every time you will be wrong. Very reliably wrong. Very repeatably wrong. But wrong.
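The height example runs in a few lines and shows the “stable but wrong” behavior directly. (Made-up heights, a sketch only, not anyone’s actual family data.)

```python
from statistics import mean

heights_ft = [5.8, 5.9, 5.7, 5.8]          # measured to tenths of a foot
rounded = [round(h) for h in heights_ft]   # round to whole feet first, then average

print(mean(rounded))     # 6 -- stable and repeatable every single run, and wrong
print(mean(heights_ft))  # 5.8 -- the answer the tenths-place data actually support
```

Rounding before averaging throws the information away; no amount of repetition or stability brings it back.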

So fred, every time you come back and feel compelled to post another error, please re-read the above. And re-read “Mr. McGuire would not approve”. Keep doing that until you “get it”. It is a complete waste of my time and your time to do anything else.

Repeatedly wasting my time without making any progress will not engender willingness on my part to tolerate this much longer. Also, please put your comments about precision under the “Mr. McGuire Would Not Approve” posting, not this top level aggregator. It is much more germane to that topic and it will demonstrate to me that you have actually read that posting (which treated this whole topic already.)

15. E.M.Smith says:

BTW, fred:

This site:

http://mathforum.org/library/drmath/view/58335.html

does a wonderful job of explaining why false precision limits as it does and has some great examples along with a simple way to SEE the False Precision impact in a calculation (using X in the non-value places).

It’s a quick easy read and makes it clear pretty darned fast.

16. Fluffy Clouds (Tim L) says:

Hey I never thought of coming here lol
Precision numbers IS what I have been going after as well.
http://noconsensus.wordpress.com/2009/05/16/quantum-entanglement/
reprints from my post

Fluffy Clouds (Tim L) said
May 17, 2009 at 2:29 am

“I am confused as to why the non-conservative USA Today publishes a temperature chart, extending back 1,000s of years, accurate to a 1000th of a degree. That is truly irrational. Who believes that arbitrary precision?”

arbitrary precision of accuracy of a 1000th of a degree.
Jeff, we must be careful on this.
Campus will be open in June; I will try to pin these down.
significant digits http://en.wikipedia.org/wiki/Significant_figures
and http://en.wikipedia.org/wiki/Accuracy_and_precision

Fluffy Clouds (Tim L) said
May 18, 2009 at 12:48 am

OK, I would agree IF…. you read the SAME thermometer 1000 times an hour ALL DAY!
That is not what is going on here. Get it?
Analogy: use a tape measure to get the size of your crankshaft bearings; you can measure it 1000 times BUT you still will not get a 1000th of an inch reading, ok?
Jeff, you are a very very smart man; how can this escape you?
It is a different thing to sample one item 1000 times than to read 1000 items once.
Well I hope this gets into that head of yours, and does not piss you off at me !!!

How is it that precision of numbers is so overlooked in the scientific community?
According to the rules of precision, the best they can do is this:
0.0
0.05
0.1
0.15
0.2
0.25

there can not be anything like
0.123
and still be scientific.
Steig et al 2009’s .118 IS A scientific joke!!!
and any science person would laugh at HIM for not using scientific principles!

17. davidc says:

EMS,

Of course you’re right (thought I’d get that in early) but you are getting frustrated by not seeing where the others are coming from (which is: statistics). Everyone knows that the standard error of the mean is the standard deviation of the observation divided by sqrt(n), n being the number of observations. So as n becomes large the “error” in the mean gets to be much less than the “error” in the individual observations. Underlying this thinking is a statistical model of the form x(i)=x+e(i) where e(i) is a random variable. The model you have in mind is x(i)=y(i)+/-(s/2) with x(i) the correct value, y(i) the measured value and s (or s/2) the precision. To estimate the mean x of the true value, by adding the x(i)s and dividing by n gives x=y+/-(s/2) where y is the mean of the measured values. That is, as you say, the systematic error is not reduced by averaging.

To resolve this, I suggest that the correct model is in fact
x(i)=y(i)+/-(s/2)+e(i)
with both systematic and random errors. The error in the mean now has two components: the systematic component +/-(s/2), the same as before. And a random component with “error” SD/sqrt(n), the same as before. As n increases, the random component decreases towards zero but the systematic component does not.

So the systematic error in the mean, in the case of a temperature measured as +/- 0.5 F, is as you say +/- 0.5 F. With enough observations we can forget about the random component without actually knowing anything about it.
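davidc’s model is easy to simulate numerically. A sketch (the constants are made up for illustration: true value 20, systematic error -0.5 as if every reading were reported half a degree low, random noise with SD 1.0):

```python
import random
random.seed(1)

TRUE, SYSTEMATIC, NOISE_SD = 20.0, -0.5, 1.0

def error_of_mean(n):
    """How far the mean of n observations lands from the true value."""
    obs = [TRUE + SYSTEMATIC + random.gauss(0.0, NOISE_SD) for _ in range(n)]
    return sum(obs) / n - TRUE

for n in (10, 1000, 100000):
    print(n, round(error_of_mean(n), 3))
# The random part of the error shrinks like 1/sqrt(n);
# the -0.5 systematic part never does.
```

With n large the printed error converges on -0.5: exactly the point that averaging beats down the random term but leaves the systematic term untouched.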

The next question is what this means for a trend. Clearly not significant for anything less than 0.5F.

This might be worth clearing up for a contribution to WUWT.

18. davidc says:

EMS,

OK, let me try a bit harder. Let x(i) be the true value (unknown) associated with the (i)th observation y(i). Assume

x(i)=y(i)+s(i)+e(i)

where s(i) is the systematic error (known) and e(i) is the random error (unknown). We are interested in the case where we have n observations of y(i) and want to know something about the true values x(i). Let’s formulate this as BU(n,c,x), an upper bound we can put on the true values of the x(i) after n observations at some confidence level c (eg x(i)<BU for c=95% of observations; but we can do better with c, see below), and BL(n,c,x) as the lower bound. Now suppose that the error terms are small enough (if this isn't true we can stop here) that

BU(n,c,u+v)=BU(n,c,u)+BU(n,c,v) for u,v=y,s,e

(say, first order Taylor series approx) and the same for BL. So

BU(n,c,x)=BU(n,c,y)+BU(n,c,s)+BU(n,c,e)

and the same for BL. The y's are known so BU(n,c,y)=ymax(n), the largest y after n observations and BL(n,c,y)=ymin(n). Now

BU(1,c,s)=s(1) and BL(1,c,s)=-s(1)

These are independent of c because we can be sure that the systematic error lies between these limits however we specify c. Now, a bit more tricky,

BU(n+1,c,s)=max[BU(n,c,s),s(n+1)]

This follows if we want BU(n+1,c,s) to be independent of a confidence level c. All it is saying is the upper bound for the s(i) after n+1 observations is either the biggest up to n observations, or the (n+1)th, whichever is larger. So this leads to

BU(n,c,s)=MAX(n,s)

where MAX(n,s) stands for the biggest value among all the n values s(i). In practice we might expect all the s(i) to be the same (eg 0.5F) so MAX(n,s) could also be understood as the maximum systematic error involved in individual observations (and then independent of n). On the other hand we might consider MAX(n,s) to include nonrandom adjustment error, presumably dependent on n. Similarly BL(n,c,s)=MIN(n,s).

The random error term BU(n,c,e) is obviously much more complicated. But here we can appeal to statistics and assume we can write this term, for large enough n, in the form

BU(n,c,e)=PU(c,e)/f(n)

where PU(c,e) is some function independent of n, which depends somehow on how we specify c and on the probability distribution of e, and f(n) is an increasing function of n. In the case that e is normally distributed, f(n)=sqrt(n). Similarly for BL(n,c,e).

Putting these individual terms together we get

BU(n,c,x)=ymax(n)+MAX(n,s)+PU(c,e)/f(n)
BL(n,c,x)=ymin(n)+MIN(n,s)+PL(c,e)/f(n)

(noting that MIN(n,s) and PL(c,e) are expected to be negative). For large enough n, with f(n) an increasing function, the random error terms vanish regardless of the form of PU and PL (which allows us to avoid the question of how a value of c can be chosen to satisfy requirements of both the systematic and random errors). MAX(n,s) is most likely independent of n (eg 0.5F), but if not will be a slightly increasing function. Similarly, abs(MIN(n,s)) is expected to be increasing or constant. We really can't say anything about ymax(n) and ymin(n) except perhaps the commonsense view that with n becoming large we will approach some approximation to the true values which will not depend on n. So leaving out n and with an obvious change in notation

xmax=ymax+MAX(s)
xmin=ymin-abs(MIN(s))

which you knew anyway, except you would probably have ymean in place of ymax and ymin.

Extending this treatment to a trend is obvious. Of course, conventional least squares is pointless. With systematic errors dominating, the error distributions (if you want to think this way) are rectangular and there is no unique maximum likelihood line. That is, the likelihood function is the same for the infinitely many lines that pass within the error bars of all data points and zero for any line that doesn't. The obvious choice (which I seem to remember doing before I got sidetracked by statistics) is to draw lines with maximum and minimum slopes that fit within the error bars. Most climate trends would disappear with that type of analysis (ie equal likelihood for lines with positive and negative slopes).
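The max/min slope idea is easy to make concrete. A sketch with made-up whole-degree data (so each point carries a +/-0.5F bar): a line y = m*x + b passes inside every error bar iff some intercept b satisfies y(i)-0.5 <= m*x(i)+b <= y(i)+0.5 for all i, so we can simply scan candidate slopes for feasibility.

```python
import numpy as np

# Hypothetical whole-degree F readings over ten "years" (invented data).
x = np.arange(10, dtype=float)
y = np.array([54, 54, 54, 54, 54, 55, 55, 55, 55, 55], dtype=float)
half = 0.5  # half-width of the rounding error bar

def feasible(m):
    """True if some intercept b puts the line m*x + b inside every error bar."""
    lo = np.max(y - half - m * x)  # tightest lower bound on b
    hi = np.min(y + half - m * x)  # tightest upper bound on b
    return lo <= hi

slopes = [m for m in np.linspace(-0.5, 0.5, 10001) if feasible(m)]
print(f"slopes consistent with the data: {min(slopes):+.4f} to {max(slopes):+.4f} F/yr")
# A zero trend and a ~0.22 F/yr trend fit these error bars equally well,
# so the data alone cannot distinguish "no warming" from "strong warming".
```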


20. davidc says:

I’ve tried to post a better presentation of this (twice) but it hasn’t appeared. Any interest in my trying again?

REPLY: No need. For some reason they were stuck in the SPAM queue. WordPress has some ill-defined rules that toss things there sometimes…

21. Dominic says:

Hi EMS

I think you are doing a great job and am tempted to help out (if only I knew where to start).

I do have a query about the accuracy of the averages taken in the GISS data. I do understand that the average of two numbers without any decimal places cannot claim to have a decimal place of accuracy.

However, suppose 1000 observers make a temperature measurement on one day at 1000 different locations across the world. I have 1000 readings. I label each with T(i) where i=1,…,1000. Each reported number is an integer. I cannot know anything more. As a result, the error in each reading is from -0.5 degrees to +0.5 degrees.

However, given that we are averaging over a broad range of temperatures – basically a range of numbers from say -30.0 to +50.0 – can we assume that on average the error is close to zero? In other words, can we assume that the missed decimal place has a uniform distribution – like the RAND() function in Excel? If so, the error in each reading is like a number drawn from a uniform distribution in the range [-0.5,+0.5], which has an average of zero (or close to it, depending on the number of stations in the average). Using the law of large numbers means that the standard error of the average should fall as 1/sqrt{N}, where N is the number of data points. This is approximate, since I realise that the assumption that the decimal places in the temperature readings are random, independent and uniform is not strictly true.

On this basis, might it be argued that the accuracy of the average of a large number of data points is actually better than 1 degree, and may indeed be of the order of 0.1 degrees?
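Dominic’s premise can be simulated directly. Note what this checks: his stated idealized assumptions (independent readings whose fractional parts are uniform), not the real station network. Draw “true” temperatures, round them to whole degrees as an observer would, and compare the mean of the rounded values with the true mean across many trials.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stations, n_trials = 1000, 2000

# "True" temperatures drawn uniformly over a wide range, then rounded to
# whole degrees the way an observer would record them.
true_temps = rng.uniform(-30.0, 50.0, size=(n_trials, n_stations))
rounded = np.round(true_temps)
errors = rounded.mean(axis=1) - true_temps.mean(axis=1)

print(f"SD of the error of the average : {errors.std():.4f}")
print(f"uniform-model prediction       : {1.0 / np.sqrt(12 * n_stations):.4f}")
# Both come out near 0.009: under THESE idealized assumptions the rounding
# error of the AVERAGE is far below 0.5.
```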

22. E.M.Smith says:

Dominic, what you are proposing is something that is properly called “oversampling”. If you have, for example, a chunk of iron, you can measure its length. Do that enough times and you can start to average out the errors.

The Problem: With temperatures, you are not measuring the same thing repeatedly. You are not over sampling ONE temperature. You are just repeatedly measuring a large number of different places at different points in time. There simply IS no “global temperature” to oversample. There are only thousands of different temperatures that constantly change.

For that reason, you are not oversampling, so the precision remains trapped at whole digits.

23. Dominic says:

Hi EMS

I disagree.

I am instead claiming that when we average lots of different temperature readings to produce an average which I call the “global temperature”, the error in the global temperature is lower than +-0.50.

I am not measuring the same thing repeatedly. I am taking temperature readings at different points on the globe ONCE and taking their average.

My contention is that the accuracy of the global average is much higher than the accuracy of any one measurement as the number of data points in the average becomes large.

Here is a thought-experiment which explains why the distributional properties of the error change.

Suppose I have 1000 measurements of temperature from 1000 locations and in these locations I have super accurate thermometers which can give me the results to 10dp. I calculate the Exact Average – let’s call it EA. I then round each temperature to the nearest integer and calculate the Rounded Average – let’s call it RA.

I believe that you would argue that the error in the rounded average is +-0.50 so that the error is a uniformly distributed random number in the range -0.50 to 0.50, i.e. the error could be -0.10 or it could be 0.50, or it could be -0.50, or it could be 0.267272 all with the same probability.

IF this is your view then I disagree. If you study the statistical properties of the error, you will find that the size of the error will actually be much smaller than that.

Here is another way to look at it. The only way for the error to be equal to +0.4999.. is if EVERY temperature reading is in the form XX.4999. The only way for the error to be minus 0.4999 is if every temperature reading is XX.5001. What are the chances of that ? Pretty much zero, especially as the number of readings becomes large.

You would find that the probability distribution of the error is no longer uniform, but becomes Gaussian, concentrating ever more tightly around 0 as the number of measurements in the average becomes large.

For this reason, I claim that there is statistical “self-averaging” (the averaging over many numbers cancels out the rounding error to a large extent) that makes the level of accuracy in the result significant to maybe 1 or 2 dp.

24. Dominic says:

Just to be clear – when I said

“Suppose I have 1000 measurements of temperature from 1000 locations”

I mean that I have ONE measurement from each location.

D.

25. Dominic says:

I have an Excel spreadsheet which “simulates” what I have attempted to describe above. I don’t have a place to upload it to – if you contact me I can email it to you.

26. E.M.Smith says:

Dominic: “I disagree.”

As well you ought. There is a small “edge case” that I’ve avoided discussing because it sucks people down that road long before they “get it” that the main point is valid, dominates, and that the “short road” they are on is a dead end of little value. (Some, but very little).

That “edge case” is what you are exploring.

It does not apply to world average temperature precisely for the reasons you cover with your premises of your thought experiment.

1) We don’t have enough real thermometers for it to work.

2) We don’t measure them only once in time, then make the average for the globe. GIStemp STARTS with monthly averages of min-max. MIN and MAX come at different times for every site. We’ve already averaged 2 data points per day, then averaged those for each month. And each of those temperature readings was taken at a disjoint time.

3) We violate Nyquist requirements in both time and space to such an extent that even the whole digits are suspect. Given that, a Theoretical Land improvement of at most one decimal point of precision is a pointless exercise in distraction. An exercise that causes most folks’ eyes to glaze. And it does nothing for the non-Nyquist history of temperatures that we have to work with.

OK, but you wish “to go there”.

Where you will end up is that whole degree F raw data (IFF you had access to it) can support about 1/2 to one decimal point of added precision. Maybe. Sometimes. If you did everything perfectly. Which we don’t. (And IFF we had a lot more thermometers in the past, which we don’t and IFF the collection criteria were more stringent, which they were not). Due to not meeting those criteria each reading in our actual history is a disjoint data point for ONE place at ONE time in a non-Nyquist set and can not be used for a statistical approach via over sampling methods. Oh, and the resultant number does not mean much.

But then we throw that theoretical possibility away by immediately making daily MIN / MAX averages for each site that are then turned into monthly averages (further diluting the potential for that theoretical to surface) and THEN we apply a bunch of “corrections” that even further pollute the precision. Only then does GIStemp get a shot at it…

So yes, you have a “theoretical” that is an interesting mathematical game to play; but no, it is not of use in the world today. The data are not suited to your theoretical from the very moment they are collected (fails Nyquist).

OK, the next “issue” is the question of “WHAT average” to use? See:

http://en.wikipedia.org/wiki/Average

For a reasonable introduction.

Are we using the arithmetic mean? Median value? Harmonic Mean? Geometric Mean? Mode? Geometric Median? Winsorized mean? Truncated Mean? Weighted Mean?

Which choice is right?

And what choice was it after GIStemp has added some data points, interpolated some, fabricated a few, and deleted some others? What does that do to the precise requirements for a statistical approach to adding a partial decimal point of precision? At that point it fits neither the truncated mean nor the non-truncated category. It is a new beast of its own construction, undefined in standard statistics and with unknown properties. And unknown, but limited, precision limits.

Now, there is still one other major issue before getting to your theoretical: It is very important to keep it clear in your mind that temperature is an intensive variable. If you glaze at that in the smallest degree and skip over it without an in-depth grasp of it, you will continue to waste time and space on a pointless pursuit of the impossible. Most folks, it seems, do exactly that (given how much bandwidth is wasted on the issue to no avail…)

These folks have a nice short description a couple of paragraphs down:

http://www.tpub.com/content/doe/h1012v1/css/h1012v1_30.htm

What that means, “intensive”, is that one instance of the property for one entity means nothing to another instance of another entity. It is not dependent on the mass of the object.

The taste of MY meal means nothing to the taste of YOUR meal, and averaging them together can be done BUT MEANS NOTHING. Were you averaging in one drop of Tabasco sauce from my meal, or one ounce? It matters to the average, but is not known…

Another example might be taking two different pots of water and averaging their temperature. You get two numbers, but know nothing about the THERMAL ENERGY in the two pots. The temperatures become an average, but the average means nothing. It certainly is not representative of the average thermal energy.

Take the two pots of water and mix them, the resultant temperature is NOT the same as the average of the two temperatures. You must know the mass of water in each pot to get that result. And we did not measure the mass.
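The two-pot point in numbers (the masses and temperatures are made up; equal specific heats assumed): the temperature after mixing is the mass-weighted mean, so the plain average of the two readings is wrong unless the masses happen to be equal.

```python
# Two pots of water (invented masses and temperatures, same specific heat).
m1, t1 = 1.0, 20.0   # kg, degrees C
m2, t2 = 4.0, 80.0

mixed = (m1 * t1 + m2 * t2) / (m1 + m2)  # mass-weighted: what mixing gives
naive = (t1 + t2) / 2.0                  # plain average of the two readings

print(f"mixed temperature: {mixed} C, naive average: {naive} C")
# mixed = 68.0 C, naive = 50.0 C: without the masses, the average misleads.
```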

The same thing happens on a colossal scale globally. We measure the temperature over a snow field, and ignore the massive heat needed to melt the snow with no change of temperature and ignore the mass of the snow. We measure the temperature of the surface of the ocean and ignore the shallow and great depths. We measure the surface temperature of a forest, and ignore the TONS of water per acre being evaporated by transpiration.

Then we average those temperature readings together and expect them to tell us something about the heat balance of the planet.

That is lunacy in terms of physics and mathematics.

So take all the above, and firmly fix in your mind the truth:

An average of a bunch of thermometer readings MEANS NOTHING.

Got it?

OK, IFF you can get that preamble into your mind and hold onto it, then I’ll “go there” into your theoretical…

(ANY attempt to use the result of the answer to the “theoretical” to revisit GIStemp, AGW, or GHCN issues will be referred back to this preamble…)

“I am instead claiming that when we average lots of different temperature readings to produce an average which I call the “global temperature”, the error in the global temperature is lower than +-0.50.”

You can get somewhat more precision if you have ever more stringent limitations on how you sample. (NOTE: This is NOT what is done in the real world and has nothing to do with global temperatures as actually measured).

“I am not measuring the same thing repeatedly. I am taking temperature readings at different points on the globe ONCE and taking their average.”

That is too bad. Repeated sampling is one of the ways to get greater precision. Take 10 thermometers at one place and time, now you have a 10 x larger sample. You CAN average those readings to extend your accuracy and precision (this is done with “oversampling” D to A converters in music applications). You can get to about X.y from thermometers that read in X whole units IFF you oversample “enough” (where enough is more than 10…)

My earlier response was presuming you were trying to tie this to the world temperature record (and GIStemp starts with a set of monthly averages of averages of MIN/Max; so your theoretical diverges dramatically from what is done “in the real world”.)

Now realize, that even if you measure ONLY ONCE, you still have the Nyquist space problem (enough thermometers evenly enough distributed in space) AND you still have the “intensive property” problem. You can say nothing about heat (yet that is what folks inevitably try to do).

“My contention is that the accuracy of the global average is much higher than the accuracy of any one measurement as the number of data points in the average becomes large.”

The key point here is “becomes large”. Large must be very very large, and we are not even near the Nyquist limit in the real world. You need more than Nyquist to start getting more precision (and remember that the precision tells you nothing of meaning.) So in Theoretical Land, with some millions of thermometers, you can get to X.y from a measured X. (If all are taken at the exact same instant in the time domain.)

I don’t see where this point is germane to what is done with AGW arguments, GIStemp, GHCN, etc. The “theoretical” is so far from reality it is like arguing about angels fitting on pin heads. An interesting mathematical game, but not of use. We don’t even get near Nyquist in either time or space. We measure random places (especially changing over time). We measure them at near random and disjoint times. We then use a randomly selected averaging technique to average some of them together and THEN we start to use GIStemp to fudge the data even more.

The difference between Theoretical Land and Reality makes believing in Angels the easier choice 8-)

“Here is a thought-experiment which explains why the distributional properties of the error change.

Suppose I have 1000 measurements of temperature from 1000 locations and in these locations I have super accurate thermometers which can give me the results to 10dp. I calculate the Exact Average – let’s call it EA. I then round each temperature to the nearest integer and calculate the Rounded Average – let’s call it RA.”

Yes, this will give you a very precise number. One Small Problem:

You are working from the rounded data from a very precise starting data set. The real world data are not precise and have accuracy limitations in the instruments as well. The properties of your RA will be far different from the properties of the averages from the real data. Your data have a theoretical basis of high precision, the real data have a basis of low precision. The statistical distribution of their error bands will not be the same. This is the major flaw in your argument, IMHO.

Another Small Problem: 1000 thermometers will give you very precise readings that DO NOT represent the reality around them. You will still have ACCURACY issues. And your scale is still low by several thousands (or millions?) of thermometers…

Take a meadow with a stream through it. Just around the corner in the shade, snow is melting into the stream. A dark rock on the edge of the stream in the sun is warmed by the sun. The snow is 32F, the creek is 33F, the stone is 115F on the surface (and 60F in the deep interior), the air over the meadow is 85F and the grass blades are about 75F (they are transpiring and evaporating water). Where do you place your thermometer to get an ACCURATE reading of “the temperature” of that place?

The answer is that you can not.

Each piece has its own intensive property of temperature. We hope that putting a Stevenson screen with a thermometer somewhere in the meadow at about 5 feet up will give us some kind of “accidental average”, via the air, of the various surfaces in the meadow. Good enough for us to know if we need to wear a coat (since we are in that body of air) but quite useless for saying anything much about heat balances. I learned this quite dramatically in that meadow when, sweating in the sun, I dove into the stream and shot right back out a nice light blue color!

So yes, you can repeatedly average your average of averages of accidental averages and get ever greater precision, but no, you can’t get more accuracy, since at its core the notion of an “accurate temperature” for any given size cell is fundamentally broken. You can only get a truly accurate temperature for a single surface of a single thing. It is an intensive property of that thing.

Further, you can not take a temperature and use it to say anything about heat or energy balance (the important issues for “climate change”), though people try. Temperature without mass and specific heat is useless for heat questions.

So as soon as you step away from that thing, you get to answer such questions as ‘how was the heat flowing?’, ‘what are the relative specific heats?’, ‘what phase changes happened’? And since we ignore those, we have no idea what “the temperature” means.

“I believe that you would argue that the error in the rounded average is +-0.50 so that the error is a uniformly distributed random number in the range -0.50 to 0.50, i.e. the error could be -0.10 or it could be 0.50, or it could be -0.50, or it could be 0.267272 all with the same probability.

IF this is your view then I disagree. If you study the statistical properties of the error, you will find that the size of the error will actually be much smaller than that.”

For your theoretical case of a gigantic number of thermometers read at exactly the same time.

Not for the real world. For the real world, we measure each high and each low ONCE at each place. That’s all we have.

For the real world, there is no possibility of a Gaussian distribution of error, since we have only one temperature of one thing at one point in time. From that point on, we can make no statement about improved accuracy from improved precision other than to say that your accuracy is limited to whole degrees F so any precision beyond that is False Precision.

The disjoint measurements in time, the sample size of one for any place, the non-Nyquist distribution of places, they all say: You can only truncate at whole degrees F any calculations you make.

There was no “one thing” repeatedly sampled (in time or in space) to support an “oversampling” argument. And that is what your argument is. That an over sample can extend accuracy. It can, by a VERY limited amount. But the data we have do not conform to the requirements for a statistical over sample approach. So your theoretical is, and must remain, a theoretical that has no bearing on the real world and the issue of “Global Warming” (and how GIStemp works).

“Here is another way to look at it. The only way for the error to be equal to +0.4999.. is if EVERY temperature reading is in the form XX.4999. The only way for the error to be minus 0.4999 is if every temperature reading is XX.5001. What are the chances of that ? Pretty much zero, especially as the number of readings becomes large.”

Except that in the real data, the error is not a statistical artifact of rounding from true and accurate values, so there is no basis for any particular expectation about the nature of data that do not exist. The fractional part that you are discussing does not exist.

So “Pretty much zero”, is quite possible. And that is the whole point behind knowing where your accuracy ends. You DON’T KNOW what the “real” average is in that range. You get to GUESS based on probabilities IFF you have the basis for it.

IFF you have a tightly constrained set of initial conditions, you can use a statistical approach to tease out a bit more “significance” via that probability analysis. If you do not have those tightly constrained initial conditions, you can not make such a statistical GUESS.

We don’t have the initial conditions to support that approach in the thermometer record of the planet. Even your theoretical falls short (1000 is nowhere near enough).

“You would find that the probability distribution of the error is no longer uniform, but becomes Gaussian, concentrating ever more tightly around 0 as the number of measurements in the average becomes large.”

And again, we are back at the point that “large” must be very very large. My guess is on the order of sampling every square meter of surface (but no one knows for sure). The surface properties of the planet are rather fractal and that introduces some “issues”. Is a black beetle on a white marble paver in a green garden with brown dirt patches accurately being “averaged” in with a 1 M scale? No. Are the billions of beetles on the planet significant? At what point do you put a number on “large”?

And just a reminder: This still ignores the problem of averaging a bunch of intensive property measurements being a meaningless result.

“For this reason, I claim that there is statistical “self-averaging” (the averaging over many numbers cancels out the rounding error to a large extent) that makes the level of accuracy in the result significant to maybe 1 or 2 dp.”

You can’t get to 2 with your theoretical (and it would be challenged to get to part of 1 dp. You need a lot more thermometers.) The math is beyond the scope of this article (maybe I need to add a specific article on this point. It seems to be a quicksand trap that everyone loves to go to for a picnic…) In the real world data, they are challenged to support whole degrees of F, and the way they are handled prior to GIStemp makes the 1/10 F complete fantasy.

27. Dominic says:

Hi EMS

Thanks for your reply. It seems as though we are still in disagreement as you state

“For the real world, there is no possibility of a Gaussian distribution of error, since we have only one temperature of one thing at one point in time. From that point on, we can make no statement about improved accuracy from improved precision other than to say that your accuracy is limited to whole degrees F so any precision beyond that is False Precision.”

Let me ignore the details of the GISStemp process for the moment and just let me assume that it is a simple average of a large number of integer rounded temperature readings. I know that it is much more complex in practice, but if we can’t agree on this simple case, then we will never get anywhere on the true case and I will keep quiet!

There is clearly some lack of precision in the measurements of single temperatures. Someone somewhere is rounding the min/max temperature to the nearest degree. This can be the observer reading the mercury by eye and going for the closest temperature marking, or an electronic measurement being rounded or whatever. This loss of precision creates an uncertainty in the exact temperature. I have decided to call this “uncertainty” NOT error. Why? Because if I asked for the data to be refreshed 1 nanosecond after the first reading, I would not expect it to change. The distribution of uncertainties comes from different locations.

The distribution of the uncertainty could be captured by plotting a frequency histogram of the uncertainty coming from each location. Unfortunately I cannot calculate this since I am only given the rounded temperature. But if I could, I would contend that it would have a fairly flattish uniform like distribution between -0.5 and +0.5.

This is why I claim that the uncertainty in each reading behaves like a random uniform variable which can be in the range [-0.5,0.5]. Averaging temperatures averages this uncertainty, and the uncertainty in the final average is lower than that in any single measurement. This is simply the law of large numbers.

I hesitate to get into the other issues you raise. I have a background in statistical physics so I can understand and agree with much of what you say about intensive/extensive. However, they are different issues from the fairly narrow technical one I am raising.

28. E.M.Smith says:

Dominic: “Let me ignore the details of the GISStemp process for the moment and just let me assume that it is a simple average of a large number of integer rounded temperature readings.”

It isn’t.

That is the first place where we hit a wall.

First, take the word GIStemp out of your statement.

Long before the data reach GIStemp, they stopped being simple temperature readings. You are not talking about GIStemp at all. (I’ve actually pondered making a posting dedicated to this dead end that so many folks seem to want to explore, just to move all these comments OUT of the GIStemp thread, since it has little / nothing to do with GIStemp…)

We must go “upstream” to NOAA to get simple temperatures.

Now, if we change your statement to be “… NOAA … let me assume that it is a simple average of a large number of integer rounded temperature readings” We hit the next wall:

It isn’t.

We know the NOAA process. They take the MIN temp for the day (a single reading of a single thing at a single time) add it to the MAX temp for the day (another single reading for a single thing at a different single time) and divide by 2. It is not clear if they do an integer divide or a floating point divide, but what they OUGHT to do is truncate at no digits after the decimal point or perhaps round those digits. The fact that they present the monthly mean data item with precision of 1/100 F leads me to think they do a “float” all the way; and that is clearly wrong.

So already we are limited to an average of exactly 2 thermometer readings. They have a known uncertainty of +/- 0.5F and there is nothing that can be done about that.

They take these “daily averages” and sum them for each day of the month, then divide by the number of days in that month (for which they have data).

So now we are taking up to no more than 31 items (that are already averages with a wide uncertainty of +/- 0.5 F) and averaging them together. That gives the Monthly MEAN Average. Only then is that data item handed to GIStemp.
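That chain, as described, fits in a few lines of Python (the station readings are invented for illustration; this is only the shape of the computation, not the actual NOAA code):

```python
# A sketch of the averaging chain just described (readings invented).

def daily_mean(t_min, t_max):
    """Average of exactly TWO integer-rounded readings."""
    return (t_min + t_max) / 2

def monthly_mean(daily_means):
    """Average of up to 31 daily means (only the days with data)."""
    return sum(daily_means) / len(daily_means)

# One hypothetical month, three days of integer (min, max) pairs:
days = [(55, 72), (54, 71), (56, 74)]
dailies = [daily_mean(lo, hi) for lo, hi in days]   # [63.5, 62.5, 65.0]
print(monthly_mean(dailies))                        # an average of averages
```

Real temperatures appear only in the (min, max) pairs; every number after that line is an average of averages.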

I know that it is much more complex in practice, but if we can’t agree on this simple case, then we will never get anywhere on the true case and I will keep quiet!

I could play the Hypothetical Land game all day, but frankly, it isn’t of any USE. It isn’t what is done, and it has lethal failings. That is not a very good use of my time…

So yes, in Hypothetical Land, with gigantic numbers of thermometers, all read at exactly the same moment in time, and all of them rounded to perfection, and IFF all that raw data were available to you, you could extrapolate a tiny bit more precision from the average (if you didn’t use the procedure that is used) by using statistical techniques. That would be about 1 decimal point if you are lucky (pun intended ;-)

But we are not there, there aren’t, they aren’t, people don’t, and it isn’t; so we can’t.

There is clearly some lack of precision in the measurements of single temperatures. Someone somewhere is rounding the min/max temperature to the nearest degree.

We know exactly how it is done. NOAA posts the directions (and it is given as a link in the “Mr. McGuire would not approve” posting, where all this really belongs…) The directions are, IIRC: {read the thermometer that is in 1/10 F and write down the value. ROUND it to the nearest integer (with .5 going up and .4 going down). Write that on the paper that you send in. If you have no valid reading, you may guess one and fill that in.}

Yes, some of the official recorded data points are flat out guesses. Care to guess what their accuracy might be?… What they do to your distribution of error?

This can be the observer reading the mercury by eye and going for the closest temperature marking, or an electronic measurement being rounded or whatever. This loss of precision creates an uncertainty in the exact temperature.

And you will have no idea if any given 72 F reading was 71.5000 or 71.4999 or 72.00000 or any points in between them; so all you can assume is that there is an exactly equal probability of any of them. You can know nothing beyond 72 +/- 0.5 F. And any average of those numbers can tell you nothing other than +/- 0.5 F.
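A tiny sketch of that point in Python (the true values are made up): rounding is many-to-one, so a recorded 72 tells you only “somewhere in [71.5, 72.5)”.

```python
# Many different true temperatures collapse to the same recorded integer.
for true_temp in (71.5, 71.9, 72.0, 72.4999):
    recorded = int(true_temp + 0.5)   # round-half-up, per the NOAA directions
    print(true_temp, "->", recorded)  # every line ends in 72
```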

I have decided to call this “uncertainty” NOT error. Why? Because if I asked for the data to be refreshed 1 nanosecond after the first reading, I would not expect it to change. The distribution of uncertainties comes from different locations.

No, it does not. 1 nanosecond later an electronic sensor might well have changed from 71.49999999999 to 71.50000000001; and a person might well look at 71.4(blur almost to line) and the second time see 71.5(nothing). It may not happen often, but when it does, it moves you one whole degree F. And you don’t know what stations were in that state, and what stations were not, nor which way they moved.

The simple fact is that “at the line” will with nearly unpredictable frequency fall one way or the other. There is no “rounded edge” or “stability island” or any of a dozen other concepts to bring to the process what is not there.

I *look* at a thermometer twice and I *see* something different each time. Maybe it’s parallax, or maybe it heated up a bit, or cooled off, or I just blinked and changed the tears in my eyes. You could make a whole science out of what folks do in reading a thermometer twice in a row, but the “at the limit” cases will NOT be stable. (I know this partly from trying to do chemical photography and staring at the *#\$@ thermometer, waffling back and forth about is it REALLY 72 or just a smidge off…) And that is, IMHO, why NOAA has folks read the tenths, then round and toss them out. They know they don’t have accuracy in the tenths place.

The distribution of the uncertainty could be captured by plotting a frequency histogram of the uncertainty coming from each location.

The uncertainty from each location disappears into One Single Number. 72. There is no information available to tell you is it uncertain high, low, or exactly on. Was it 1/1000 F away from becoming 71 or 73? You just do not and can not know. Period. Ever. By any means.

And if the individual station datum is opaque, then there is no way to get a distribution of uncertainties from a collection of them. Each one has the exact same uncertainty. +/- 0.5 F and there is no “distribution” to be found.

Averaging a bunch of them together from many places will NOT improve the situation. Was that 71 you “averaged” in reality a 70.500001 or a 71.4999999 ? You can’t ever know.

ALL the individual uncertainty bands are exactly 1 F wide (+/- 0.5 F). Take 10 x 1 F and divide by 10, you get 1 F. Take 10,000,000,000 x 1 F and divide by 10,000,000,000 and you get 1 F.
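That worst-case arithmetic can be sketched in Python (the 72s are placeholders; only the band matters): every reading can err the same way at once, so the 100% band on the average never narrows.

```python
# Worst-case (100%) bounds on the average of integer-rounded readings.
def average_bounds(readings, half_width=0.5):
    """Bounds if every reading errs by the full half_width the same way."""
    n = len(readings)
    lo = sum(r - half_width for r in readings) / n
    hi = sum(r + half_width for r in readings) / n
    return lo, hi

print(average_bounds([72] * 10))       # (71.5, 72.5)
print(average_bounds([72] * 10_000))   # still (71.5, 72.5)
```

This is the 100% (worst-case) band; the statistical argument below is about how unlikely the extremes of that band become, which is exactly where the two sides part company.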

Unfortunately I cannot calculate this since I am only given the rounded temperature. But if I could, I would contend that it would have a fairly flattish uniform like distribution between -0.5 and +0.5.

IFF you had access to all the data as 1/1000000 F data, THEN you could calculate something interesting about the rounded eyeball values COMPARED to that high precision data. But you don’t have that data. The precision EVAPORATED at the time of rounding. Gone. POOF! Vanished. LOST. DEAD and BURIED.

This is why I claim that the uncertainty in each reading behaves like a random uniform variable in the range [-0.5, 0.5]. Averaging temperatures averages this uncertainty, and the uncertainty in the final average is lower than that in any single measurement. This is simply the law of large numbers.

Only if you have multiple samples of a SINGLE THING. For independent trials on independent things YOU CAN KNOW NOTHING ABOUT THE STATISTICAL DISTRIBUTION OF THE DATA THROWN AWAY IN ROUNDING. There ought to be as many 71.49999 as 72.49999 as 71.50001 as 72.50001. Sometimes you get 71. Sometimes 72. Sometimes 73. You can’t look at the 71s, 72s, and 73s and figure out which was from a 72.49999 and which from a 71.500001. Ever. Nor how many there were of each.

Take 10 readings of 72. They might all have been from 71.5 F sites and add to 715. That would be 5 F “below” 720. Or they might all have been from 72.499 F sites and add to 724.99. That would be 5 F “high”. Then divide by 10 and you are back at a +/- 1/2 F “error band”. You can NOT say that it was 720 +/- {something smaller than 5}. You can say it is UNLIKELY that the true sum was 715, but it is just as unlikely that it was exactly 720. The original thermometers’ actual readings are unknown and unknowable, even as a statistical distribution. IFF you are measuring the same thing, you can assert that the numbers ought to have a central tendency and that the outliers ought to have a normal distribution. Not so for a collection of unrelated things.

Look, you are just re-inventing “oversampling” and Nyquist. We know their limitations. This isn’t new and isn’t in doubt. And it can not be applied to independent trials in independent events. It can only be applied to a single thing being repetitively measured.

I hesitate to get into the other issues you raise. I have a background in statistical physics so I can understand and agree with much of what you say about intensive/extensive. However they are different issues to the fairly narrow technical one I am raising.

At the risk of sounding brusque (due to flogging this particular horse into cat food about once a week for a couple of years now in different places): Please review oversampling, Nyquist, and False Precision in calculations.

If you look here:

http://www.physics.unc.edu/~deardorf/uncertainty/UNCguide.html

You will see that they only talk about distributions and show a standard normal curve when talking about repeated measurements on a single thing. They have a list of the standard rules for handling significant digits, precision, and uncertainty. None of them include your approach. Maybe you can invent a new thing, but it is not the way it is done now.
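The standard lab convention those guides teach can be shown in miniature (readings invented; this is the reporting convention, separate from the statistical question):

```python
# Report the result of arithmetic on measured values to no more precision
# than the least precise input.
readings = [72, 71, 73, 72]               # integer-precision measurements
raw_mean = sum(readings) / len(readings)  # 72.0 as a float
print(f"{raw_mean:.0f}")                  # reported at input precision: 72
```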

Further, you will find almost identical wording on the standard way to handle precision and uncertainty in Chemistry…

This issue seems to be one of those “Shiny Things” that folks just can’t resist staring at…

29. Dominic says:

I PREFER NUMBERS TO WORDS SO I WILL USE YOUR NICE THOUGHT EXPERIMENT. ALSO, SORRY ABOUT THE CAPS BUT I DO NOT KNOW HOW TO COMMENT OUT TEXT SO HOPEFULLY THIS SHOULD DISTINGUISH MY TEXT FROM YOURS.

YOU STATE

“Take 10 readings of 72. They might all have been from 71.5 F sites and add to 715. That would be 5 F “below” 720. Or they might all have been from 72.499 F sites and add to 724.99. That would be 5 F “high”. Then divide by 10 and you are back at a +/- 1/2 F “error band”. You can NOT say that it was 720 +/- {something smaller than 5}.”

YES – I AGREE 100%. THE ERROR LIMITS ARE +-0.50

“You can say it is UNLIKELY that the true sum was 715, but it is just as unlikely that it was exactly 720. ”

NO – YOU NEED TO CONSIDER COMBINATORICS. THE QUESTION IS WHETHER IT IS JUST AS LIKELY TO HAVE A SUM OF 715 AS A SUM OF 720 WHEN THE REPORTED OBSERVATIONS ARE 72 FOR ALL SITES AND THE TOTAL IS 720. THIS IS THE CASE IN WHICH THE “ERROR” IS ZERO.

WHAT IS THE PROBABILITY OF THE ERROR BEING ZERO IF ALL THE ROUNDED READINGS ARE 72?

WELL IT CAN HAPPEN IN MANY MANY WAYS. WE MAY HAVE READINGS OF 71.5, 72.499, 71.5, 72.4999 ETC… OR 71.8, 72.2, 71.8, 72.2, ….OR 72.0, 72.0, 72.0,…. AND SO ON. IN ALL OF THESE CASES THE TRUE AVERAGE IS 72 AND THE REPORTED AVERAGE IS 72.

COMBINATORIALLY AS THE NUMBER OF LOCATIONS OVER WHICH WE AVERAGE BECOMES LARGE, THE PROBABILITY OF ZERO ERROR BECOMES MUCH MUCH GREATER THAN THE PROBABILITY THAT THE ERROR IS +-0.50.

OVERALL WE WILL FIND THAT AS THE NUMBER OF LOCATIONS BECOMES LARGE THE ERROR IN THE AVERAGE HAS A GAUSSIAN DISTRIBUTION WITH STANDARD DEVIATION PROPORTIONAL TO 1/SQRT(N)

THESE COMBINATORICS ARE AT THE ROOT OF THE CENTRAL LIMIT THEOREM WHICH I AM USING HERE. I AM NOT INVENTING ANYTHING.

OF COURSE, THIS ASSUMES THAT THE ERRORS IN THE DIFFERENT LOCATIONS ARE IID. THIS IS CLEARLY NOT EXACT, BUT PROBABLY A REASONABLE APPROXIMATION.
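Dominic’s claim can be checked by simulation, granting his contested assumption that the rounding errors are independent Uniform(-0.5, 0.5) — which is exactly the premise under dispute in this thread, not a property established for the station data:

```python
# Monte Carlo under the IID-uniform assumption: each individual error stays
# within +/- 0.5, but the spread of the error of the AVERAGE shrinks like
# (1/sqrt(12)) / sqrt(N).
import math
import random

random.seed(42)

def mean_error_std(n_readings, trials=20_000):
    """Empirical std of the average of n_readings uniform rounding errors."""
    means = []
    for _ in range(trials):
        errors = [random.uniform(-0.5, 0.5) for _ in range(n_readings)]
        means.append(sum(errors) / n_readings)
    centre = sum(means) / trials
    return math.sqrt(sum((m - centre) ** 2 for m in means) / trials)

for n in (1, 10, 100):
    predicted = (1 / math.sqrt(12)) / math.sqrt(n)
    print(n, round(mean_error_std(n), 4), round(predicted, 4))
```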

30. Jeff Alberts says:

I *look* at a thermometer twice and I *see* something different each time. Maybe it’s parallax, or maybe it heated up a bit, or cooled off, or I just blinked and changed the tears in my eyes.

Or the proximity of your body or breath heated it up a smidge, or you blocked the sun for 20 seconds, causing it to cool. Yeah, quantify that.

31. Dominic says:

EXTRA COMMENT:

ALL MEASUREMENT STATES ARE EQUIPROBABLE.

AS A RESULT THE PROBABILITY OF A CERTAIN ERROR DEPENDS ON THE NUMBER OF WAYS THAT ERROR CAN BE GENERATED CONSISTENT WITH THE ACTUAL ROUNDED READINGS.

I OMITTED TO SAY THAT THERE IS ONLY ONE WAY THE ERROR ON THE AVERAGE CAN BE +-0.50. ALL READINGS MUST BE 71.5 OR 72.4999.

THERE ARE LOTS OF WAYS THE ERROR CAN BE ZERO OR CLOSE TO ZERO. AS THE ERROR MOVES AWAY FROM ZERO THE NUMBER OF WAYS THIS CAN HAPPEN DROPS OFF.

PS I STILL HAVE THAT SPREADSHEET IF YOU ARE INTERESTED.

32. E.M.Smith says:

Dominic,

We must simply disagree. I have work to do that matters, and frankly, playing a dice rolling game with numerology does not rank very high.

You wish to believe that you can create some knowledge or information out of the ether of the averaging of a bunch of digits. Fine. Believe it. I choose to side with all the chemistry and physics classes I’ve ever taken, all the “precision in the lab” guidelines (like the one linked above), etc.

At the point where you are handed a bunch of integers, you are stuck with the fact that the information to the right of the decimal point is gone, and you can not get it back. No matter how many dice you roll.

At the point where a human being eyeballed a thermometer and wrote down a digit, you can not create more precision in what they did. They have an error. The instrument has an error. The placement has an error. The transcription has an error. The reading may not be a reading at all and may simply be a GUESS (yes, the rules let you do that.) You can make no statement about what would happen if 10,000 people made that reading and you averaged them all together. You get ONE reading. Period.

Each trial is independent. Each coin toss is new. Each reading unique. You can make no statement about a central tendency meaning that the 72 was really 71.5 or 71.9 or 72.4; it is only 72. The 10 independent trials give 10 independent numbers that have no relationship one to the other. You can not undo that by working backwards from the average to what the distribution ought to have been based on combinatorics, because next time you might well have 17, or 99. The “space” is not limited to 71.50000 to 72.49999 since it COULD have been 0, or -49, or 109. It just happened that THIS TIME it landed in the zone near 72 and got rounded to that value.

ALL you have is a very large pot of very random numbers, and averaging them together does not make them any less random in any of their decimal places.

BTW, to mark quoted text, there are a couple of easier ways than all caps. The “old school” pre HTML ways were to either put in “QUOTE” “END QUOTE” text markers around blocks of text, or to put some kind of character as the marker at the front of each line like:

}quote
}from bob

The modern, HTML way, is to use meta text markup language tags. So to make italics you would use an OPENANGLE BRACKET i CLOSEANGLE BRACKET or to make bold you do the same thing with “b” instead of “i”. The end is marked with “/i” or “/b” between similar brackets.

<b>Gives Bold</b>
<i>Gives Italics</i>

33. Dominic says:

OK. This is getting quite time consuming and I also have work to do. Before I go I will make one comment. You state

ALL you have is a very large pot of very random numbers, and averaging them together does not make them any less random in any of their decimal places.

I agree.

Averaging them together does not make them any less random, but it does make the average less random. That’s the central limit theorem. That’s all.

Dominic

34. Tony says:

A fascinating discussion on the significance of the decimal fractions in the ‘average’ of a set of integer numbers.

It seems that a paradox is involved and, like optical paradoxes, it can be resolved when the point-of-view or perspective is changed. And so other ways of looking at the problem could help in this case.

A thermometer reading to the nearest degree turns a continuous property into a counted total of a number of discrete objects, like, say, counting the number of spectators at a sports stadium. Now quantisation into discrete integer units requires the application of the law of the excluded middle. In our example a yes/no quantising decision is needed on every person to decide whether they were in or not in the stadium, and the click of the turnstile does it for us at the stadium.

If we want to know how many people were at stadiums across the country, the only way of doing it without loss of information is to add up all the totals.

If someone then wants to convert this number to percentages, averages, etc., via arithmetic division, then they will need to preserve ALL of the digits of the answer to preserve the information content. IF they truncate, they have performed another quantisation, and information is lost.

And in terms of information, truncation is an irreversible operation; the truncated information is irretrievably lost and cannot be recovered.
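Tony’s irreversibility point in miniature (the stadium numbers are invented): truncation is many-to-one, so two different exact totals collapse to the same truncated percentage and nothing downstream can tell them apart.

```python
# Truncation destroys information irreversibly.
capacity = 60_000
for attended in (41_700, 41_900):
    pct = int(100 * attended / capacity)   # truncate to a whole percent
    print(attended, "->", pct)             # both print 69
```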

35. E.M.Smith says:

@Tony: You got it. The original US data is in whole degrees F, and from that moment on any math done with it restricts all other valid information to whole degrees F. So any global average, even if stated in C, is restricted to 5/9 C of precision by the whole-degree F truncation at initial data gathering for a large number of the thermometers (even if that is a small percentage of the planet surface, it’s the number of data items that matters here).

36. Peter says:

Thanks EM. My first and best chemistry teacher drilled significant digits into my brain to such a degree I could not forget it if I wanted to. Global average temperature reported to two decimal places, or three for Hadley, for goodness sake, was what started me, four years ago, on this journey to skeptic land. This blog is a fascinating stop on the journey.

37. Peter says:

EM,

An interesting post. I cannot tell you the number of times I have questioned significant digits in various climate papers only to receive the obligatory armwave about the law of large numbers. I have never been able to have it explained to me how new information can be created in this way, and the short answer which you provide above is: it can’t. May I copy and paste relevant sections (with attribution) if needed?

REPLY: “You have my permission to copy and use the material here in any attempt to disabuse folks of the ‘global warming’ notion and / or to correct their understanding of precision. -ems”

38. BillM says:

Peter,
I think that the issue here is not one of stats but one of assumptions.
You see the monthly average as an average of different things. Individual measurements, one for each day, of the different temperature for each day.
The climate folks do not see it this way. Climate folks are not interested in either the day by day variation at a site or the site’s absolute temperature on each of those days. They assume that for one thermometer, all temperature variation in a month can be regarded as irrelevant weather noise.
So they regard the 30 readings as 30 readings of the same thing. They see only one actual value for the month, not 30 different values. Hence, they claim 30 readings of the same thing, hence the law of large numbers in the average.
And the climate folks only work in anomalies compared to a baseline. They assume that the variations in 30 years of base period are noise, irrelevant to any later trend signal that we seek to identify. Hence there is only one, 30-year anomaly baseline actual value for a month, which is claimed to benefit from 30 x 30 readings of any one thermometer. 900 readings to which the law of large numbers is now applied. So they say on realclimate.
I am NOT defending this approach. I am simply suggesting that the climate folks would see the significant digit issue being of less relevance when viewed with these assumptions. (Don’t shoot the messenger, I’m with EMS, and you, on this one).

REPLY: “I think you summed it up rather nicely. And that, at its core, is the nature of their precision problem. They forget that a month may have a significant trend, day over day, that averaging as a single thing ignores. They forget that a century may have a significant 60 year cyclicality in it and that a 30 year baseline is just hiding that ‘ripple’ which they later ‘discover’ as their anomaly. They create their own self confirming fantasies because they do not understand the tools they are using. Not computers. Not thermometers. And not even math. -ems”

39. Peter says:

Bill,

That seems likely, but also willfully blind, and I take your point that it is not your position. I guess willful blindness is useful in reference to things like GIStemp. GMST in 1900 according to GIStemp 13.9 C. GMST in 2000 14.33 C.
Mr. Climate Scientist, what is the delta in GMST over the 20th century?
0.44 C Mr. Skeptic.
Mr. Climate Scientist, what is the delta if you respect the scientific rules regarding instrumental precision and significant digits?
I’ve no more time for you, denier.

Mr. Schaffner rolls his eyes.

40. E.M.Smith says:

Peter
GMST in 1900 according to GIStemp 13.9 C.
GMST in 2000 14.33 C.

Take a look at the last few “by country” or “by continent” postings:

http://chiefio.wordpress.com/2009/11/03/ghcn-the-global-analysis/

Look at the bottom total averages of the temperature series. They are different from each other.

I put them there deliberately to illustrate the degree of change of the “average” depending on the order chosen to do the average. (Average all daily data / divide by days; vs: average daily data into monthly averages, then average the monthly averages). In some cases, like Antarctica, you can get 7 C of variation in the answer. (Note: there is no decimal point in that number, it is 7.0 whole degrees of C based on order of averaging…)

That is part of the problem with “serial averaging”. Not only is your precision limited to your original data precision; but you can cause your results to wildly swing based on which of 2 reasonable choices you make …
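The order-of-averaging effect is easy to reproduce with toy numbers (these are invented, not the GHCN values): averaging all daily data weights every day equally, while averaging monthly means first weights every month equally, so a sparse month punches far above its weight.

```python
# Why the order of averaging matters.
month_a = [10.0] * 30   # a full month of daily values
month_b = [20.0] * 3    # a sparse month with only 3 reports

all_days = month_a + month_b
daily_first = sum(all_days) / len(all_days)              # every day equal

monthly_means = [sum(m) / len(m) for m in (month_a, month_b)]
monthly_first = sum(monthly_means) / len(monthly_means)  # every month equal

print(daily_first, monthly_first)   # 10.909... vs 15.0
```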

So just WHICH GMST does one choose…

41. Peter says:

EM,

That Antarctic fact alone should put to rest the validity of any mean temperature digit to the right of the decimal. It just doesn’t mean anything.

42. davidc says:

Re:
Tony

“It seems that a paradox is involved and, like optical paradoxes, it can be resolved when the point-of-view or perspective is changed.”

The point of view of EM and everyone’s chemistry teacher is (in the language of statistics) one of confidence intervals. The traditional method arrives at a 100% confidence interval. For a rectangular distribution this is set by the precision of the original measurement, just like EM says.

But statisticians don’t discuss 100% confidence intervals because for most distributions these are infinite, and so of no use. That’s why they consider 95%, 99% etc CIs – because they can actually calculate a number. The central limit theorem does apply to the mean of random variables from a rectangular distribution and for enough observations a 99% (say) CI will decrease as the number of observations increases – but the 100% CI stays put. The reason is easy to see: to get in the 1% between 99% and 100% all the observations have to be at that end of the original distribution, and the probability of that gets smaller as the number of observations increases.

Personally, I would like to have a 100% CI if this is the basis for shutting down civilisation, which I quite like.
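davidc’s distinction can be put in numbers with a quick simulation (again granting the uniform rounding-error assumption for the sake of the illustration): the 95% bound on the mean error shrinks with N, while the 100% bound is pinned at +/- 0.5 for every N.

```python
# 95% vs 100% bounds on the mean of N uniform rounding errors.
import random

random.seed(1)

def ci95_half_width(n, trials=20_000):
    """95th percentile of |mean rounding error| for n uniform errors."""
    abs_means = sorted(
        abs(sum(random.uniform(-0.5, 0.5) for _ in range(n)) / n)
        for _ in range(trials)
    )
    return abs_means[int(0.95 * trials)]

for n in (1, 10, 100):
    print(n, "95% bound:", round(ci95_half_width(n), 3), "100% bound: 0.5")
```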

Just for the record — not that it is likely to do any good — Dominic is clearly right, and you are wrong, about significant digits. He’s done a much better job of explanation than I can hope to do with the time available.

Sorry, EMS, but you really do need to find someone you trust who can explain to you why 100 measurements each having 2 or 3 decimal digits of precision can indeed have an average which has 3, 4, or 5. The exact precision depends on the details of the measurements; http://en.wikipedia.org/wiki/Bayesian_probability is but a small pointer to the full answer.

I am sad that whoever taught you (and Peter) about the importance of significant digits did such a good job that you thought there was no more you ever needed to know on the subject. Knowing what you don’t yet know is a valuable skill if you hope to discover truth.

REPLY: [ And I'm sad that you have, like so many others, gone off into this red herring argument again. (Guess you don't like horses much... OK, we'll flog this one again for the folks who could not read the flogging already done above...) What you say is true if you are measuring A thing but it is not true if you are measuring a bunch of different things. THAT is the core disagreement, not the math. I'm familiar with statistics, Nyquist, et al.

There is NO average monthly temperature thing to measure. There are only 60 discrete high and low temperatures. Each measured exactly ONE time. There is no probability distribution. There is no Bayesian conditional knowing with future observations adding more knowledge. There is no statistical fine tuning of the small bits. There is only, and exactly, ONE observation and ONE data point for ONE thing: The temperature at that time.

The averages are an artificial construct, not A Thing being measured with multiple samples. Averaging the discrete observations together does not remove their individual errors, where 60 measurements of A thing can. If you over sample A Thing with multiple measurements, everything you and Dominic have said is true; but temperature data fails the "IF" part of that statement. Basically, the weather is constantly and chaotically changing so each day is a new event, unlike measuring the distance between two towns that ought to be the same thing from day to day or the probability that the prize is behind door number 3 after 2 was picked.

Or put another way: 10 thermometers in ONE calorimeter will let you get a better precision of the temperature in the calorimeter than the individual thermometers; but 10 thermometers in 10 calorimeters will give you 10 readings limited in precision by the individual thermometers. If the next day you set up another 10 calorimeter experiments with the 10 thermometers and read them all one time again, they do not improve your prior days readings. And while you can calculate an average to a great deal of precision, it is without meaning in those small bits. These are independent events. Each day, and each location, is one thermometer in one calorimeter experiment. "And tomorrow is another day..." (with apologies to "Gone With The Wind" fans ;-) -E. M. Smith ]
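The calorimeter analogy from the reply, sketched in Python (the temperatures are invented, and uniform noise stands in for the +/- 0.5 reading uncertainty):

```python
# Many readings of ONE thing vs one reading each of TEN things.
import random

random.seed(7)

# Case 1: ten thermometers in ONE calorimeter holding a true 25.0 C.
# Averaging repeated measurements of one thing sharpens the estimate.
one_thing = [25.0 + random.uniform(-0.5, 0.5) for _ in range(10)]
print(sum(one_thing) / len(one_thing))   # close to 25.0

# Case 2: one thermometer in each of TEN different calorimeters.
# Each individual reading is still only good to +/- 0.5.
true_temps = [20.0, 31.0, 18.0, 26.0, 24.0, 29.0, 22.0, 27.0, 19.0, 30.0]
readings = [t + random.uniform(-0.5, 0.5) for t in true_temps]
print(max(abs(r - t) for r, t in zip(readings, true_temps)))  # up to 0.5
```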

44. j ferguson says:

E.M.
You provide a wonderful education here.

When I was a kid, my dad used to point out graphic data representations in the weekly news magazines and ask me to tell him how they didn’t portray the data. Usually it was line graphs showing annual time series where there could be no meaning whatever to the position of the line between years.

Later, I learned that there was a discipline devoted to graphing data and choosing a representation technique that was kind to the information.

Maybe this is another lost art.

REPLY: [ I don't think it's lost, just wounded... ;-) -E.M.Smith ]

45. Nick Barnes says:

I spoke to a professional statistician about this, and he told me that you were blowing smoke. There is a thing called False Precision – which I was also taught at school – but this isn’t it. The average of a set of imprecisely-measured values can and does have more precision than the original measurements. This fact is critical to all sorts of science, engineering, and quality control. Measurements always have limited precision; if you want to get a more precise result, take lots of measurements and average them. It’s especially useful to eliminate noise. End of story.
For a good example, see Tamino’s article about measuring light curves of variable stars. It’s a very pertinent example: one is assuming a cyclical underlying phenomenon, taking many imprecise measurements, and averaging them over many cycles to more precisely approximate the true phenomenon. It’s a tremendously successful technique.

REPLY: [ Still didn't read the above comments, eh? As I've said a few dozen times before, you don't get 'a large number of measurements to average'. You get TWO. Daily MIN and MAX. After that you are playing in the land of averages of averages of averages. (And even then you don't get "a large number", you get 10 to 30 at a time in all sorts of places in GIStemp, to average.) So please, don't waste any more of my time with re-ploughing well ploughed fields. A simple example: I have 2 glasses of 500 ml each. Each is 1/2 full plus or minus 1/2 of a glass. I can NOT average them and say I have (250 + 250) / 2 = 250 ml EXACTLY. I can have anywhere from 0 ml to 500 ml. If I had a million such glasses, yes, I could do something. But I don't. I have exactly TWO when the first average is done.

THAT is the basic error of NOAA that then gets propagated on. You do not have 1/100 F precision. You didn't get to average 10,000 measurements. Throwing away the error bars is just False Precision. -E.M.Smith ]

46. Nick Barnes says:

Nick Barnes // December 17, 2009 at 2:11 pm
Yes. So what do we do about skeptics? The noise machine is very powerful and does continually generate doubt and misunderstanding in honest people. A long-standing policy suggestion, which some people seem to use effectively is this:
[...]
2. Every time a skeptic, real or faking, troll or not, shows up saying “I heard it was the sun”, or “what about the trick to hide the decline”, or “CO2 lags temperature”, [...]
the only response should be this:
“This is a frequently-asked question. See *here*. For more information, see *list of rebuttals*. If you have related questions, please take them up there.”
Just boiler-plate. No conversation, no dialog.

REPLY: [ Since you seem to be intent on playing “strategic inside baseball” and not on just being a regular joe, I’ve added you to the SPAM filter. I’ll still look at your stuff, but to the extent it just looks like part of a strategy (say, oh, to waste my time or annoy) it will just get deep sixed. I don’t need to provide a platform for organized propaganda efforts. -E.M.Smith ]

47. Joshua W. Burton says:

So, a standard college (or advanced high school) physics lab exercise is the double-pulse measurement of muon lifetime. A cosmic-ray muon comes to rest in a scintillator, giving off a visible flash which is captured by a photomultiplier. A few microseconds later it decays, and the electron decay product gives a flash of its own. Students time the interval between the flashes, typically (with 1970s equipment; kids today often have better electronics) to the nearest 0.5 microsecond.

Each muon is only measured once, yet the muon lifetime is correctly calculated, after binning a few tens of thousands of measurements, to an accuracy of 0.01 microsecond. The students who get A’s always agree closely with the PDG accepted value of 2.197 microseconds.

It seems to me that this is very similar to the averaging procedure used by GISTEMP: many events, with unknown true continuum values, measured just once each at low precision, yielding a much higher cumulative precision because of the Central Limit Theorem and the fact that the individual errors are not correlated. (All these criteria apply to the thermometers as well.) I would like to go on teaching this lab, so if this is “False Precision” I suppose I must be in favor of it. I’m willing to come back in April to let you know if my QuarkNet kids get the right answer once again.

REPLY: [ Once again we have the same "swing and a miss" due to not reading the already well hashed material above. Maybe I need to rework the original posting to put this at the top of it...

BTW, the problem starts in NOAA / GHCN, not GIStemp. GIStemp takes it further, but you are making assumptions about what it does that are not true. GIStemp does not get temperature readings as its input, it gets an artificial construct, the monthly mean of daily means for each location. So, looking at the last time we actually see a data item that is a temperature, here is the problem with NCDC / GHCN:

OK, for the few hundredth time:

You do not have a large number of measurements being averaged, you have 2. The daily MIN and MAX. After that, you are averaging "daily means". Last time I looked, the "mean" was a calculated, constructed artifact, not "a thing". The emphasis here is not on each day being somehow "not a thing" but on a "mathematical construct" being "not a thing". Further, as weather is chaotic, each day is quite disjoint from its neighbor. (Heck, I've had a 50 F drop in a few hours in Dallas as a Canada Express rolled over us.)

It is absolutely NOT like a muon which is a real physical thing with real physical CONSTANTS you are trying to measure. So yes, the day has a "mean value", and if you used 10,000 thermometers at that place and time you could find it to great precision with the central limit theorem. But you don't. You get one shot at it on one day with at most 2 data points for that location. Then you average them. What precision can you get out of averaging 2 data points each measured to 1 F of precision? That's what you've got.

You have a non-Nyquist sample of the planet and for each location you get at most 2 values that are promptly averaged (per the NOAA / NCDC page describing their process) for that location. Then you take at most 31 of these non-physical averages and average them together to get a "monthly mean" for that location. Not 10,000. Not even 1000. Not even 100. It is the "monthly mean" and the "daily mean" that are "not A thing" but are a calculated construct. Calculated in little tiny steps to which the central limit theorem does not apply.

(Frankly, if they took all the MIN and MAX data points for the planet and averaged THEM as a group, my whole false precision complaint would evaporate, but they don't. We've got this stepwise thing to deal with instead. For an average of all the MIN and MAX values averaged as a group, the only issue I see is Nyquist sampling theory and maybe the fact that they are taken at different times during the day, but that's a hypothetical about something they don't actually do.)

So take two glasses of water, each half full (plus or minus 1/2 full on the precision of the measurement). Average those two readings and throw away your error band. Tomorrow I'll fill another two with no specific relationship to how I filled them today. I'll do this 30 times. The amount of water I put in each day is disjoint from the amount I put in on prior days. Tell me to 1/100th of a glass how much water I have when I dump them all in one bucket at the end of the trial...
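A quick Monte Carlo sketch of that thought experiment (made-up fill levels, and assuming every rounding error is independent, which is the most favorable case possible for the averagers): even then, the recovered total is uncertain by a couple of glasses, nowhere near 1/100th of a glass.

```python
import random

# 30-day water-glass trial: each day two glasses hold a random true
# amount in [0, 1] glass, each read to the nearest whole glass
# (+/- 0.5 glass precision). How far off is the recovered total?
def one_trial(days=30, glasses_per_day=2):
    true_total, read_total = 0.0, 0.0
    for _ in range(days * glasses_per_day):
        true = random.random()       # true fill, unknown to the reader
        reading = round(true)        # 0 or 1: whole-glass precision
        true_total += true
        read_total += reading
    return read_total - true_total   # error in the recovered total

random.seed(42)
errors = [one_trial() for _ in range(2000)]
rms = (sum(e * e for e in errors) / len(errors)) ** 0.5
print(round(rms, 2))  # typically a couple of glasses, not 0.01
```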

That illustrates the problem. You don't have 10,000 thermometer readings being turned into a "planetary mean temperature". Nor 1000. Nor even 100. Heck, not even 3 temperatures enter the calculation. You get 2 temperatures. Then everything after that is averages of averages of averages of averages... (and no, that is not hyperbole, that is what NCDC and NASA / GIStemp actually do.) And with NO statement of error band, only implicit precision.

So as soon as we do that first daily average of water glasses, we have no idea if we actually had empty glasses or full glasses or even 1/2 full. The correct answer is 1 glass plus or minus 1 glass. And averaging 30 of those AVERAGES together does not improve my accuracy of how much water I have because I lost the use of the law of large numbers for the WATER at that first average. I can state the average of the averages to a very great precision and I still know nothing more about how much water I have. And just to try turning this horse into horseburger one more time: IFF I were to take those 60 glasses of water and average the raw fullness data for 60 data points, yes, I could get better than +/- 1/2 glass error band on THAT average. But that is NOT what NOAA nor GIStemp does. That would be averaging data from "a THING" and not "averaging an average".
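The distinction can be put in code (hypothetical values; this is my error-band argument, not the statistical one). If you carry a guaranteed, worst-case error band through the stepwise averaging with interval arithmetic, the band never shrinks, no matter how many layers of averages you stack on top. Only the statistical argument, which needs the raw independent readings, can shrink it, and those raw readings are gone after the first average.

```python
# Carry a guaranteed (worst-case) error band through the stepwise
# averaging as an interval (lo, hi). The average of intervals of
# half-width 0.5 is another interval of half-width 0.5: averaging
# averages never tightens the worst-case band.
def avg(intervals):
    lo = sum(a for a, _ in intervals) / len(intervals)
    hi = sum(b for _, b in intervals) / len(intervals)
    return (lo, hi)

def half_width(iv):
    return (iv[1] - iv[0]) / 2

day = avg([(2.5, 3.5), (6.5, 7.5)])   # MIN 3 +/- 0.5, MAX 7 +/- 0.5
month = avg([day] * 31)               # 31 daily means, each +/- 0.5
print(half_width(day), half_width(month))  # 0.5 0.5
```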

Oh, and say hi to Nick for me... -E.M.Smith ]

48. Joshua W. Burton says: