No, I’ve not discovered any particular new fraud in the temperature record.

What I’m doing is the first step: I’m stating a potential METHOD to detect fudging of data. (Or systematic skew of the data via unintentional transformative processes).

The idea depends on Benford’s Law which states that the frequency of the first digit of many data sets will have a log distribution so that more will be a 1 and decreasing down to very few being a 9.
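As a minimal sketch of what that log distribution looks like in numbers, the usual statement of the law gives the expected share of first digit d as log10(1 + 1/d). The function name below is just an illustrative choice, not anything from an existing tool:

```python
import math

def benford_first_digit_probs():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()
for digit, p in probs.items():
    # A simple text bar so the falling shape is visible without a plotting library
    print(f"{digit}: {p:.3f} {'#' * round(p * 100)}")
```

That works out to roughly 30.1% of values leading with a 1, falling to about 4.6% leading with a 9.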

There are limits to Benford’s Law. One being that the data are more likely to obey it when they span a few orders of magnitude. (If you are measuring a property with a narrow range, like the average melting point of tin samples, you would not expect a wide distribution of digits…)

Benford’s law can only be applied to data that are distributed across multiple orders of magnitude. For instance, one might expect that Benford’s law would apply to a list of numbers representing the populations of UK villages beginning with ‘A’, or representing the values of small insurance claims. But if a “village” is a settlement with population between 300 and 999, or a “small insurance claim” is a claim between $50 and $100, then Benford’s law will not apply.

This means that using F, where temperatures span 3 orders of magnitude, is just marginally usable while C with only 2 orders of magnitude would “have issues”.

As the records are stored in 1/10ths, the overall distribution is actually over 4 orders of magnitude for F and 3 for C, but frankly I’d rather have the extra digit of margin. I suppose one could also say that negative values add some effective increase in the effective range too… but that’s a bit speculative at this point.

Now, this raises a bit of a paradox, as there is the small matter of Benford’s Law being scale invariant: and yet here I am picking a scale on the grounds that the choice of scale changes the result…

The law can alternatively be explained by the fact that, if it is indeed true that the first digits have a particular distribution, it must be independent of the measuring units used (otherwise the law would be an effect of the units, not the data). This means that if one converts from feet to yards (multiplication by a constant), for example, the distribution must be unchanged — it is scale invariant, and the only continuous distribution that fits this is one whose logarithm is uniformly distributed.

For example, the first (non-zero) digit of the lengths or distances of objects should have the same distribution whether the unit of measurement is feet, yards, or anything else. But there are three feet in a yard, so the probability that the first digit of a length in yards is 1 must be the same as the probability that the first digit of a length in feet is 3, 4, or 5. Applying this to all possible measurement scales gives a logarithmic distribution, and combined with the fact that log10(1) = 0 and log10(10) = 1 gives Benford’s law. That is, if there is a distribution of first digits, it must apply to a set of data regardless of what measuring units are used, and the only distribution of first digits that fits that is the Benford Law.
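The feet-and-yards example can be checked directly against the formula. Assuming the usual statement of the law, $P(d) = \log_{10}(1 + 1/d)$, a length whose first digit in yards is 1 (say, between 1 and 2 yards) lies between 3 and 6 feet, so its first digit in feet is 3, 4, or 5:

$$
P_{\text{feet}}(3) + P_{\text{feet}}(4) + P_{\text{feet}}(5)
= \log_{10}\frac{4}{3} + \log_{10}\frac{5}{4} + \log_{10}\frac{6}{5}
= \log_{10}\frac{6}{3}
= \log_{10} 2
= P_{\text{yards}}(1)
$$

Both sides come to about 0.301, which is the consistency the scale-invariance argument requires.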

I’m comfortable with setting aside that conundrum for now, and simply accepting that the leading digit in C may not obey Benford’s Law as well, since we already know there are not a lot of places on the planet with a 50, 60, 70, 80, or 90 C range. That is, the physical upper bound on temperatures will bias the sample. However, it would be worth doing the test in C as well, as Benford’s Law might still apply (given the likely increase in values in the 1x.x and 1.x ranges). In essence, if the C values also obey Benford’s Law it would be great confirmation, but if they don’t, that is not likely to be confirmation of failure so much as a flag to study more closely just what is going on with the data distribution and whether that low number of orders of magnitude has an impact.

### Doing It

It ought to be pretty easy to do the test. Just take the data and count the number of instances of the first digit of each data item being a 1, 2, 3, etc. and plot. It ought to be a reasonable approximation of the Benford’s Law distribution – unless the data are cooked (or I’ve managed to screw up at the get go by failure to observe some limitation in the data distributions of temperature data items that makes them unsuited to a Benford’s Law test).
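The counting step might be sketched like this. It assumes the readings are already integers in tenths of a degree (as the records are said to be stored); the sample values and function names are made up for illustration and are not real GHCN data or code:

```python
from collections import Counter

def leading_digit(n):
    """First non-zero digit of an integer reading; None if the reading is 0."""
    s = str(abs(int(n))).lstrip("0")
    return int(s[0]) if s else None

def leading_digit_counts(values):
    """Count how often each digit 1..9 leads; zero readings are skipped."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts.get(d, 0) for d in range(1, 10)}

# Hypothetical readings in tenths of a degree (the sign is ignored for the digit)
sample_tenths = [123, -45, 712, 1004, 0, 98, 156, -7]
counts = leading_digit_counts(sample_tenths)
print(counts)
```

Dropping the sign and skipping exact zeros are choices made for this sketch; how negative values ought to be handled is an open question for the method itself.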

Basically, it ought to give a chart like:

### Not Definitive

Unfortunately, this test is unlikely to be definitive in either direction. Failure can simply mean that temperature data have a natural distribution that does not fit the law (though with 4 digits of F I’m having trouble thinking of how…) while a successful pass of the test may just mean that the Data Diddler was very clever.

Forensics are often like that. You get “indications of reason to suspect” more often than you get “Finger Of God’s Own Truth”. One fingerprint can just mean the person was there at an unknown time in the past (and it was OK); you need to correlate that with more data (when was the room cleaned, were they authorized to be in the space at all, ever?) before it becomes more than just a flag of suspicion.

So what good is it then?

Pretty simple. If the data conform, it lends credence that there was no ham handed deliberate Data Diddling (and you can focus on more abstract and difficult searches, probably computer driven diddling that can keep the statistics right).

If the test fails, it says you need another bit of exploration to show probable fraud **and** it tells you what that bit of exploration would be.

Basically, show that unbiased temperature data do obey the law; then you have a smoking gun. That can be simply done with some old raw data of known quality. IFF a known unprocessed broad sample obeys Benford’s Law, but the post processed GHCN v.1 or GHCN v.2 or GHCN v.3 do not (or even more deliciously, if V.1 does and V.2 DOES NOT ;-) well, then it’s hanging time in the court of Data Diddling…
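Nothing above names a statistic for deciding whether a set of counts "obeys" the law. One common choice, and it is my assumption here rather than an established part of this method, is a chi-square goodness-of-fit against the Benford proportions. A minimal sketch with made-up counts:

```python
import math

# Chi-square critical value for 8 degrees of freedom at the 0.05 level
CHI2_CRIT_DF8_P05 = 15.507

def benford_chi_square(counts):
    """Chi-square statistic of observed leading-digit counts vs. Benford's Law.

    counts: {digit: observed count} for digits 1..9.
    """
    total = sum(counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("no observations")
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Hypothetical counts: one set close to Benford, one flat across all digits
benford_like = dict(zip(range(1, 10), [301, 176, 125, 97, 79, 67, 58, 51, 46]))
flat = {d: 111 for d in range(1, 10)}

for name, c in [("benford_like", benford_like), ("flat", flat)]:
    stat = benford_chi_square(c)
    verdict = "consistent" if stat < CHI2_CRIT_DF8_P05 else "departs from Benford"
    print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```

With millions of readings even tiny departures will exceed the critical value, so comparing the statistic between a known raw sample and the processed versions is probably more telling than any single pass or fail.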

### Prior Art?

Is this a brand new idea, or is there some reason to think it’s an OK use of Benford’s Law?

Well, it’s pretty well accepted as a forensic tool, and it’s even accepted in court. I think that means it has some validity (though it would need a better statistician than I am to testify).

In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford’s law ought to show up any anomalous results. Following this idea, Mark Nigrini showed that Benford’s law could be used in forensic accounting and auditing as an indicator of accounting and expenses fraud. In the United States, evidence based on Benford’s law is legally admissible in criminal cases at the federal, state, and local levels.

This use, though, depends on the human desire for an even distribution of made up numbers. A consistent algorithmic variation (such as increasing by 0.5 C across the board) is less likely to cause a broken distribution. A bias such as “lift some 1.x to 2.x” temps would be shown, even if done algorithmically, as it would shift the leading digit count toward a 2 and away from a 1; so in that kind of case the more ‘subtle’ adjustments can yield the most indication of biases.
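To make the “lift some 1.x to 2.x” case concrete, here is a toy sketch. The readings and the adjustment rule (bump every other 1.x reading by one full degree) are entirely hypothetical and only illustrate how the leading-digit counts move:

```python
from collections import Counter

def leading_digit(n):
    """First non-zero digit of an integer reading; None if the reading is 0."""
    s = str(abs(int(n))).lstrip("0")
    return int(s[0]) if s else None

def digit_counts(values):
    c = Counter(leading_digit(v) for v in values)
    return {d: c.get(d, 0) for d in range(1, 10)}

# Hypothetical readings in tenths of a degree
readings = [12, 15, 18, 19, 23, 27, 34, 110, 145, 260]

# Hypothetical adjustment: lift every other reading in the 1.0..1.9 range by 1.0
adjusted = []
lift = True
for r in readings:
    if 10 <= r <= 19:
        adjusted.append(r + 10 if lift else r)
        lift = not lift
    else:
        adjusted.append(r)

before = digit_counts(readings)
after = digit_counts(adjusted)
print("before:", before)
print("after: ", after)
```

The count of leading 1s drops and the count of leading 2s rises by the same amount, which is the kind of shift the test is meant to flag.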

Basically, it’s a valid method, but it doesn’t catch everything.

Also, I’m not the first person to think of this. A quick web search showed at least one other person has thought of it, but I’ve not found any evidence of it having been done (yet…)

noaaprogrammer says:

June 28, 2009 at 5:50 pm

PaulH wrote: “All of this is looking more and more Enron-esque with each passing day. How much longer can they cook the books before it all comes crashing down?”

Benford’s Law on the distribution of leading digits is sometimes used to catch those who “cook” financial records. However with data such as temperature that has a restricted range, can Benford’s Law be adjusted to take this into account?

See “Applications and Limitations” section in:

http://en.wikipedia.org/wiki/Benford’s_law

There is also a distribution on the second leading digits which may not be as sensitive to data with restricted ranges.

(See “Generalization to digits beyond the first” section in the above.)
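For reference, the second-digit distribution mentioned there can be sketched as follows. The standard generalization sums the two-digit Benford probabilities over every possible first digit; the function name is an illustrative choice:

```python
import math

def benford_second_digit_probs():
    """Expected share of each second digit 0..9 under Benford's Law."""
    return {
        k: sum(math.log10(1 + 1 / (10 * d1 + k)) for d1 in range(1, 10))
        for k in range(10)
    }

second = benford_second_digit_probs()
for digit, p in second.items():
    print(f"{digit}: {p:.4f}")
```

The second-digit curve is much flatter than the first-digit one (roughly 12% for a 0 down to about 8.5% for a 9), so it takes more data to tell a departure from noise.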

### Dig Here!

I’ll likely put another 10 minutes or so into more creative web searches for “prior art” prior to doing the actual data test myself, but that is another area where folks could “Dig Here!” as I’m up for tea and breakfast before I do more on this line.

Someone with decent spreadsheet skills could do this fairly easily just using the common spreadsheet applications. I’ve got the data in UNIX files, so would likely use a more long hand FORTRAN approach (as I have the code to read the files already set up). However, I’m trying to catch up a lot of other things right now, so I’m also unlikely to get around to it before Christmas.

This puts us all in a bit of a Race Condition…

A race I would be happy to lose…

So if anyone else wants to run a Benford’s Law test on the data and report back here, feel free and by all means take the credit. I’m happy to just have been a ‘useful irritant’ by presenting the idea.

If not, well, I’ll get around to it eventually…

I am also pretty sure that a similar test for “proper” distribution of final digits could be done (though not Benford’s Law). I’ve seen some indications of non-random distribution artifacts (like that temperature series in the Paducah posting where the last digit kept hopping back and forth from a .4 to a .9 repeatedly). I believe the trailing digits ought to have an even distribution with no nodal points. So the whole data set, if looked at with both tools, could be tested for leading digit, second digit, and trailing digit non-normal distributions.
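A trailing-digit check could be sketched along the same lines. It assumes integer readings in tenths of a degree and takes the "even distribution" expectation at face value, which is itself an assumption that would need checking against known-good data. The sample series is invented to mimic a .4/.9 hopping pattern:

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 0.05 level
CHI2_CRIT_DF9_P05 = 16.919

def trailing_digit_counts(values_tenths):
    """Count the last digit (the tenths place) of each integer reading."""
    c = Counter(abs(int(v)) % 10 for v in values_tenths)
    return {d: c.get(d, 0) for d in range(10)}

def uniform_chi_square(counts):
    """Chi-square statistic of the counts against a flat 10% share per digit."""
    total = sum(counts.values())
    expected = total / 10
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))

# Hypothetical series whose last digit alternates between 4 and 9
series = [124, 129, 134, 139, 144, 149, 154, 159, 164, 169]
tail_counts = trailing_digit_counts(series)
tail_stat = uniform_chi_square(tail_counts)
print(tail_counts)
print(f"chi-square = {tail_stat:.1f} (critical value {CHI2_CRIT_DF9_P05})")
```

One caution: conversions between C and F followed by rounding can by themselves leave some final digits over-represented, so an uneven result is a prompt to look closer rather than proof of tampering.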

Yeah, kind of dull work… but not all that hard to do and the results can sometimes be rather interesting…

E.M., Forgive my ignorance, I’m having some difficulty understanding what exactly you are proposing to feed to the mighty Benford Extraction, Investigation and Unraveling Engine?

i.e. ‘Raw’, dailies, monthlies, adjusted, unadjusted? I suspect many of those would fail almost automatically?

One thing that Steven McIntyre discovered when researching large anomalies in temperature reports was that often the data for a month is simply a re-report of the previous month’s data. This shows up glaringly in spring and fall when the month to month averages change greatly but isn’t so obvious between, say, July/August or June/July or Jan/Feb or Dec/Jan.

Going back in the record they found several such instances, mostly from Finland and Russia but I believe there were a few elsewhere. The search was by no means exhaustive, either, just hitting the low hanging fruit.
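A first pass at finding such re-reports could be as simple as flagging any month whose value exactly matches the month before. This is only a sketch with made-up numbers, not the search that was actually run:

```python
def repeated_months(monthly_values):
    """Indices where a month exactly repeats the previous month's value.

    monthly_values: readings in time order; None marks a missing month.
    """
    return [
        i
        for i in range(1, len(monthly_values))
        if monthly_values[i] is not None
        and monthly_values[i] == monthly_values[i - 1]
    ]

# Hypothetical monthly means in tenths of a degree, Jan..Sep
station = [-52, -31, 24, 24, 131, 168, 171, 171, 102]
flags = repeated_months(station)
print(flags)
```

An exact match in spring or fall is suspicious; one in mid-summer or mid-winter may be genuine, so the flags would still need a human look.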

Another issue Anthony Watts discovered was a reversed sign in many METAR reports. Well, not exactly a reversed sign in the report but a bug in the software that interprets the reports. If you report a temperature of -10F it will be reported as 10F because you aren’t supposed to use a “-” in METAR reports and the illegal character will be ignored. You are supposed to report it as M10F (or Minus 10F), if I remember correctly. No exhaustive search was made to correct these either, again, a little of the low hanging fruit was picked.

Here is a nice article concerning Benford’s Law: http://members.fortunecity.com/templarser/one.html

Briefly, Benford’s Law is a result of the real world measuring processes. Because most physical attributes vary along a Gaussian distribution, a random grab bag consisting of measurements of those attributes will have a certain correlation among themselves. Because nature does not care whether you use feet, furlongs or Palaeolithic what-evers as your unit, the correlation needs to be scale invariant. Scale invariance implies a logarithmic distribution of values.

Two points: is it reasonable to expect that honest temperature (or precipitation or whatever) measurements fit a Gaussian distribution? I seem to remember hearing that a lot of weather phenomena have a non-Gaussian fat-tailed distribution. That might complicate things and possibly make Benford’s Law inapplicable. I am not bright enough to know for sure, but the non-Gaussian thingy might need to be considered. Heck, it is probably still worth trying though; who knows what might turn up?

Secondly, a bit of philosophy. Scientists and philosophers argue about the existence of an objective reality — and by objective, I can only think that they mean “the same reality for all observers.” Not that we all experience the same sensations, perceptions, etc., but that we are all at least measuring the same world in some sense, a world that exists independently of the observer. Such an independent physical world (by definition) would have to have universal qualities; it would need to be scale invariant. I find it very interesting that our measured world seems to fit that need. The quantum experts tell us that there is no world out there separate from our observations. I am not so sure…

@ George

I have heard that a good place to look for errors in the record is in the Soviet measurements. Apparently, many of the cities in the northern Siberian region of the USSR were allocated coal and oil based on their climatic needs, i.e., on “how cold does it get there each winter?” The colder the city, the more coal and oil it got….so what do you think happened to their temperature measurements? Yup…outright fraudulently low records. Once the Soviets fell in the early 1990s, the reported temperatures became untied from resource allocations and there was a remarkable, uh, “warming” in the reports.

Funny how these things work.

An intriguing aspect of the temperature record, and one that shows “evolution over time,” is the drop in continental US temperature from the 1940s to the 1970s. In 1975, Newsweek reported a global drop of 2.74°F (1.75°C). Later, this global drop was all but eliminated, reducing by an order of magnitude to 0.1 or 0.2°C instead. The opportunity for alteration of those records seemed large.

But in the recent BEST temperature data, I see that the US drop is suddenly larger than it was. And in this BEST document, the NOAA uncertainty measurement changes abruptly during this same timeframe in their Figure 1:

http://berkeleyearth.org/Resources/Berkeley_Earth_Averaging_Process

I don’t know the significance of it (in both senses of that word).

===|===========/ Keith DeHavelle

Might want to have a look at this:

http://www.appinsys.com/globalwarming/RS_Greenland.htm

Looks like there’s really nothing going on in Greenland worth paying attention to.

@Chuckles:

I would take the GHCN (any version at first, but eventually V1, V2 vs each other and vs V3) and the USHCN by version as well.

No ‘engine’ needed. Just a ‘population count’ of the first digits.

“Expecting at it” doesn’t do much, so I’d not expect at it. I’d just add up the population counts and see if anything interesting showed up. If it does, then I’d compare to a ‘selected raw set’ from {wherever you can get it} as a cross check on the method / data suitability.

@Jason Calley:

The quantum experts are probably wrong ;-)

@Keith DeHavelle:

Somewhere or other I have a saved image of an old magazine article from the ’70s with that dip in the graph. Then, over several iterations of the data, the dip goes away…. Supposedly all from the same raw data set.

IMHO, we’ve got ‘adjustment jiggery pokery’… not ‘warming’

@George:

Oh No, Say It Isn’t So!!! I was so looking forward to staring at a large blob of ice for the next 30 years ;-)

@All:

In pondering this a bit more, I think I’d take the data, converted to F, and add to it a positive offset such that the lowest negative value became 0. THEN do the population count on the leading digits.

Why? The ‘test’ depends on the fact that if things are more or less smoothly distributed, you get a log distribution of the leading digits as you ‘count up’. As I don’t know that would hold in ‘counting down’ to negative values, simply putting them all as positives would make sure the ‘counting up’ and ‘counting down’ did not in some way cancel out. Scale Invariance would assure we can do the offset and not muck up the result. That, then, would give a range from about -60 F to 120 F, or, offset, 0 to 180. If measured in 1/10 degree increments, that’s 0 to 1800, or a fairly clean 4 orders of magnitude. That ought to be enough to show the effect.

As temperature values vary by more than 10 F pretty much everywhere, there ought not to be a bias on the leading digit from the range max being constrained to 1800. (that is, there ought to be lots of places with 800 200 etc values and even the ones that DO go to 1700 or 1800 have lower values at other times).
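The convert, offset, and count recipe might look like the sketch below. Function names and sample readings are hypothetical. One thing to keep in mind: scale invariance strictly covers multiplying by a constant, so whether adding an offset leaves the leading-digit distribution intact is an assumption that the known-raw-data cross check would have to confirm.

```python
from collections import Counter

def c_tenths_to_f_tenths(c_tenths):
    """Convert tenths of a degree C to tenths of a degree F (rounded to an integer)."""
    return round(c_tenths * 9 / 5 + 320)

def shift_to_zero(values):
    """Add a positive offset so the lowest negative value becomes 0."""
    offset = max(0, -min(values))
    return [v + offset for v in values]

def leading_digit(n):
    """First non-zero digit of an integer reading; None if the reading is 0."""
    s = str(abs(int(n))).lstrip("0")
    return int(s[0]) if s else None

def leading_digit_counts(values):
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts.get(d, 0) for d in range(1, 10)}

# Hypothetical readings in tenths of a degree F, spanning -60.0 F to 120.0 F
f_tenths = [-600, -125, 0, 325, 734, 1200]
shifted = shift_to_zero(f_tenths)          # now spans 0 to 1800
shifted_counts = leading_digit_counts(shifted)
print(shifted)
print(shifted_counts)
```

Running the same count on the unshifted values and comparing the two results would show quickly how much the offset itself changes the picture.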

At any rate, that’s where I’ve gotten to so far…

Regarding Benford’s Law, I remember to have read somewhere that he started out when he observed that the first pages of a heavily used logarithmic table book had been more worn out than the rest. That the people who used it then all worked with gaussian distributed data appears to be an additional assumption to me.

As an aside, the R package “latticist” provides a nice graphical interface to a collection of tests as clickable options, among them the Benford test. Latticist is able to read a bunch of input data formats (CSV…), and also allows for data transformations before applying any test.

@ Hugo “I remember to have read somewhere that he started out when he observed that the first pages of a heavily used logarithmic table book had been more worn out than the rest. ”

Almost! That was actually Simon Newcomb, the 19th Century Astronomer, who noticed. He wrote the phenomenon up, but it was mostly forgotten until Benford came up with the same concept. Somewhere around the house I have an old astronomy book by Newcomb.

“The quantum experts tell us that there is no world out there separate from our observations. I am not so sure…”

The world exists independently of the observer. Unfortunately, the measurements don’t. Consequently, we all feel the elephant. Unfortunately, some of us become rather dogmatic about what everyone else should feel. “Feels like a snake to me.” “Nonsense, it feels like a wall and you g-d well know it does.”

@ Duster “The world exists independently of the observer. Unfortunately, the measurements don’t. ”

Personally, I agree with you. On the other hand there have been a bunch of high power scientists who did not. IIRC, it was John von Neumann who most strongly championed the “if-there’s-no-one-watching-then-the-world-is-not-really-there” interpretation. That is a heck of a pill to choke down though. I certainly do not try that stunt when I meet another car at the round-about. “OK, Jason, just close your eyes and the other car will be gone!”

Caveat! In all matters of quantum mechanics I reserve the right to say “Did I say that?! What was I thinking?” I could very well be mistaken on this subject.

@Jason Calley:

Um, in all matters of Quantum Mechanics can you not say: “I said that in an alternate reality, this phase space is a parallel instantiation and things may be different”. Or, more briefly: “It’s possible I said that and it was right, or that I didn’t say it and was right, or both.” Until an observer measures the photon, it went through both slits, then it didn’t ;-)

Biggest issue for me is the Pi electron orbital. It has a positive probability of being at any location on each side of the nucleus, but a ZERO AT the center plane of the nucleus… So how does the electron cross the road and NEVER BE IN THE MIDDLE? Um, its, er, state equation says it does…

Personally, I think we’re missing a couple of dimensions and just ‘making stuff up’ to try and cover that gap.

Did anyone see a cat around here?

No, but I heard one meow ;-)

@ E.M.

This guy is convinced stuff is being made up:

http://milesmathis.com/updates.html

@Pete: Thanks, I’ll take a look.