I ask “Where’s the Beef?” and folks offer Holy Hypothetical Cows
Whenever I’ve raised the issue of precision and accuracy drift in GIStemp, the discussion has ended up with folks offering all sorts of reasons why hypothetically you can get a gazillion bits of precision out of a large average of a bazillion things. Then I point out that we have, at most, 62 values going into the monthly mean (and that done in 2 steps, with opportunities for error and accuracy drift at each step). Those values are then used for all sorts of other calculations (homogenizing, UHI “correction”, weighting, all sorts of things) before they ever approach the point where they are finally turned into “anomalies”. Even then, the method used does not always compare a station with itself. It is more a “basket of oranges” to a “basket of apples” comparison. (And sometimes as few as ONE station forms the “anomaly” for a given GRID box…)
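To see why the small count and the two-step method matter, here is a little sketch in Python (with made-up numbers, not GIStemp’s actual data or code) of a GHCN-style two-step monthly mean, where values are kept to tenths of a degree at each step:

```python
# Illustrative sketch only (NOT GIStemp's actual code): how rounding at
# each stage of a two-step monthly mean shifts the result away from the
# full-precision answer. Assumed convention: temperatures are stored to
# tenths of a degree, as GHCN records them.

def round_tenths(x):
    # store the value to one decimal place (tenths of a degree)
    return round(x * 10) / 10

# 31 days of hypothetical (min, max) temperatures in degrees C
days = [(10.0 + 0.13 * i, 20.0 + 0.17 * i) for i in range(31)]

# Two-step path: round each daily mean, then round the monthly mean
daily = [round_tenths((lo + hi) / 2) for lo, hi in days]
monthly_rounded = round_tenths(sum(daily) / len(daily))

# Full-precision path: no intermediate rounding at all
monthly_exact = sum((lo + hi) / 2 for lo, hi in days) / len(days)

print(monthly_rounded, monthly_exact, monthly_rounded - monthly_exact)
```

With only 31 values and rounding at each of the two steps, the stored result sits a good fraction of the last digit away from the full-precision answer. That error does not average away; it gets carried into every step downstream.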
Still, the Hypothetical Cow gets trotted out on stage each time the issue is raised. A Hypothetical Cow, we are told, has near-infinite accuracy and precision due to the central limit theorem and the law of large numbers (which, in hypothetical land, can even be applied to small groups of real numbers…)
But this article:
has a discussion of a survey of ‘scientific programming’. One of my favorite bits?
There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies.
So here we have someone specifically studying the issues. And his result?
What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.
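That kind of decline is easy to produce. Here is a tiny, hypothetical Python example (mine, not from Hatton’s study) where a textbook formula quietly throws away every significant figure while a mathematically equivalent one keeps them all:

```python
# A minimal, hypothetical illustration: the one-pass variance formula
# E[x^2] - E[x]^2 loses all significant figures through cancellation
# when the data sit on a large offset, while the two-pass form is fine.

xs = [100000001.0, 100000002.0, 100000003.0]  # values near 1e8

n = len(xs)
mean = sum(xs) / n

# One-pass ("naive") variance: mean of squares minus square of mean
naive_var = sum(x * x for x in xs) / n - mean * mean

# Two-pass variance: subtract the mean first, then square
two_pass_var = sum((x - mean) ** 2 for x in xs) / n

print(naive_var)     # 0.0  -- every significant figure lost to cancellation
print(two_pass_var)  # 0.6666666666666666  -- the correct population variance
```

Both formulas are “the same” on paper. Run on a computer, one of them delivers zero significant figures. That is precision drift during the running of a program, in three lines.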
Gee. Someone else who measures real cattle and asks “Where’s the Beef?”…
And this is why I don’t trust anything less than Whole Degrees out of GIStemp. Not until a full end-to-end QA suite is published, both the code and the QA suite have been run, and a benchmark has measured the error, precision, and accuracy of the code.
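For what it’s worth, one brick of such a QA suite could be as simple as this sketch (hypothetical Python, not an actual GIStemp test): run a processing step in ordinary binary floating point, run the same step in exact decimal arithmetic, and measure how far the two drift apart:

```python
# Sketch of one kind of QA benchmark (hypothetical, not a real GIStemp
# test): compare a processing step at working precision against the same
# step done in exact decimal arithmetic, and measure the drift.
from decimal import Decimal

def monthly_mean_float(temps):
    # the step under test, in ordinary binary floating point
    return sum(temps) / len(temps)

def monthly_mean_exact(temps):
    # reference result in exact decimal arithmetic
    d = [Decimal(str(t)) for t in temps]
    return sum(d) / len(d)

# up to 62 values feed a monthly mean; these are made-up tenths of a degree
temps = [round(14.0 + 0.1 * i, 1) for i in range(62)]

drift = abs(Decimal(str(monthly_mean_float(temps))) - monthly_mean_exact(temps))
print(drift <= Decimal("0.05"))  # does the step hold half its last digit?
```

Wrap each stage of the pipeline in a test like that, with known inputs and a stated tolerance, and you have the start of a benchmark that *measures* the drift instead of hand-waving it away.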
Until then it is simply a “hope” that it does the right thing. And hope is not a strategy… Nor is it a QA suite…
The original paper can be downloaded from this location:
The synopsis is:
Describes two major experiments, one static and one dynamic, and draws conclusions about the proliferation of errors in scientific software. Basic conclusions are that software accuracy is greatly undermined by software errors even when we think it is fine, and most packages are full of statically detectable inconsistencies in the use of the programming language. The populations described here were written in Fortran and C.
Perhaps the good doctor could be talked into a study of GIStemp… For a real academic, there is likely to be grant money floating about.
No Quibbling over Central Limit Theorem or Anomalies
Those topics have been beaten to death and I’m tired of folks trying to force feed me Hypothetical Cowburgers and asserting that I just must not ‘get it’. Folks: I “get it” just fine. But I also know what computers DO to numbers. And it isn’t pretty. As that article above points out. And I also know that what GIStemp does is far removed from what Hypothetical Cow Code does.
So please keep discussion on topics involving Real Beef, and not Hypothetical Cows: The GIStemp code. The actual precision changes. Examples of real issues from real programs. How to construct a QA test or benchmark for GIStemp. etc. NOT hypotheticals…
Folks who insist on heading off to theoretical discussions of why the central limit theorem and/or hypothetical anomaly processing are perfect will find they will be snipped for not following direction well.