Of Hypothetical Cows and Real Program Accuracy

[Image: an Indian cow with attendants, being groomed. Caption: “Holy Hypothetical Cow!”]

I ask “Where’s the Beef?” and folks offer Holy Hypothetical Cows

Whenever I’ve raised the issue of precision and accuracy drift in GIStemp, the discussion has ended up with folks offering all sorts of reasons why hypothetically you can get a gazillion bits of precision out of a large average of a bazillion things. Then I point out that we have only, at most, 62 values going into the monthly mean (and that done in 2 steps, with opportunities for error and accuracy drift). And then those values are used for all sorts of other calculations (homogenizing, UHI “correction”, weighting, all sorts of things) before they ever approach the point where they are finally turned into “anomalies”. Even then the method used does not always compare a station with itself. It is more a “basket of oranges” to a “basket of apples”. (And sometimes there are as few as ONE station forming the “anomaly” for a given GRID box…)
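
To make the “opportunities for error” part concrete, here is a small sketch with made-up numbers. This is NOT GIStemp code, just an illustration of how a two-step mean with intermediate rounding drifts away from a straight mean over the same 62 readings:

    # A hedged illustration with made-up numbers, not GIStemp code:
    # the same 62 readings (31 daily MIN/MAX pairs) averaged two ways.
    import random

    random.seed(1)
    tmin = [round(random.uniform(4.0, 9.0), 1) for _ in range(31)]
    tmax = [round(random.uniform(14.0, 21.0), 1) for _ in range(31)]

    # Step 1: daily means, each rounded to a tenth before being stored
    daily = [round((lo + hi) / 2.0, 1) for lo, hi in zip(tmin, tmax)]

    # Step 2: monthly mean built from those rounded daily means
    monthly_two_step = sum(daily) / len(daily)

    # The straight mean over all 62 raw readings, done in one step
    monthly_direct = (sum(tmin) + sum(tmax)) / (len(tmin) + len(tmax))

    print(monthly_two_step, monthly_direct, monthly_two_step - monthly_direct)
    # Without the intermediate rounding the two results would be identical.
    # With it they differ a little, and every later adjustment step gets
    # to compound that kind of drift.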

Still, the Hypothetical Cow gets trotted out on stage each time the issue is raised. A Hypothetical Cow, we are told, has near-infinite accuracy and precision due to the central limit theorem and the law of large numbers (which, in hypothetical land, can even be applied to small groups of real numbers…)

But this article:

http://www.guardian.co.uk/technology/2010/feb/05/science-climate-emails-code-release

has a discussion of a survey of ‘scientific programming’. One of my favorite bits?

There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies.

So here we have someone specifically studying the issues. And his result?

What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.

Gee. Someone else who measures real cattle and asks “Where’s the Beef?”…

And this is why I don’t trust anything less than Whole Degrees out of GIStemp, and won’t until there is a full end-to-end QA suite published and both the code and the QA suite have been run, along with a benchmark measuring the error, precision, and accuracy of the code.

Until then it is simply a “hope” that it does the right thing. And hope is not a strategy… Nor is it a QA suite…
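
For the curious, here is one tiny example of the kind of significant figure loss Hatton is measuring. The numbers are invented and this is only a sketch, nothing pulled from GIStemp, but it shows how an “anomaly” formed as a small difference of nearly equal values loses digits once those values have passed through single precision:

    # A minimal sketch, not GIStemp code: an "anomaly" formed as the small
    # difference of two nearly equal numbers loses significant figures once
    # those numbers have been squeezed through 32-bit (single) precision.
    import struct

    def as_float32(x):
        # round-trip through a 32-bit float to mimic single precision storage
        return struct.unpack("f", struct.pack("f", x))[0]

    baseline = 287.654321          # long-term mean, invented value
    current  = 287.655432          # this month's value, invented

    anom_64 = current - baseline                        # about 0.001111
    anom_32 = as_float32(current) - as_float32(baseline)

    print(anom_64, anom_32)
    # A 32-bit float near 288 can only resolve steps of about 0.00003, so a
    # difference of about 0.001 built from such values keeps only one or
    # two digits that still mean anything.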

The original paper can be downloaded from this location:

http://www.leshatton.org/IEEE_CSE_297.html

The synopsis is:

Describes two major experiments, one static and one dynamic, and draws conclusions about the proliferation of errors in scientific software. Basic conclusions are that software accuracy is greatly undermined by software errors even when we think it is fine, and most packages are full of statically detectable inconsistencies in the use of the programming language. The populations described here were written in Fortran and C.

Perhaps the good doctor could be talked into a study of GIStemp… For a real academic, there is likely to be grant money floating about.

No Quibbling over Central Limit Theorem or Anomalies

Those topics have been beaten to death and I’m tired of folks trying to force-feed me Hypothetical Cowburgers and asserting that I just must not ‘get it’. Folks: I “get it” just fine. But I also know what computers DO to numbers. And it isn’t pretty. As that article above points out. And I also know that what GIStemp does is far removed from what Hypothetical Cow Code does.

So please keep discussion on topics involving Real Beef, and not Hypothetical Cows: The GIStemp code. The actual precision changes. Examples of real issues from real programs. How to construct a QA test or benchmark for GIStemp. Etc. NOT hypotheticals…

Folks who insist on heading off into theoretical discussions of why the central limit theorem and/or hypothetical anomaly processing are perfect will find themselves snipped for not following direction well.

29 Responses to Of Hypothetical Cows and Real Program Accuracy

  1. Tony Hansen says:

    If I ‘herd’ you right, the people who try to ‘steer’ you towards ‘cows’ are full of ‘bull’.

  2. Phil says:

    Unfortunately, the paper is missing figures 4, 5 and 6. I found an old PowerPoint presentation, done by the same author, with what I believe to be the missing figures, but I lost the link due to computer problems. The .ppt file was such an old version (PowerPoint 4.0?) that I had trouble opening it, but I was finally able to convert it to a .pdf. Email me and I can send you a copy.

    I’m glad you posted this as it vindicates your points about accuracy and precision in spades. After reading the paper, however, I think you need to point out that the situation with respect to GIStemp and HadCRUT is even worse. The study limited itself to software that had been in commercial use, was considered “debugged”, had gone through strict QA, and was otherwise considered “mature” – a far cry from GIStemp and HadCRUT. Here is another money quote:

    Taken with other evidence, these two experiments suggest that the results of scientific calculations involving significant amounts of software should be treated with the same measure of disbelief as an unconfirmed physical experiment. (emphasis added)

    The paper mentions an analysis tool (QA Fortran) that might be available to test both GIStemp and HadCRUT (or at least the Fortran parts).

    Another important point that the author makes is in Table 3, which shows the deterioration in agreement dropping from 6 significant figures for floating-point math to 1 significant digit at the end of the process for seismic data processing software currently in use. He says that:

    (u)nfortunately, modern seismic data processing interpretation by geologists relies on 2-3 significant figure accuracy … (emphasis added)

    The future of the world economy should not depend on such imprecise calculations.

  3. Pingback: Ooops – errors in land temperature index? « TWAWKI

  4. Mark P says:

    I wonder if the people so in awe of the climate simulations risk their actual money with the financial equivalents.

    A quick look at the fate of Long Term Capital Management should be a warning to us all. Very brainy people cannot necessarily predict even the near future.

  5. E.M.Smith says:

    @Tony: Yes. And they need to ‘moooove’ along ;-)

  6. j ferguson says:

    What would you expect from the udder guys?

  7. j ferguson says:

    Sorry, E.M. and comrades but there is a lot to work with here.

    For example, E.M., so you feel your ox has been Gored?

  8. David Shipley says:

    @ Mark Good point.
    I have been trying for some time to explain to some of the “settled science” crew that their world view depends on data and model structures which both have a lot in common with the advanced mathematical models built like inverted pyramids on flaky base data to “manage risk” in the investment banks. But the stock response is “oh no, unlike your dumb banker friends, the GCMs are top quality, they are quite simple really and they predict everything accurately.” Actually, all they can predict is an inaccurate version of the past, by twiddling input variables to force agreement, just as the VaR and derivative pricing models did, and ignoring internally generated error margins, just like the bankers.
    Never mind, what’s a trillion of other people’s money here or there, especially if you can get your snout in the trough. Just like the bankers…..

  9. Fred2 says:

    There may be a role for lawyers. Has all the source code for the models been publicly archived? It sounds like FOI is called for as public money funds this stuff.

  10. Tony Hansen says:

    E.M.
    I can’t help but wonder if they have some steak in the outcome.
    Are they already milking the current system for all it is worth?
    Soon I think they will find themselves on the horns of a dilemma – either stay on the gravy train with their rump party (to the bitter end) or buck the system and change teams.
    But now I should perhaps butt out, take your advice and moooove along.

  11. P.G. Sharrow says:

    As I have watched this unfold, I at first thought they were trying to keep the use of their code private. Then I later thought they were ashamed of the poorly done code and wanted to keep anyone from discovering the level of their competence. Now it appears that the code and science are deliberately rigged for pecuniary motives.
    I have been creating computer databases and spreadsheets since 1986 for my engineering and business needs, and I learned a long time ago that a computer program can be tweaked to tell any result you want to see. GIGO. Very good for impressing the rubes who think the “computer” is all powerful. It is just a dumb machine that executes the program calculations to give an “answer”.
    It does exactly what you tell it to do. No more and no less.

    To err is human; to really screw up you need a computer program.

    Although this may be the first time that a Religion has used computers to “prove” their theology, this certainly is not real science.
    Climategate reminds me of the “Dilbert” cartoon strip, where the output is determined first by the needs of management and reality is secondary.

  12. M. Simon says:

    Dang. I had something really short and sweet to say about hypothetical cows.

    Forensically it is good to work backwards.

    For real advances – forward.

    1. List all the known variables
    2. Standard values – e.g. pi, Avogadro’s Number, Planck’s Constant
    3. Measurement systems – interconversion must be lossless to 64 bits or better.
    4. A parallel program is carried on to compute error propagation.
    5. Functions to compute a variable should be either mathematical or table-driven with interpolation.

  13. M. Simon says:

    It may in fact be a good thing to devise a structure that carries the error forward with each calculation.
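
    A minimal sketch of what such a structure might look like (the class and the worst-case propagation rules below are illustrative assumptions, not taken from any existing package):

        # A minimal sketch: a value type that carries a worst-case absolute
        # error bound forward through every operation. The class name and
        # the propagation rules are illustrative assumptions, not taken
        # from any existing package.
        class ErrVal:
            def __init__(self, value, err=0.0):
                self.value = value
                self.err = abs(err)

            def __add__(self, other):
                # worst-case absolute errors add under addition/subtraction
                return ErrVal(self.value + other.value, self.err + other.err)

            def __sub__(self, other):
                return ErrVal(self.value - other.value, self.err + other.err)

            def __mul__(self, other):
                # first-order propagation: |a|*db + |b|*da
                return ErrVal(self.value * other.value,
                              abs(self.value) * other.err
                              + abs(other.value) * self.err)

            def __repr__(self):
                return "%g +/- %g" % (self.value, self.err)

        # Example: the mean of four readings, each good to +/- 0.5 degrees
        readings = [ErrVal(t, 0.5) for t in (21.0, 23.0, 19.0, 22.0)]
        total = readings[0]
        for r in readings[1:]:
            total = total + r
        mean = ErrVal(total.value / len(readings), total.err / len(readings))
        print(mean)    # 21.25 +/- 0.5 under these worst-case rules

    Under worst-case rules the bound on the mean stays at +/- 0.5; it only shrinks with averaging under the independence assumptions the Hypothetical Cow leans on.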

  14. M. Simon says:

    It is time to stop trying to fix a disaster. The uselessness is known. Bury it.

    Time to start designing. Five days of planning. Two days of work.

  15. j ferguson says:

    P.G. Sharrow,

    “Although this may be the first time that a Religion has used computers to “prove” their theology, this certainly is not real science.”

    What an outstanding insight. I wonder if you could work it backwards and contrive a religion from good code.

  16. Chuckles says:

    @j ferguson,

    Well, when the *$#% code finally works, it’s usually quite a religious experience?

  17. AJStrata says:

    I have been arguing for weeks that 99.99+% of the global temperature index is pure speculation and not actual measurements. I looked at temperatures on any given day within 100 km of where I live in Northern VA and determined the national variation over 100 km (1 std dev) is on the order of 2°. If one tries to extrapolate this out to a 500 km grid, the accuracy (or certainty) decays to many more degrees. If you try to extend this inaccurate value beyond the 500 km grid to an adjacent 500 km grid with no measurements, the accuracy (or certainty) decays even further.

    Temperatures do not remain stable or accurate beyond a few tens of km, and then only in very homogeneous regions with little change in altitude, urbanization and amount of water in the region.

    I agree wholeheartedly that the Achilles’ Heel of AGW is proving there is any way to detect a 0.8° increase in temperature from a mass of wild guesses built upon a paltry set of actual measurements which represent less than 0.01% of the world’s surface.

  18. Chuckles says:

    AJ,

    And my pet subject – where many of the paltry set of measurements are on instruments of +- 0.5 deg C accuracy, rounded to the nearest deg. F. Oh and take max and min readings like the above and average them…..

    I think we have to audit and evaluate all the way from the capture of the raw data, right through to a grid or global temperature or temperatures that we can agree are representative of something.

    Heck, the current system means we could get anyone worldwide with a home weather station to add their daily readings, altitude and lat/long to a website. Heck, we’d have a better network than anyone else. Chiefio has the software running, and the central limit theorem and law of large numbers make our results untouchable.
    Oh, and we get to decide the adjustments….

    Assume a spherical cow…

    REPLY: [ I always wondered why hamburger patties were round ;-) -E.M.Smith ]

  19. vjones says:

    @Chuckles
    “Heck, the current system means we could get anyone worldwide with a home weather station to add their daily readings, altitude and lat/long to a website. Heck, we’d have a better network than anyone else.” LOL (and agree!)

    @AJ, saw your analysis when you posted it – very good.

  20. Chuckles says:

    E.M., I left out the ‘in vacuum’ bit

    http://en.wikipedia.org/wiki/Spherical_cow

  21. Jim Masterson says:

    >>
    Chuckles
    And my pet subject – where many of the paltry set of measurements are on instruments of +- 0.5 deg C accuracy, rounded to the nearest deg. F.
    <<

    I’m not sure why someone would read a temperature off of a Celsius scale and round to the nearest Fahrenheit degree, but to each their own. At least you included the units with error ranges. Since I’m somewhat of a “units fanatic,” I notice when units are poorly used or missing altogether.

    The climate community has a lot to learn about error ranges, units, rounding errors, floating-point formats in computers, conversion errors, truncation errors, etc. It’s too bad–with OOP (Object Oriented Programming), classes that can correctly deal with error terms in calculations are easy to create. I’ve yet to see much use of the technique–especially in climate science. Of course, FORTRAN’s not much of an OOP language.

    Jim

  22. Chuckles says:

    Jim, Couldn’t agree with you more. It’s not actually read off in deg C and converted; the unit’s spec is in deg C, the display in deg F.

    I was actually quoting from USHCN document td3200, which is the data documentation for data set 3200. Available here if you REALLY want it:

    http://www.ncdc.noaa.gov/oa/climate/research/ushcn/#quality

    It has the following to say about the MMTS (max-min temperature system), on page 4:

    The accuracy of the maximum-minimum temperature system (MMTS) is +/- 0.5 degrees C, and the temperature is displayed to the nearest 0.1 degree F. The observer records the values to the nearest whole degree F. A Cooperative Program Manager calibrates the MMTS sensor annually against a specially-maintained reference instrument.

    The relevance is that this is the definitive document that describes one of the main data sources for the GHCN data that becomes the GISS or HadleyCRU data sets.

    And it IS +-0.5 deg C as the base accuracy of the units, displayed and recorded to the nearest deg F, for both the min and the max temps. These are used to create a mean/average temp for the day, presumably also reported to the nearest degree F.
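
    A quick sketch of that chain with invented readings (an illustration only, not the NCDC processing code) shows how much the whole-degree recording alone can move a daily mean:

        # Illustration only, with invented readings; not the NCDC code.
        # The chain: accurate to +/- 0.5 deg C, displayed in deg F,
        # recorded to the nearest whole deg F, then min and max averaged.
        def c_to_f(c):
            return c * 9.0 / 5.0 + 32.0

        tmin_c, tmax_c = 7.3, 19.8     # invented "true" values, deg C

        # what the observer writes down: whole degrees F
        tmin_rec = round(c_to_f(tmin_c))
        tmax_rec = round(c_to_f(tmax_c))

        mean_recorded = (tmin_rec + tmax_rec) / 2.0
        mean_unrounded = (c_to_f(tmin_c) + c_to_f(tmax_c)) / 2.0

        print(mean_recorded, mean_unrounded, mean_recorded - mean_unrounded)
        # Each recorded value can sit up to half a degree F from the
        # displayed one, so the daily mean can move by up to 0.5 F from the
        # whole-degree recording alone, before the +/- 0.5 deg C (about
        # 0.9 F) instrument accuracy even enters into it.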

  23. Pingback: Errors in Measurement « Another View on Climate

  24. Ian Beale says:

    j ferguson
    P.G. Sharrow,

    “Although this may be the first time that a Religion has used computers to “prove” their theology, this certainly is not real science.”

    “What an outstanding insight. I wonder if you could work it backwards and contrive a religion from good code.”

    A long time ago (ca. 1970s) there was a speculation in the New Scientist’s “Ariadne” column about whether one could set up a programmed religion. I don’t have the reference still, but recall that the suggestion included

    Edicts based on criteria diametrically opposed to those in current behaviour

    Bonuses based on the “premium bond principle” – UK readers might expand, but akin to lottery bonuses

    And concluded with the question of how many would believe, even if the things they were told included the name of the software author.

  25. Vaughn Arman says:

    A serious tester will have a systematic approach to testing that they can apply to any given scenario without hesitation. The one I was given many moons ago was to document the tests for getting a can out of a lemonade vending machine. I did all the prerequisites, analysed the test environment, made sure the machine was fit for testing, and began the “tests”. Put in too little money, get the money back; put in the correct money, get the money back; etc. Then when I did finally push vend, the can got stuck instead of dropping to the tray. So I employed the “workaround” of shaking the machine with my shoulder. My report concluded that the machine was periodically unfit for purpose unless a physio could also be set up in the building to deal with the many painful shoulder joints that would result from its operation.

  26. C James says:

    Chuckles….I can’t find the quoted comments in the link you provided. I have been looking for this info and wonder if you have another reference link.

  27. Tony Hansen says:

    E.M.,
    If a cow is hypothetical, what is a password?

    REPLY [ “A work in progress” -E.M.Smith ]

  28. Tony Hansen says:

    Thanks EM,
    May your progress exceed your expectations.

  29. Chuckles says:

    C James,

    Sorry about the delay in replying, you have to click on the ‘DSI-3200’ link on that page. This is a spec for the data in a pdf file.
