QA or Tossing Data – You Decide

Well, Is It An Improvement Or Not?

I found this paper:

that gives an overview of some of the approaches used in the “QA Process” at NCDC. I’ve read it through once (but not followed all the links to the other papers it references).

I find reason to think that the methods detailed in this paper are the “Smoking Gun” for the extraordinary compression of the “range” of the monthly anomalies that we see in 1990. There are some alternative “Duplicate Number” series that begin in the late 1980s with dates that match those of the papers listed in this paper (1987, 1989, etc.).

The process is claimed to have been tested by how many “seeded errors” it found, but I don’t see much mention of how many valid values were rejected, nor a discussion of any method to assure the data are not biased by this procedure. Maybe I missed it, or maybe it is in the links I’ve not yet followed. (It was a quick first read.)

But frankly, what worries me the most about this is the, in my opinion, rather extreme and cavalier changes made to data that are then presented as ‘nearly raw with a bit of QA’. Things like replacing values with computed values based on nearby stations. Not exactly my idea of a simple QA process.

A couple of quotes:

Recently, the use of multiple stations in quality assurance procedures has proven to provide valuable information for quality control (QC) compared with the single-station checking. Spatial tests compare a station’s data against the data from neighboring stations (Wade 1987; Gandin 1988; Eischeid et al. 1995; Hubbard 2001a). They involve the use of neighboring stations to make an estimate of the measurement at the station of interest. This estimate can be formed by weighting according to distance separating the locations (Guttman et al. 1988; Wade 1987), or through other statistical approaches [e.g., multiple regression (Eischeid et al. 1995) and weighting of linear regressions (Hubbard et al. 2005)].

Nothing like requiring that a station never have an ‘extreme event’ to assure that it never has an extreme event…. So what happens as there are ever fewer ‘reference stations’ for comparison and as more of them are Airports? What happens to the “nearby” rural cold station when it must be in compliance with the dozen Airports or get killed?

I’ve bolded a bit here:

The SRT approach has been found in a previous study (Hubbard and You 2005) to be more accurate than the inverse distance weighting (IDW) approach for the maximum air temperature (Tmax) and the minimum air temperature (Tmin). It was found that the RMSE was smaller for SRT estimates than for IDW estimates in all areas including the coastal and mountainous regions. Both the spatial regression and inverse distance methods were found to perform relatively poorer when the weather stations are sparsely distributed as compared to areas with higher station densities. The success of the spatial regression approach is in part due to its ability to implicitly resolve the systematic differences caused by temperature lapse rate with elevation; these differences are not accounted for in the IDW method.

So the more stations you drop the worse your performance becomes? AND they know this? AND they are doing it anyway?
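For the curious, the inverse distance weighting (IDW) the quote compares against is simple enough to sketch in a few lines. The station distances and temperatures below are made up for illustration; real implementations vary the distance power and the neighbor count.

```python
# Minimal sketch of inverse-distance weighting (IDW): estimate a target
# station's value from its neighbors, weighting each by 1/distance**2.
# Distances and temperatures are invented for illustration.

def idw_estimate(neighbors, power=2):
    """neighbors: list of (distance_km, value) pairs for nearby stations."""
    weights = [1.0 / d ** power for d, _ in neighbors]
    total = sum(w * v for w, (_, v) in zip(weights, neighbors))
    return total / sum(weights)

# Three hypothetical neighbors: (distance in km, Tmax in F).
estimate = idw_estimate([(10.0, 60.0), (20.0, 64.0), (40.0, 70.0)])
```

Note what IDW cannot see: elevation. A mountain neighbor is just another number at some distance, with no lapse-rate correction, which is exactly the weakness the excerpt attributes to it.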

2) Step change (SC). This is a check to see whether or not the change in consecutive values of the variable x falls within the climatologically expected lower and upper limits on rate of change (ROC) for the month in question. In this case the step is defined as the difference di between values on day i and i − 1; that is, di = xi − xi−1. The step change test is performed by using this definition of d, calculating the associated mean and the variance, and using Eq. (2) with d substituted for x. Again fsc takes a value of 3.0.

“Climatologically expected” by whom? This looks like it’s just imposing statistical bounds on what the real data can report based on what they deem acceptable realities… I was in Dallas once when a 50 F drop happened from a cold front moving in. Step Change does happen in weather. In my experience, more to the cold side than the hot side. Will this toss out more cold step changes than hot?
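The step change test as quoted can be sketched in a few lines. Here the “climatologically expected” mean and variance of the day-to-day steps come from a historical record, and a day is flagged when its step falls outside the 3-sigma band (fsc = 3.0 as in the paper). All temperatures are invented.

```python
# Sketch of the step-change (SC) check: d_i = x_i - x_(i-1), flagged when
# outside mean(d) +/- f_sc * sd(d). The "climatology" is just the step
# statistics of a (made-up) historical record.
from statistics import mean, pstdev

def climatology(history):
    """Mean and sd of day-to-day steps from a long historical record."""
    d = [history[i] - history[i - 1] for i in range(1, len(history))]
    return mean(d), pstdev(d)

def step_change_flags(x, d_mean, d_sd, f_sc=3.0):
    """Flag each step in x that falls outside d_mean +/- f_sc * d_sd."""
    d = [x[i] - x[i - 1] for i in range(1, len(x))]
    return [abs(di - d_mean) > f_sc * d_sd for di in d]

# Mild historical steps, then a month with a genuine 50 F frontal drop:
d_mean, d_sd = climatology([70, 71, 69, 72, 70, 71, 68, 70, 72, 69])
flags = step_change_flags([70, 71, 20, 21], d_mean, d_sd)  # the -50 step is flagged
```

With a climatology of small wiggles, a real Dallas-style 50 F drop lands far outside the 3-sigma band and is tossed, which is precisely the worry.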

4) Maximum and minimum air temperature mixed up (Mixup). This is a check to see whether the maximum and minimum air temperatures were interchanged by the observer. The record will not pass the test when the maximum air temperature of the current day is lower than the minimum air temperature of this day, previous day, or next day.
Similarly, the record will also not pass the test when the minimum air temperature of current day is higher than the maximum air temperature of this day, previous day, or next day.

So we can never transition from hot to cold in one day. Nor even two. Need to tell that to everyone hit by a Canada Express… Not stated is what happens to that QA process when that day is “tossed out”. If it was invalid, what is the NEXT day compared with? I’d like to see some field work that shows this is valid for all places on the planet. Again we see no recognition of the fact that places can be made cold faster than they can be made warm (Think snowfall vs clouds clearing. Delivering a few tons of ice per hectare will cool you down “right quick”. A sunny winter day warms you not so much… )
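The Mixup check as quoted is mechanical enough to sketch directly. Records here are invented (tmax, tmin) tuples; note how a genuine one-day plunge fails the test exactly as described.

```python
# Sketch of the Tmax/Tmin "Mixup" check: a day fails when its Tmax is
# below the Tmin of that day, the previous day, or the next day, and
# symmetrically for Tmin. Records are (tmax, tmin) tuples, invented.

def mixup_fails(records):
    """Return one flag per day; True means the day FAILS the check."""
    flags = []
    for i, (tmax, tmin) in enumerate(records):
        prev = records[i - 1] if i > 0 else None
        nxt = records[i + 1] if i + 1 < len(records) else None
        neighbors = [r for r in (prev, (tmax, tmin), nxt) if r is not None]
        fail = any(tmax < n_tmin for _, n_tmin in neighbors) or \
               any(tmin > n_tmax for n_tmax, _ in neighbors)
        flags.append(fail)
    return flags

# A genuine "Canada Express": 75/50 one day, 30/10 the next.
# Both real days fail, because day 1's Tmin (50) exceeds day 2's Tmax (30).
flags = mixup_fails([(75, 50), (30, 10)])
```

The test was designed to catch observers swapping the two columns, but it cannot tell a keying error from a real frontal passage.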

5) Spatial regression test (Hubbard et al. 2005). This test is a quality control approach that checks whether the variable falls inside the confidence interval formed from surrounding station data. Linear regression estimates are calculated for all pairs {x, yi}, where x is the station whose data is being quality checked, yi are the data for the i surrounding stations (i = 1, n), a and b are the regression coefficients, and the data record spans N days. For an observed yi, n estimates of x are calculated. A weighted estimate x′ of these n estimates of x is obtained by utilizing the standard error (root-mean-square error) sei of the n regression estimates:

And it goes on…

Well, nothing like a little regression testing to make sure everything fits inside your preconceived notions of what’s allowable. But who does the pre-conception? Based on? Is nature so predictable that it never delivers surprises? Are surprises to be dropped on the floor because nature is not allowed them? Is this The Procrustean Bed of Weather? If so, pray thee Gods, make me Theseus…
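The spatial regression test can be sketched as follows. The excerpt leaves out how the n estimates and their standard errors are combined, so the 1/se² weighting and the f = 3 interval below are my assumptions, not a transcription of Hubbard et al. (2005); all station records are invented.

```python
# Sketch of the spatial regression test (SRT): regress the target station
# on each neighbor over a training record, keep each regression's RMSE
# (se_i), combine the n estimates with 1/se_i**2 weights (an assumed
# choice), and test whether today's observation sits inside x' +/- f*se'.
from statistics import mean

def fit(xs, ys):
    """Ordinary least squares of x on y; returns (a, b, rmse)."""
    my, mx = mean(ys), mean(xs)
    b = sum((y - my) * (x - mx) for x, y in zip(xs, ys)) / \
        sum((y - my) ** 2 for y in ys)
    a = mx - b * my
    resid = [x - (a + b * y) for x, y in zip(xs, ys)]
    rmse = (sum(r * r for r in resid) / len(resid)) ** 0.5
    return a, b, rmse

def srt_passes(x_obs, y_obs, history_x, histories_y, f=3.0):
    """y_obs: today's values at the neighbors; returns (passes, estimate)."""
    ests, ws = [], []
    for y_today, hist_y in zip(y_obs, histories_y):
        a, b, se = fit(history_x, hist_y)
        ests.append(a + b * y_today)
        ws.append(1.0 / max(se, 1e-6) ** 2)   # clamp avoids divide-by-zero
    x_prime = sum(w * e for w, e in zip(ws, ests)) / sum(ws)
    se_prime = (1.0 / sum(ws)) ** 0.5         # assumed combined error
    return abs(x_obs - x_prime) <= f * se_prime, x_prime
```

Notice the failure mode: the more tightly the neighbors track the station historically, the smaller the standard errors and the narrower the acceptance interval, so a well-behaved station gets the least room for a genuine surprise.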

The NCDC Approach

As described in the paper: (I’ve bolded bits)

b. NCDC approach
The NCDC quality assessment is based on accepting all observed data that are plausible. There are five steps in the evaluation of temperature data. Because of the volume of data that are processed as well as requirements to provide quality assessed digital data to customers in near–real time, a goal of the approach is to automate as much evaluation as possible.

1) Pre-edit. This step checks the input data records for format and coding errors. Improper station identifiers, invalid characters, duplications, values that are not in a valid range, unexpected data, and other similar problems are identified and corrected if possible. If it is not possible to correct these errors, then a datum is labeled as missing for follow-on processing.

2) Climate division consistency. Departures of a station’s data from the monthly average of the data are calculated for all stations within a climatic division [see Guttman and Quayle (1996) for a description and history of the 344 climatic divisions in the contiguous United States]. The average departure for each day is then calculated. A datum is flagged for further review if the departure for a given station and day differs from the divisional average for the day by more than ±10°F. For a given day, temperature means and variances are estimated from all the divisional data that have not been flagged for further review. Any flagged data that exceed ±3 standard deviations from the mean for the day are then flagged for replacement. Replacement values are calculated from data for up to six nearby stations by the following procedure.

OK so we use only the accepted data to determine what is accepted. This opens the door for a recursive accumulation of error. And it’s automated as much as possible. How much?

We use a fixed size (±10°F) and ignore that cold anomalies are more common than hot (hot air rises, so we convectively limit; cold air does not rise, it accumulates. Further, as we drop stations with extreme values – those mountains and high latitudes – our “average” narrows, so what we will accept narrows. Station dropout matters here.) Then we just make up replacements…
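The two-stage flagging in the quoted step 2 can be sketched directly. Note that once a station is flagged for review, its value is excluded from the mean and standard deviation used to judge it: only accepted data determine what is accepted. Station departures below are invented.

```python
# Sketch of the climate-division consistency check: flag for review when a
# station's departure differs from the divisional average by more than
# 10 F, then flag for REPLACEMENT anything beyond 3 sd of the data that
# survived the first cut. Departure values are invented.
from statistics import mean, pstdev

def divisional_flags(departures, review_limit=10.0, sd_limit=3.0):
    """departures: one departure per station for a single day.
    Returns (review, replace) as sets of station indexes."""
    div_avg = mean(departures)
    review = {i for i, d in enumerate(departures)
              if abs(d - div_avg) > review_limit}
    # Mean and sd come ONLY from the stations that passed the first cut.
    kept = [d for i, d in enumerate(departures) if i not in review]
    m, s = mean(kept), pstdev(kept)
    replace = {i for i in review if abs(departures[i] - m) > sd_limit * s}
    return review, replace

# Four bland stations and one cold mountain outlier (-15 F departure):
review, replace = divisional_flags([1.0, 0.0, -1.0, 2.0, -15.0])
```

In that run the mountain station is both reviewed and marked for replacement, while the four stations that set the yardstick all pass each other.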

For all nonflagged data, for a station, compute Z scores to standardize the data with zero mean and unit standard deviation.

For all combinations of the nearby stations, compute the average daily Z score.

For each combination of surrounding stations, multiply the average daily Z score by the standard deviation of the nonflagged data for the station for which a daily value is to be estimated (replacement station).

For each combination, subtract the estimated departures from observed, nonflagged departures for the replacement station.

For each combination, compute the error variances of the data obtained in 4.

For the combination with the smallest error variance obtained in 5, for the day being estimated at the replacement station, add the estimated departure to the replacement station mean calculated from the nonflagged data.

What a complicated stew just to make up a number… and its validity rests on its being statistically close to its peer group? It must match the consensus or be shot?
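The six quoted replacement steps can be sketched as one function. This is a simplification: the real procedure computes means and standard deviations from nonflagged data only and excludes the day being replaced, which I have not reproduced here; all station records are invented.

```python
# Sketch of the six-step replacement: for every combination of nearby
# stations, average their daily Z scores, scale by the target station's
# own sd, and keep the combination whose estimated departures have the
# smallest error variance against the target's record. Data are invented.
from itertools import combinations
from statistics import mean, pstdev, pvariance

def z_scores(series):
    """Step 1: standardize to zero mean and unit standard deviation."""
    m, s = mean(series), pstdev(series)
    return [(v - m) / s for v in series]

def best_replacement(target, neighbors, day):
    """Estimate `target[day]` from combinations of neighbor records."""
    t_mean, t_sd = mean(target), pstdev(target)
    t_dep = [v - t_mean for v in target]
    best = None
    for r in range(1, len(neighbors) + 1):
        for combo in combinations(range(len(neighbors)), r):
            zs = [z_scores(neighbors[i]) for i in combo]
            # Steps 2-3: average daily Z, scale by the target's sd.
            est_dep = [mean(col) * t_sd for col in zip(*zs)]
            # Steps 4-5: error variance of (observed - estimated) departures.
            err = pvariance([o - e for o, e in zip(t_dep, est_dep)])
            if best is None or err < best[0]:
                best = (err, est_dep[day])
    # Step 6: winning estimated departure plus the target's mean.
    return t_mean + best[1]
```

A perfectly correlated neighbor wins the error-variance contest outright, so the “replacement” is just that neighbor’s pattern re-centered on the target’s mean. The value that triggered the flag never gets a vote.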

Replacement values that differ from the original observation by more than ±15°F may be manually adjusted if a validator believes the flagged values are in error by more than ±8°F.

Or they can just make something up the old fashioned way… Wonder where they keep the hat they pick from? At least, I hope they are pulling numbers out of their hat, considering the alternative…

Validators also compare the divisional data to the top 10 and bottom 10 observed extremes on a statewide basis. This comparison is intended to identify gross keying errors and anomalous extreme values and is performed both on the observed data and on the replacement values.

And as an ever greater percentage of stations are Airports and Urban we will have more “top” extremes to pass data and fewer “bottom” extremes… Station Drops Matter to this process. Drop all your high cold mountains and your high latitude stations and you will have a drastically reduced set of “bottom extremes” to validate against. So California with SFO, LA, San Diego and Santa Maria will have how many “observed extremes on a STATEWIDE basis” against which to compare the snow presently falling in the High Sierra Nevada?

3) Consistency. This check ensures that maximum, minimum, and observation time temperatures are internally consistent. Physically impossible relationships, such as the minimum temperature for a day being greater than the maximum temperature for the same day, are flagged as suspect. Often, these errors result from incorrect dates that are assigned to an observation (sometimes called “date shifting”); if possible, the flagged data are corrected.

4) TempVal. This spatial check uses grid fields derived from Automated Surface Observation System (ASOS)/Automated Weather Observing System (AWOS) hourly and daily temperature values as a “ground truth” to quality ensure the Cooperative Network daily temperature data (Angel et al. 2003). Note that the previously described steps are only applied to the cooperative data; this step compares the cooperative data to an independent data network.

Wasn’t there a problem with the ASOS having trouble with internal heat from the electronics and not reporting bottom values well? And were they not found to be held close to buildings by their wiring, too close to be acceptable to the published standards? And are not the bulk of them at Airports with loads of Tarmac and motor vehicles? So all thermometers must conform to the one that ‘had issues’?

Grid fields of departures from monthly averages are derived from the ASOS/AWOS data using a squared distance weighting function to estimate values at the corners of half-degree latitude–longitude boxes. Three grids were produced for each day of data corresponding to midnight, a.m., and p.m. observation times. The ASOS/AWOS data were quality assessed through an independent processing system that is more extensive than the cooperative data processing system. The nature of the checks are similar to those described above (sections 3a–c), but observations at both hourly and daily time scales, as well as the observation of more meteorological elements, lead to many more checks. Even though the data have been extensively assessed and processed, each grid used in TempVal is examined for suspect data. Every gridpoint value is compared to the average value of surrounding grid points. Suspect grid points (bull’s eyes), along with a list of surrounding ASOS/AWOS stations and their temperature values, are brought to the attention of a validator, and corrections and/or deletions are made as necessary. The grid values half-degree north, south, east, and west of the Cooperative Network site are also calculated. These values are used to determine the gradient (or slope) of the grid at this location.

The data for a cooperative site are compared to the grid estimates at the site. When the difference between a cooperative value and the estimated value is greater than ±(7°F + slope), the cooperative datum is flagged as suspect, and the grid estimate becomes the replacement value for the suspect observation. Note that the constant 7°F is usually much greater than the slope, so that the threshold is approximately a fixed value; the acceptance range for an observed datum is of the order of 16°–20°F. The grid estimate becomes the replacement value for the suspect observation.

We have an average used as the replacement. Averages always have lower range than simple measurements. Is there any metric for how much of the GHCN data set is so constructed?
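The TempVal accept/replace decision reduces to a one-line threshold. The excerpt doesn’t define the “slope” precisely, so taking it as the largest deviation of the four surrounding grid values from their mean is my assumption; numbers are invented.

```python
# Sketch of the TempVal comparison: a cooperative observation is flagged
# and REPLACED by the grid estimate when it differs from that estimate by
# more than 7 F plus the local grid slope. The slope definition here is an
# assumption about the excerpt's "gradient"; values are invented.

def tempval_check(coop_value, grid_estimate, surrounding):
    """surrounding: grid values a half-degree N, S, E, W of the site.
    Returns (flagged, value_used)."""
    g_mean = sum(surrounding) / len(surrounding)
    slope = max(abs(g - g_mean) for g in surrounding)
    if abs(coop_value - grid_estimate) > 7.0 + slope:
        return True, grid_estimate   # the grid replaces the observation
    return False, coop_value

# Flat terrain (slope ~1 F): a reading 10 F below the ASOS-derived grid
# is flagged and silently becomes the grid value.
flagged, used = tempval_check(40.0, 50.0, [50.0, 51.0, 49.0, 50.0])
```

So on flat ground the acceptance window is roughly ±8 F around whatever the ASOS/AWOS grid says; a real cold-air-pool reading outside that band simply becomes the grid value.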

5) Last look. Validators once again compare the cooperative data to the top 10 and bottom 10 extremes for the state to ensure that replacement values are not anomalous extreme values. Consistency and range checks are also performed again to ensure that the assessment process did not introduce errors.

And we do it all again. No wonder at the end of this everything is just squashed into submission as an ever flattening range. AND the concentration of stations at Airports and use of ASOS as gatekeeper assures that “Everywhere is in conformance with the Airports” No Matter What.

These “daily values” then get used to compute the “monthly mean” that is all we see in the GHCN and USHCN data sets. So how much of the “daily data” is made up or tossed out on the way to the “monthly mean”? We just don’t know.

OK, so how do I get data that has not been so adjusted on its way to being sold as UN-adjusted?


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in AGW Science and Background, Favorites, NCDC - GHCN Issues. Bookmark the permalink.

13 Responses to QA or Tossing Data – You Decide

  1. j ferguson says:

    Is the above why the spikes left in the early nineties?

    I wonder why no-one worried that the data after 1990 +/- looked different. Maybe they thought the earlier stuff wasn’t as high quality or reliable.

    Or they never graphed it like you did, so they didn’t see it.

    It is a little like discovering that the air conditioning ducts run through the elevator shafts when you finally draw the design to scale.

    REPLY: [ IMHO, having seen a large number of folks with advanced degrees and high rank in organizations, it is most likely a mixture of hubris and vanity. It’s very hard for folks to learn to think that perhaps they might be wrong. This leads to things like the complete and utter lack of a software testing and qualification suite and / or acceptance criteria. And it leads to believing that “your baby” is perfect, even when it’s fat and smelly and makes bad numbers. It is why professional software manufacturers have a dedicated QA and validation department and why it’s always best for cops and the courts to be different bodies.

    So I’d expect the “process” is something like “Sounds good. Math worked. Trivial test (found seeded bad numbers) worked OK. Must be perfect. Ship it! And Never Look Back once published.” The classic hubris recipe for hidden disaster. Like the iron of the Titanic. The ship was “unsinkable”, except they used too high a carbon content in the steel and some of the fasteners, and they cold-embrittled and so were prone to brittle fracture. Lord save me from folks who “improve” things with complicated bright ideas and never check the complete impact.

    In the FAA they have a saying that “The regulations are written in blood” as they, one at a time, find the “bright ideas” that killed folks. There is no such process in “climate science”. Yet.

    So, IMHO, this is exactly what causes the volatility to go way down AND it explains why the “low going” anomalies get more clipped than the “high” ones. The QA process is broken. Quis custodiet ipsos custodes?
    -E.M.Smith ]

  2. G. Franke says:

    It appears that the QA process will also “toss out” the sudden warming resulting from chinook wind events. These events are fairly common from January to March over a huge area (250,000 square miles) from Alberta and Saskatchewan to as far south as Colorado.
    Wikipedia: “The greatest recorded temperature change in 24 hours was caused by Chinook winds on January 15, 1972, in Loma, Montana; the temperature rose from −48°C (−54°F) to 9°C (49°F).”
    My first twenty years were spent in northwestern North Dakota. The chinooks were always a pleasant relief from the cold. Having been in Quality Control/Quality Assurance my entire working career, it dismays me to think that such real events are considered to be an “error” and wiped from the record by a QA program.

  3. Bart van Deenen says:

    Great work!

  4. oldtimer says:

    In your reply to j ferguson above you comment “It’s very hard for folks to learn to think that perhaps they might be wrong. This leads to things like the complete and utter lack of a software testing and qualification suite and / or acceptance criteria.”

    That is the point I have made about the apparent absence of, and the need for, a parallel run of the baseline 1961-1990 anomaly data vs the post 1990 anomaly data published by CRU. It is standard practice in the commercial world. You must get it right. And if you are dealing with money it must be exact.

    The other day I posted this on the Bishop Hill blog
    “The thermometer count changes revealed in the unadjusted GHCN data look to me like a significant process change. In the commercial world, such a change would not be made without a parallel run of the old and new processes to validate the new process. For example a retailer changing his cash machine system would like to be sure that when a customer bought two items, one at £1 and another at £2 and tendered a £5 note in payment, the system flagged up £2 change to be returned to the customer. He does not want the answer to be £1.50 (and be thought a fraud) or £2.50 (and be thought a fool). This is not so much a scientific point as a simple bean counters point.”

    In essence your software testing is my parallel run.

  5. oldtimer says:

    PS to ruin your day. Try this link:

    REPLY: [ You know, it’s things like that which make the Maya “End of Days” scenario look more attractive ;-) -E.M.Smith ]

  6. j ferguson says:

    Back to the missing spikes – (chinook records for example) post 1990.

    I like Oldtimer’s suggestion that the pre 1990 data be re-run, but pre-1990 appears not to have received the same massage.

    I suppose the “raw” data are no longer readily accessible for performing the same massage on the 1963-1990 data as on the 1990-forward data, to produce a pre-1990 “un-adjusted” by the same methods as the post-1990 “un-adjusted”.

    Maybe with the money these guys are getting, they could leave one satellite on the ground and reassemble the raw data and make it accessible to everyone. Not as glamorous but maybe equally useful.

    REPLY: [ The ASOS also became available at a point in time; so the QA step that uses them as Procrustean Bed Thermometers cannot be done prior to that point in time. This, IMHO, is a Big Clue as to which of the QA steps is the bigger “culprit”… IMHO we will find that it’s the bit that puts the automated gear at Airports in charge of the data as gatekeepers that’s screwing the pooch.

    Think about it. You have an automated process that requires the lows to match places with known heat island effects and that will replace them with a computed average as needed. Who would ever detect that the “daily lows” were replaced? And I’d speculated that to the extent it is detected, it is as a report saying “Station FOO” had a higher than “normal failure rate” with data “needing to be improved by the QA process” and leading to the conclusion that station FOO ought to be dropped(!) as it is a “bad” station!

    This would lead to the recursive descent into hell we have as countries all over the place asymptotically approach the MEAN low of the Airports as their MIN. It could also explain the seasonal asymmetry (dropping more cold months lately) in the “Dropped Months” that VJones has found, along with the ever increasing number of missing months. As the dailies that are ‘accepted’ become closer to the MEAN low, ever more natural daily events become “bad data” and are left out. At some point the dailies have too few survivors to make a monthly MEAN and the station has a ‘dropped month’ due to “failure to pass QA” – and who could complain about failure to pass QA?… Speculative, yes, but a giant “Dig Here!”

    It ought not to take a giant effort to sort this out and find if there is a problem or not. If the QA software makes log files, they would tell you how much “fudge” was in the product. It would not be hard to add such log files if they are not in the code today. One programmer week would be more than enough to get an answer. If you don’t have access to the code, one person digging into the dailies from the original forms for selected winter months from a few dozen non-ASOS stations ought to be able to compute a “MEAN” and compare it to the GHCN MEAN fairly easily. IFF there is a major fudging going on, a sample of a few dozen ought to find it quickly. So take a few dozen “cold places” with rural stations near but not at Airports and compare their post 1990 Original Reports based mean with the GHCN. A week or two of work? It would only be problematic if the “Dailies” available for inspection on line were also “corrected”… Frankly, I think one of the biggest issues now will be just FINDING a set of non-ASOS stations that are still in the GHCN. We’re up to 90%+ in some countries all over the world…
    -E.M.Smith ]

  7. CoRev says:

    Raw Data!!! We don’t need no stinkin Raw Data. Who would know what to do with it? Why would anyone question the experts (us that is)?

    Great job! I’ve been following along, and your work confirms many of the questions re: how much of the registered warming is natural/man caused, versus artifacts of the technologies and handling.

    Now can we get to the real raw data to try to determine the reality of the warming and perhaps some measure of the causes?

  8. j ferguson says:

    E.M. please forgive me if this is a well trod path, but….

    Do you have the QA code they used?

    If not, can we request it? of whom?

    Somehow a lot of what they appear to have done in rejecting records is based on assuming that an event record could be noise, when in fact it is signal. Since I’ve tried to understand this stuff, I’ve been amused by the concept of picking signal out of the noise – or trend out of the noise when there really is no noise, only greatly varying signals.

    but this issue may be more a product of my naive understanding.

    one of the highlights of last week, for me, was a remark on Jeff Id’s site recognizing that noise could have a very long period. The “noise” in electronic activities where I’ve been concerned with it, has usually not been low-frequency.

    But then long period “noise” might actually be signal and be periodic on a longer time frame than you are looking for and so seem a trend instead.

    Likely there are publications on discerning acceptable temperature signals among the unacceptable.

  9. Richard says:

    So how does this ‘bullseye’ in the temperature record occur for all (most?) thermometers?

    Effectively, what the dT/dt calculation and graphs appear to have shown (difficult to see on a ‘normal’ graph) is that there is one year in each thermometer record where exactly the same climate pattern occurred as in the previous year!

    Exactly the same for each month of one year matching the same month a year on for a whole year! That shows up in your graphs as the ‘bullseye’, where all months dt = 0 for a whole year.

    This seems to be statistically unlikely surely?

    More likely to be a data processing error where an update under/overflowed its expected bounds and thus accidentally overwrote one whole year in a ‘raw’ record? Something that is seen all the time in software programs as a typical error.

    Definitely a place to dig further!

    REPLY: [ It’s actually a bit more mundane than that. It’s not an error, it’s “by design”. The “dT/dt process” is a variation on “First Differences”. You take any given value and compare it to the next value and find the “difference”. So if it’s 10 C in January and 12 C in February you find the “first difference” is 2 C (that is, 12 − 10 = 2).

    Well, there is an artifact in this process. What do you do with the very first month of data? What is January compared with? The formal answer is “nothing”, but you could also think of it as comparing January to itself. So when you get a new thermometer, the very first record is always all zeros.

    Now you have your new thermometer, and 20 years later you replace it. What do you do? Yup. A new line of zeros as that new thermometer starts life.
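A sketch of that artifact, with made-up temperatures:

```python
# The "first differences" artifact: each value is compared to the one
# before it, and the first record of any instrument has no predecessor,
# so it is carried as zero. A new thermometer (or a new Duplicate Number
# series) therefore opens with a row of zeros. Temperatures are invented.

def first_differences(series):
    """First record has no predecessor, so it contributes a zero."""
    return [0.0] + [series[i] - series[i - 1] for i in range(1, len(series))]

diffs = first_differences([10.0, 12.0, 11.0])   # leading 0.0, then steps
```

Every instrument’s difference series starts at zero this way, so when a whole group of thermometers gets a process reset in the same year, those zeros line up across all twelve months and draw the “bullseye”.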

    So what the “bullseye” is telling you is that the thermometer record takes some kind of a ‘reset’ for all the thermometers in that group.

    I’ve looked into it, and the “reset” is in the “Duplicate Number”. That “Duplicate Number” says that something about the data processing was changed. (IMHO it’s the changed “QA processes” that were started then, but this needs confirmation).

    So what the Bullseye is saying is not that there was some hardware or software breakage, but rather that the PROCESS was changed and we got a reset. The suppression of the ‘cold going peaks” tells us that the process change had a differential impact on cold anomalies; and the dramatic ramp up (often 1 C per DECADE) says that the impact was quite large and way out of scope of any natural change. This tends to confirm the process change as the “problem”. (For the simple reason that if some countries are warming at 1 C / Decade while their neighbors are flat or cooling, we would need to have dramatic weather anomalies that are just not being seen.)

    So your instincts are quite right (something is seriously screwed up) but it’s the process not the hardware or a bug in the software. Basically, this screwage is by design at the “Duplicate Number” level… and it’s the design (most likely of the “QA process”) that’s borken… ;-)
    -E.M.Smith ]

  10. E.M.Smith says:

    I have not yet been able to find any NCDC code for inspection. I have a vague memory of a discussion a year or two ago on some blog (WUWT?) where it was asserted that NCDC did not make the code available (but that could be a misstatement of an offhand comment by some random…)

    The folks to pester ought to be NCDC as they are the ones who collect the dailies and turn them into Monthly GHCN. For some places, like Australia, England and New Zealand, that may be done by the local BOM / NMS prior to delivery to NCDC.

    I’m pondering ways to detect the effects of the processing even without the code… Things like a “differential of ASOS vs non-ASOS” stations. They ought to approximate over time if the ASOS are controlling destiny… So plot some “non-ASOS” vs some “ASOS – AWS” and watch for convergence, especially in stations at higher elevations and latitudes and with a stronger signal in winter months. That, IMHO, ought to put the spotlight on the burglar…

    Per noise and signal: Yeah, that’s a fascinating area. Especially when you consider things like Bond Events that are KNOWN real physical events with a 5000 year period… and the 178-210 or so year cyclical solar events… Then you toss in the “noise” from things like volcanic events that can have a slow ramp up/down over hundreds of years… (Hawaii once took a very long volcano break, right when I went over a few times trying to catch a view of a volcano. I gave up and have not gone back. It’s been erupting nearly continuously for decades since. Sigh. Even ate the museum I visited.) That is “noise” measured in the length of my lifetime. I would book a trip to go see it, but I’m certain that would stop the eruption…

    So, IMHO, we have folks who are sure they can figure out what’s noise and what’s signal and in reality have no idea what’s noise and what’s signal. So we get this compressed pap with a lot of the signal ironed out of it and the noise spread around. (Cold excursions removed and UHI spread). We’re taking out short period signal and putting in long period noise. THEN finding a long period “signal” that isn’t.

    @CoRev: Every time I think I’ve found the “Raw” data I’ve been sorely abused by it… At this point I’m beginning to wonder if we even HAVE raw data any more… In the days of paper records we had the reporting sheets, but today? With reports of them molding away and with records retention policies that toss our history on the trash heap, what’s left?

    Why keep the “junky input” when you have the “QA value added” on tape or disk? That’s what the CRU crew did. “Lost their raw data”. Heck, in many automated systems the intermediate values are never kept; so I could easily see the electronics reporting a value to a computer that “QA’s it” and replaces it with a computed average if it doesn’t “fit the bed” and then just passes that to the data set as the ‘daily value’ (and a week later the log file rotates and no record then exists that the value was replaced…)

    IMHO, it is highly likely that we do not have raw data from the 1990 to date period for 80%+ of the records in GHCN.

    The good news is that we can simply stick a real thermometer in a Stevenson Screen some place, and get the present state. Then compare that to the pre-1990 data and ask if we are warmer or cooler than then.

    Just bypass the whole lot of “climate scientists” and their shenanigans.

    Pick some places with decent historical records. Validate the present temperatures independently and with full raw data audits and retention. Show “no net gain” since “long ago”. Then tell them to go stuff it.

    I’d suggest about 50 to 100 stations scattered around the globe. Comparing only “station to itself”. None of this “grid / box / averaging / extrapolating / interpolating” garbage.

    If you have that, the rest of it just becomes a postmortem.

    Procrustes meet Theseus.

  11. Steven Schuman says:

    I know little about statistics. Nevertheless, I could see how cutting off the lower temperature excursions could result in a one-time step up in the data set. How does it change the slope? Thanks again for all the time you’ve put in.

    REPLY: [ It’s all about averaging and getting the magnifying glass really really close. Take a look at the graph for a site. Say this one for Strasbourg

    Notice how the bottom edge of the data pulls away from the zero line at the right edge? THAT is the “giant rise” we get out of the dT/dt charts and the giant rise found by codes like GIStemp. Doesn’t look nearly so impressive from far away, does it?

    So the first thing to realize about “the rise” is that it’s really pretty dinky when seen “in context” (the context being the full range of time and temperatures). We have to get out a big magnifying glass to find it… It was there, in this temp graph, for all to see, but we didn’t notice because the scale of it is sooo small…

    Now look at the tops of that data. It looks like it MIGHT rise a little at the end, but if it does, it is rising far less. ( And not out of line with what it did “at the start of time”, where the bottoms are clearly out of line.) OK, we saw the same thing in the aggregate graphs for France for temps (but under a large magnifier) where we had winter temps rising, but other times of the year “not so much”. Yet the AVERAGE line rose.

    So, to get a rising TREND out of rising BOTTOMS we have to average it in with all the other data. That gives us the Hot Pink rising trend line for France “over all”.

    So those two give us most of the rising trend line out of an “eroding bottoms”. You can see a stellar example of this in the graph of Antarctica. The tops hold steady and the bottoms erode dramatically, leaving you with a rising trend line.
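The "eroding bottoms" effect is easy to demonstrate with made-up numbers: hold the tops steady, raise the bottoms, and the average acquires a rising trend all by itself:

```python
years = range(10)
highs = [30.0] * 10                  # the tops hold steady
lows  = [-10.0 + y for y in years]   # the bottoms erode upward, 1 C per year

# Simple (high + low) / 2 average for each year:
means = [(h + l) / 2 for h, l in zip(highs, lows)]
trend = means[-1] - means[0]         # the rise that appears in the average
# trend comes out to 4.5 C even though the highs never moved at all.
```

Nothing got hotter at the top end; the range just got squeezed from below, and the average dutifully reports a "warming trend."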

    Since anomalies of annual data are for the purpose of finding trend, they will find this rising trend and display it, not the approach of the bottoms to the tops. If you put a “guide line” along the top peaks and another along the bottom peaks you would see a wedge getting dramatically narrower to the right hand side, mostly from the bottom line angling up. (This is common in stock charting programs, but I don’t have it coded yet for my stuff. Those kinds of ‘peak following trend lines’ are very very useful for spotting real trend vs range compression from one side.)
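Those peak-following envelope lines are simple to sketch too; this toy `envelopes` function (a hypothetical name, not anything from my codes) just takes rolling maxima and minima, the way stock-charting packages draw channel lines:

```python
def envelopes(values, window=3):
    """Rolling upper and lower envelopes over `window` points,
    like peak-following channel lines on a stock chart."""
    upper, lower = [], []
    for i in range(len(values) - window + 1):
        chunk = values[i:i + window]
        upper.append(max(chunk))
        lower.append(min(chunk))
    return upper, lower

# Flat tops, eroding bottoms -> a wedge narrowing from below:
data = [10, -10, 10, -8, 10, -6, 10, -4, 10, -2]
top, bottom = envelopes(data)
# top stays pinned at 10 throughout; bottom climbs from -10 toward -4.
```

Plot the two envelopes and the one-sided range compression is obvious at a glance, long before any trend line is fitted.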

    Finally, for any one station, there is only so much “lift” you can get. If you range from, oh, 2 C to 8 C it will be darned hard to get even 4 C out of it without someone noticing that January Snow is happening when the winter temps match May! So at that point, the “Splice Artifact” is an essential component. Each station contributes to the “rise” but during different parts of time. A beauty of an example is in the Marble Bar posting where a series of “boom towns” get merged into one long rising record for the “area”. I take each chart individually, show their rise, then show how merging them all into one anomaly series gives a long steady rise that is not present in any one station. “Station Dropouts Matter. -E.M.Smith” and they especially matter in that they facilitate “The Splice”.
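The splice artifact itself can be shown with three invented "boom town" style records: each rises only 0.5 C over its own lifetime, but each sits a bit warmer than the last, and merged end to end they produce a rise no single station shows:

```python
# Three invented short records (deg C), each rising only 0.5 C
# over its own lifetime, each sited a bit warmer than the last:
station_a = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
station_b = [10.5, 10.6, 10.7, 10.8, 10.9, 11.0]
station_c = [11.0, 11.1, 11.2, 11.3, 11.4, 11.5]

def rise(series):
    """End-to-end rise of one series."""
    return series[-1] - series[0]

# Merge them end to end, as a naive 'area' series would:
spliced = station_a + station_b + station_c
# Each station rises 0.5 C; the spliced record rises 1.5 C.
```

The numbers are fabricated for illustration, but the mechanism matches the Marble Bar case: each dropout-and-replacement hands the baton to a slightly warmer record, and the joined series inherits every step.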

    So that’s the way it’s done. You only need about 1/2 C to 1 C of “rise”. Getting 1/2 C out of an approximation of highs and lows isn’t hard. Taking a dozen of them and splicing to get a total of 2 C isn’t all that hard either. Then put it under a giant magnifier so instead of being nearly impossible to see in 1 mm of graph it is a 10 cm rise over the whole page and, voila! “Catastrophic Global Warming!!!” But…

    “Don’t try this at home. Global Warming provided by professional Climate Scientists on Closed Course. Attempting to do this at home may result in failure to find Global Warming with graphs displaying blatant splice artifacts and high/low approximations. Your mileage may vary.”…

    -E.M.Smith “

  12. Rod Smith says:

    @G. Franke: You are absolutely correct about the Chinook winds as well as their relatives elsewhere. Watching a thermograph during such events is quite educational even to weathermen. I have always suspected that frontal passage temperature changes are also ignored.

  13. E.M.Smith says:

    @Rod Smith:

    Well now you have confirmation that frontal passages “have issues” complete with the maximum rate of change they can manifest before the Procrustean Thermometers will chop them off and replace them with an “average”…

    “I think that’s gonna leave a mark”…

Comments are closed.