(h/t Verity Jones for this graph)
The general point being covered here is that the “Average Climate” is a much smaller range than the Record Ever temperatures; we all get that. But the “Average Climate” range is also much smaller than the range of temperatures seen during quite normal long-term (and fairly cyclical) changes such as the PDO cycle (or the AMO or AO for Europe).
The various climate codes use the temperature “mean” as a proxy for the “Average Climate” in a variety of data creation (“fill-in”) and “homogenization” codes. Yet that number can be quite different from the actual temperatures, and those actual values may persist for decades during long cycle events. Furthermore, the relationship between two cities can be quite different during those long cycle events. The shape and position of the MIN and MAX temperatures can be significantly different between two locations during extremes than they are “on average”. This means the “Average Climate” (or “average mean”) can fail to track what is actually happening, perhaps for decades. That is, there can be a significant “tracking error” between a temperature average mean and the correct values to use for data “fill-in” or for “grid / box” anomaly creation.
Volatility Matters. “When” It’s Volatile Matters More
OK, this posting is about a fundamentally simple thing, but will likely end up being complex anyway. I’ll do what I can to keep it easy to follow.
The basic point is about “Volatility”: the tendency of things to change from their norm or average to greater or lesser degrees. Some places have more volatile temperatures than others (high deserts, for example) and some stocks can be much more volatile than others (a small company with no earnings but a good story, as compared to a utility company). When you hear traders talking about trades you often hear things like “selling volatility” and “buying volatility”; but rarely do you hear “climate scientists” mention it. There are a large number of common behaviours between these two fields (stocks and temperature series) and they both have a fractal nature, so I found it odd that volatility was ignored. Especially when it comes to adding or dropping stations.
Over time, the stations that are dropped from the record are largely the more volatile ones. The stations that are kept are the less volatile; the ones at lower latitudes, lower altitudes, and closer to water. This will have an impact.
Metaphorically, we have moved from high tech Start-Ups to staid old Utilities.
But What About The Timing?
This is where it gets interesting. When a stock trend is “going your way” you want to “buy volatility” as you get more juice for your buck. So if you wanted a cold biased period in the record, you would hold high volatility thermometers during, say, a cold phase of the PDO. Oddly, we have a large number of such stations during the peak of thermometer counts from 1950 to 1990. Then if you wanted to have a high reading that was ‘locked in’ (or a stock gain that was locked in) you would start dumping those high volatility thermometers (or stocks) when you were nearing a likely peak. That way, when the inevitable ‘fall’ happens, your readings (or your money) do not drop as much with it.
If you are in a volatile stock in a rising market, you can make 2 or 3 times as much money. If you are in a volatile stock in a down market, you can lose 2 or 3 times as much. For stocks, this property is called the “Beta” of the stock. The tendency to go up in price irrespective of volatility is called the “Alpha”. So for thermometers, a warming thermometer would have a high “Alpha”, and a thermometer warming more than most during warming times, but cooling more than most during cooling times, would have a high “Beta”. AGW “climate science” is all about the Alpha, but ignores the Beta. (A cynic might say they ignore the Beta publicly, but perhaps not in private…)
Stock traders regularly try to get both high Alpha and high Beta when they think the market is headed higher, but then swap to low Beta stocks if the trend looks like it’s going to reverse. In this way you can ratchet up your position over time. You splice the gains on a high Alpha play in the up periods onto the smaller losses of a low Beta play in the down or flat periods (thus reducing the risk). The result is a sort of a ‘stair steps up’ effect (though sometimes with a bit of a rolling wave imposed on it). Just take a SINE wave, scale it by 1.25 on the up 1/2 cycle and by 0.75 on the down 1/2 cycle.
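To make that ‘stair steps up’ idea concrete, here is a minimal sketch in Python. It assumes nothing beyond the 1.25 / 0.75 scaling described above; the three-cycle sine wave and the factors are purely illustrative, not any station data or trading model:

```python
# A sketch of the "ratchet" effect: amplify the rising half-cycles of a sine
# wave, damp the falling half-cycles, then accumulate the result.
# The 1.25 / 0.75 factors are the illustrative ones from the text.
import numpy as np

t = np.linspace(0, 6 * np.pi, 600)      # three full cycles
wave = np.sin(t)
steps = np.diff(wave)                    # per-step change of the wave

# Rising steps get the "high Beta in an up market" treatment,
# falling steps get the "swap to low Beta in a down market" treatment.
scaled = np.where(steps > 0, 1.25 * steps, 0.75 * steps)
ratchet = np.concatenate(([wave[0]], wave[0] + np.cumsum(scaled)))

print("plain sine, net change over 3 cycles:", round(wave[-1] - wave[0], 2))
print("ratcheted series, net change:        ", round(ratchet[-1] - ratchet[0], 2))
```

The plain sine ends where it started; the asymmetrically scaled version drifts steadily upward, with the rolling wave still visible on top of the stair steps.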
So did the temperature guys do something like this? I think they did. We had a ‘cold phase’ of the PDO during the period when we had the most thermometers, and the most volatile ones. That builds in a ‘cold bias’ via the increased volatility to the cold side during that time. These volatile thermometers were kept in the series as the PDO shifted to a warm phase, putting in a large gain. Then, from about 1990 to 2006, at the peak of the hot side of the PDO, the GHCN (Global Historical Climate Network) thermometers were reduced in number, but also preferentially reduced in volatility. We lost the high cold places and the inland places in California, for example, but kept 4 thermometers: one in San Francisco, and three near the beach down in the Greater L.A. Basin (Santa Maria, L.A., and San Diego). All very flat profile, very low volatility places.
So having established a very cold base from high volatility excursions during a cold PDO phase, then ridden that volatility up during a warming turn, we now “Lock in our gains” as the hot market starts to cool off by swapping to low volatility thermometers that can never reach that cold a level. We “get small” in trader terms (hold fewer positions) and we “sell volatility” (get rid of the volatile positions).
In Graphs – Reno, Sacramento, San Francisco
I have chosen these three as they are very close together (at least in GIStemp terms, where 1000 km is ‘close enough’) and because I have interesting data for them. They are also more or less on the same latitude line, so we can see the impact of altitude and distance from water separately from latitude effects. There is a fascinating weather book, “The USA Today Weather Almanac” (I have the 1995 version, but have not found a newer one). It includes charts for various cities with the Average High, Average Low, Record High, and Record Low. I read the data off the graphs (so it’s probably about 1 F accuracy) and used it to make my own tables, and from them the following graphs. For a formal paper, a better data source ought to be used, but this is good enough for illustration. A “dig here” would be to do the same thing by latitude, and to look specifically at altitude without the distance-to-water involvement, to tease out the relative strengths of each. For now, this example will do.
Notice that Reno has a fairly broad range of temperatures and that low records can be quite far from the average low.
Here we see that Sacramento is “flatter” than Reno.
The interesting thing about San Francisco is how the summer Records can be far higher than the average. We have both colder cold excursions up in Reno and warmer warm excursions in San Francisco. Plenty to work with here…
Now imagine a case where all three of these are in the temperature record during the period from 1950 to 1970. We would be getting cold temperatures much closer to those Record LOW lines on each graph. Not quite to them, as those are records, but closer to them than would be indicated by that “Average MIN” line on each graph. Then we would rise into a warming period, showing a very rapid warming as we moved from below those “Average MIN” lines through them, and to a place higher than the “Average MIN” for low readings. High readings would have a similar behaviour. Below the “Average MAX” during the cold phase, then above the “Average MAX” and closer to the Record Max during a warming phase.
We would have ridden our “High Beta” Reno and Sacramento to a greater “warming trend” from very cold to quite warm and we would be getting ready to ride them back down again on the next (present) cold turn of the PDO. Except we’re going to take those high volatility thermometers out of the record as we approach and pass that top of the warming. Some, starting about 1990, and the rest in 2006. At this point we have only San Francisco left. We can never have a ‘near Record Low’ reading from Reno to pull our “trend” back down to the prior lows.
OK, but The Reference Station Method can just fill in the missing data, can’t it? Well, it can fill in SOMETHING, but it will be using the relationship between the “MEAN” of the two locations, not the more extreme ranges. It will be filling in a more “typical” value, closer to the mean, and not down near a low extreme excursion. Averages and means are like that. They dampen volatility.
Same Graphs, with “Mean” added
So we’re going to take that “Average MAX” and the “Average MIN” and make an “Average MEAN” out of them. That’s very similar to the value you find being used in climatology to stand in for any given station: the mean. Also on these graphs is a line halfway from the Average MIN (or Average MAX) to the corresponding RECORD excursion line. This is a proxy for what the temperature is likely to be during a warm (for MAX) or cold (for MIN) excursion.
I’ve changed the color coding on these lines so it’s easier to see the cold to hot relationships. The middle green line is the “Average Mean” and something like that would be used for recreating missing values in some climate codes.
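For anyone wanting to rebuild these lines from a chart like the almanac’s, the construction is simple arithmetic on the four values for each month. A minimal sketch, with hypothetical placeholder numbers rather than the actual Reno readings:

```python
# Build the chart lines for one month from the four almanac values.
# The numbers below are hypothetical placeholders, NOT the real Reno data.
ave_min, ave_max = 21.0, 45.0       # Average MIN / Average MAX, in F
rec_min, rec_max = -16.0, 70.0      # Record MIN / Record MAX, in F

ave_mean = (ave_min + ave_max) / 2.0            # the "Average Mean" line
half_excursion_min = (ave_min + rec_min) / 2.0  # halfway from Average MIN to Record MIN
half_excursion_max = (ave_max + rec_max) / 2.0  # halfway from Average MAX to Record MAX

print(ave_mean, half_excursion_min, half_excursion_max)
```

Doing that for each of the 12 months gives the extra lines on the graphs.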
Comparing Reno to San Francisco
OK, what can we do with these graphs? Let’s compare Reno to San Francisco.
This graph has the difference between the Reno and San Francisco values on it. So the “Average Mean” of Reno compared to SF could be used to adjust the San Francisco value to create a “fill-in” Reno value (if Reno were missing). To show that, we subtract one Average Mean from the other to get the average offset between the two locations. That is the dark green line. We can use it to adjust San Francisco by that amount to get a ‘typical’ temperature for Reno, given the relationship between the two over a long term average.
But we know that during a cold excursion, Reno cools off much more than San Francisco. So we have the difference between the Reno and San Francisco “Average MINs” as the dark blue line, while the lighter blue line compares the Reno and San Francisco “50% low excursion” lines. It shows how much colder Reno is during a cold excursion. Since the Average Mean was calculated using the Average MINs, we know that the difference between those “50% Excursion MINs” and the Average MINs is an error term in our “Average Mean” during a cold excursion. Basically, we use the “Average MIN” to find our relationship between San Francisco and Reno, but during a cold turn of the PDO we’re more like that light blue line for several decades in terms of the relationship of the MINs.
We can use the difference between those two blue lines as a rough estimate or approximation of the error present in the use of the “Average Mean” to create ‘fill-in’ values during this time.
(In reality, the MAX will not drop in exactly the same manner as the MIN, so the shape of the “50% Cold Excursion MAX” line would be a bit different from the “50% Cold Excursion Min” line, and that would slightly change the shape of the ‘error’ curve; but this is ‘good enough’ to demonstrate the issue).
So the difference between these two (the ‘Average MIN’ and the ‘50% of Record Excursion MIN’) is the Magenta line. That represents the degree to which the “Average MEAN” will under-represent the actual temperatures in Reno if used to create them from San Francisco temperatures during a cold period. Notice that it has a range from about 10 F to 20 F. That is a LOT of error. Even if we mitigated 90% of it by other means, we would still have enough tracking error to account for “Global Warming”.
If we then subtract that magenta “tracking error” from the dark green line, we get the light green line, which is more nearly the offset that ought to be used to create “fill in” data from San Francisco during the time of a cold excursion.
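Pulling those steps together, here is a minimal sketch of the arithmetic for one month. The city values are hypothetical stand-ins for what would be read off the two charts, so the printed numbers only illustrate the mechanism, not the actual Reno / San Francisco error:

```python
# Hypothetical January values for the two cities, in F -- stand-ins for what
# would be read off the charts, not the actual almanac data.
reno = {"ave_min": 21.0, "ave_max": 45.0, "rec_min": -16.0}
sf   = {"ave_min": 42.0, "ave_max": 56.0, "rec_min": 27.0}

def ave_mean(c):
    return (c["ave_min"] + c["ave_max"]) / 2.0

def exc50_min(c):
    # halfway from Average MIN to Record MIN: the "50% cold excursion" proxy
    return (c["ave_min"] + c["rec_min"]) / 2.0

# Dark green line: long-term offset used for ordinary "fill-in".
offset_mean = ave_mean(reno) - ave_mean(sf)

# Dark blue / light blue lines: the MIN relationship in average vs excursion times.
diff_ave_min = reno["ave_min"] - sf["ave_min"]
diff_exc_min = exc50_min(reno) - exc50_min(sf)

# Magenta line: tracking error of the mean-based offset during a cold excursion.
tracking_error = diff_ave_min - diff_exc_min

# Light green line: offset that ought to be used during a cold excursion.
offset_cold = offset_mean - tracking_error

print(f"mean offset {offset_mean:+.1f} F, tracking error {tracking_error:+.1f} F, "
      f"cold-phase offset {offset_cold:+.1f} F")
```

With these made-up numbers the mean-based fill-in runs about 11 F too warm during a cold excursion; the charts above suggest the real figure for Reno vs San Francisco sits in the 10 F to 20 F range.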
The point here is that you cannot use an overall mean (a mean from all history of a thermometer), nor even a randomly selected segment mean, to do ‘fill-in’ creation or other thermometer adjustments. You must pay attention to the differential volatility of a given thermometer pair during both hot and cold excursions to avoid having this kind of tracking error introduced into any “corrections” or “adjustments” made using a simple average or mean value.
If we had enough of these ‘tracking errors’ as we returned to a cold excursion (say, as the PDO turned cold and 5/6ths of the GHCN thermometers are dropped), we could easily have the “fill-in” temperatures be way too warm, such that we could never have an average of them get anywhere near as low as the “average temperature” had been in the past.
There is also a second place this error can cause problems: when making a “Grid / Box” anomaly. If we compare a box of thermometers with high volatility during a cold phase to a different box of thermometers that are low volatility during a later cold phase, we will find a spurious warming signal. Even if we adjusted those thermometers based on the “Average Mean” difference between them, that later “box B” can never get as low as the original “box A”, and an adjustment or correction based on an overall mean will have a warming bias introduced due to the tracking error.
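A minimal numeric sketch of that “Grid / Box” failure mode, with made-up numbers chosen only to show the mechanism (these are not real GHCN boxes or baselines):

```python
# Hypothetical winter means for one grid box, in F.
# Era A: volatile high / cold stations present, measured during a cold PDO phase.
# Era B: only flat, low volatility stations remain, measured during a later,
#        similar cold phase.
box_a_baseline, box_a_cold = 40.0, 30.0   # volatile set drops 10 F in a cold phase
box_b_baseline, box_b_cold = 40.0, 37.0   # flat set drops only 3 F in a cold phase

anomaly_a = box_a_cold - box_a_baseline   # -10 F
anomaly_b = box_b_cold - box_b_baseline   #  -3 F

# A similar cold phase looks 7 F "warmer" in era B purely because the
# volatile thermometers are gone from the box.
spurious_warming = anomaly_b - anomaly_a
print(spurious_warming)   # 7.0
```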
We could do a very similar process for the MAX side to get a Hot Excursions tracking error and a Hot Excursions corrected mean. In that case we would find that during hot times, San Francisco actual MAX temperatures would provide a way too hot ‘base’ for creating “fill-in” values elsewhere. (That extra height of the Record MAX line above the Average MAX on the San Francisco graph.)
Or, we could continue to use the “Average Mean” and let stations come and go from time to time, and just accept the errors. (If we were really creative, we could even arrange it so that stations were used as needed to move the “Global Average Temperature” as desired.) This is largely what the climate codes do. At a minimum, they are error prone and have an “accidental” bias to warming tracking errors.
Next we’ll look at the “80% of Record Range” graphs. I’m going to change the colors again just to make it clear that these are NOT the same graphs. (They are a different range, and they have added lines on them, so all red and blue would make it very hard to pick out individual lines.)
The 80% of Record Excursions View
These charts are a bit “busy”. First off, they have the same 4 basic temperature lines (Ave MIN, Ave MAX, Record MIN, Record MAX) from above. Then they have the “80% of the way to the records” lines just near the records (and inside of them, toward the mean). Finally, we have the “Average Mean” of the Average MIN and Average MAX. This is, roughly, what is supposed to be used by codes like The Reference Station Method to recreate one city’s temperature from another city’s. But what if that average was not made during the WHOLE cycle of a PDO? What if it were made during a cold or a hot phase, then USED in the other? How much would our adjustment be off? For that we have 2 more lines: the difference between the Ave MIN and Ave MAX lines and their “80% of record” lines. Again, it would be great fun to see how little of a range is needed (20% of record?) to get the 1/2 C we are supposed to be all panicky about. I may do that later after some tea…
For now, here are the graphs with a bit of a wide 80% range “for illustration”:
Look at the Light Blue and Magenta lines. Notice that we have 10s of degrees to work with, both to the hot side and the cold side. (Note that these are NOT the same as the “tracking error” line in the final chart above; that was comparing two different cities. This is the offset between the Average MIN and the 80% of Record MIN, or MAX.) With that much “juice” I could cut this down by a factor of 10 and still have a whole degree of F each way. Just about that 1/2 C, but in either direction. OK, we now have everything needed to make the “Global Average Temperature” anything we want. Select stations for presence during volatile periods in the direction we want to go, then remove them when the trend shifts. Use The Reference Station Method to make up the ‘missing bits’ via a biased “average” from a period whose volatility is not representative of the moment where the data are missing. Use a “Grid / Box” anomaly calculation that compares one period of volatile thermometers with another period of non-volatile thermometers. Accumulate all the tracking errors and “Voila!” AGW. Just not caused by CO2.
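That scaling argument is just back-of-envelope arithmetic, but it is worth writing down explicitly (the 10 F figure is the rough size of the Light Blue / Magenta offsets, not a measured result):

```python
# Rough scaling: tens of degrees F of available offset, attenuated by a
# factor of 10, compared with the roughly 0.5 C warming in question.
available_offset_f = 10.0          # order of magnitude from the charts
attenuation = 10.0                 # suppose 90% of it is mitigated somehow

residual_f = available_offset_f / attenuation   # 1 F each way
residual_c = residual_f * 5.0 / 9.0             # about 0.56 C

print(residual_f, round(residual_c, 2))
```

So even after knocking the effect down by a factor of ten, roughly the whole advertised warming is still available as tracking error.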
Less total range to work with than for Reno, but still plenty. As they say, No Problem. The Magenta and Light Blue lines show plenty of tracking error between the Average MIN/MAX and the 80% of Record MIN/MAX. Even if this case does not arise often, it could easily skew a result.
Even from comparatively dead flat San Francisco, we have workable ranges. But in particular, look how big the ‘hot side’ excursions are! We’ve got 20 F to work with here. So keep the real Reno in during the cold phases, then take it out during a hot phase and just ‘make it up from SF’, accumulating some of that error into the created value for Reno. Heck, we could likely get near 40 F of “swing” all told. Plenty of “juice” here to let us be very subtle about it and only use a little bit “as needed”.
The point here is that there is plenty of potential tracking error between the averages during “normal” times and the averages during “record” times (and, by implication, the non-record but non-average times too) for errors to accumulate in the processes that use various averages over periods of time to fabricate “fill-in” data, to homogenize data, and to compare “Grid / Boxes” over a 30 year gap with different volatility of thermometers in each box. And that error is strongly weighted in the direction of a “warming trend” due to the dropping of volatile thermometers as the PDO moved toward the top of a warm cycle.
Comparing the Three Cities
Next, I’m going to compare the three cities. We will not be showing all the lines from the above graphs. We will be looking at the comparison of the two cities’ “Average Means” vs the comparison of the two cities’ “80% of Record MIN” and “80% of Record MAX”. That is, what would be the “offset” in average times vs the offset in “80% of Record” times. Finally, we subtract each of those hot and cold lines from the green line to show the potential impact of the change on the “offset” between the two cities: how much that green line might be in error in a given year.
First off, let’s compare San Francisco with Reno. We have the difference in the “Average Mean” for the two cities (the Green line) and the differences in the two “80% of MIN” and “80% of MAX” records for the two cities (The deep blue and deep red “Hot” and “Cold” lines). Looking at those three lines you can see how much the shape of the lines change at extremes when compared to the “typical” average case. (Compare the shape of the dark green, dark red, and deep blue).
This first comparison of cities is the easiest of the batch to figure out. It also has the two cities that are most ‘not alike’ and with the strongest divergence. OK, first off, look at the two lines with very bright and light colors, the “Hot Pink” and the Yellow. Together they make one concept: the hot and cold tracking errors. There is also a Green line; that’s the comparison of the two cities’ Min-Max averages. (Similar to what is likely to be used for the creation of a missing datum most of the time in climate codes). The “Hot Pink” (or Magenta) line and the Yellow line show the overall tracking error between that green line (the “Average Means” comparison) and the “80% of Records” comparisons; that is, the difference between the green line and the dark blue and dark red lines.
The remaining two lines are darker colors; they are the comparison of the two cities in the extreme times (the dark red and deep blue). These are what ought to be used during hot and cold phases of the PDO, respectively, instead of the “Average MIN” or “Average MAX”, in computing an offset. The degree that the two depart from the Min-Max Average (green line) is what makes the two light colors, the Yellow and the Magenta or Hot Pink. Those two are the ‘tracking error’ you would get from using the Min-Max Average instead of the 80% Excursions Average during various times of excursions.
(Not present on the chart, simply because I don’t have the data, are the other half of the records that ought to be used to compute a “Hot Cycle mean” and a “Cold Cycle Mean”. So we have the “Cold Record”, but not the “MAX on the Cold Record MIN day”. I’m sure it’s available somewhere, but I don’t have it to hand. So we are going to assume that the MAX on a record MIN day is not abnormally far away from the MIN and certainly not enough to erase all the tracking error.)
So we can see that the dark red and the green are not too far from each other during summer. July in particular is nearly zero tracking error between the hot phase line (red) and the Min-Max Average line (green) so the Yellow difference goes to near zero. But it’s that deep blue line that’s the most interesting. Look how much they diverge during the cold phase of the cycle! And how much tracking error.
(Realize this is the tracking error in 1/2 of the set that is averaged to make the “mean”. So if “Min” tracks 5 points off, that would be averaged with Max to get the mean. Though one would expect Max to be low when Min is unusually low, so I’m going to treat it as a proxy for total error of the calculated mean. In a formal paper this would need to be elaborated).
That hot pink line is way up there about 20 F. So if we calibrate our Reference Station Method offset during an ordinary Min-Max Average period, then use data to recreate a Reno temperature from a San Francisco temperature during a cold phase of the cycle, we could be up to 20 F too hot in our estimate. (For a trivial version of the reference station method. The actual version averages a bunch of stations together and weights them while spiraling outward to greater distances. Basically, it will blend a bit of this 20 F with some 10 F and some 2 F and… But this shows what’s available for sausage making from San Francisco for Reno. It also may use different periods for the average that is used for blending. Again, we’re just showing the extreme that is available to use, not the actual amount that will be used for any one thermometer. But I do think this explains many of the “odd” variations in adjustments done when using averages that vary in time for each station as the code runs over the data.)
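To illustrate that blending remark, here is a rough sketch of a distance-weighted average of donor-station offsets. This is NOT the actual GIStemp Reference Station Method code; the linear distance taper, the 1000 km radius, and the donor numbers are all assumptions made just to show how a large tracking error gets diluted rather than removed:

```python
# Each tuple: (tracking error carried by that donor station in F, distance in km).
# Values are illustrative only.
donors = [(20.0, 200.0), (10.0, 500.0), (2.0, 900.0)]

MAX_DIST_KM = 1000.0   # "close enough" radius, an assumption for this sketch

# Simple linear taper with distance (an assumed weighting, not GIStemp's).
weights = [max(0.0, 1.0 - dist / MAX_DIST_KM) for _, dist in donors]
errors = [err for err, _ in donors]

blended_error = sum(w * e for w, e in zip(weights, errors)) / sum(weights)
print(round(blended_error, 1))   # about 15 F here: diluted, but far from gone
```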
OK, the colors stay the same in these next graphs. Next, let’s compare Sacramento to Reno.
Sacramento is inland about 60 miles from San Francisco, but still at near sea level (they have a deep water port at the end of a long shipping canal). So we ought to see much more of the impact of altitude in this set, and less of the ocean influence. And where San Francisco gets fog in summer as it is pulled in from the ocean, Sacramento gets “tule fog” in winters that can block the sun for days. Reno tends to have much more clear sky due to being behind the Sierra Nevada mountains in the high desert basin.
Very interesting shapes. Again we have the “hot phase” 80% of Record line being rather near to the “Average Min-Max” line. We have little tracking error (the yellow line) other than in winter (when, I’d speculate, Reno has clear skies in a cold desert and Sacramento is under a fog blanket).
But once again, it’s the cold phase that’s most ‘useful’. We’ve got a 10 to 20 F tracking error in that Hot Pink line for most of the year, only ‘going a bit limp’ in spring. When it’s cold and wet in both places, well, it’s cold and wet about the same as in average years, with “only” 5 F of error to work with. Also noteworthy is the fact that the blue line says that during a very cold phase Reno can get nearly 35 F colder than Sacramento.
So how about a comparison of San Francisco to Sacramento? Both sea level, and very near each other. About 1/10th the maximum distance GIStemp will use, so they ought to be a near perfect match. In GISS Theory…
Well, at first glance it doesn’t look like much. But there is still a story here. The Green line is the comparison of Average years. The red “hot years” line pretty much stays on top of it, but does have a tiny bit of ‘lift’.
The tracking error seen on the yellow line shows that in March and November, for example, we would be subtracting 5 F to 7 F too little from SF to create a Sacramento temperature.
March. November. Hmmm… I’ve seen that pattern before. Spring and Fall having more ‘divergence’ in their Hair Graphs for some collections of thermometers than winter; and with summers almost a match… like that yellow line near zero. Well, more to investigate, but hints of seeing evidence; at least enough to hint at a causality for those patterns in the dT/dt “hair graphs”.
Now look at the deep blue line. Significantly different position, and somewhat different shape from the green line. The resultant “hot pink” tracking error line is a modestly robust 10 F or so over much of the year. It does fade quite low in the dead of winter. (When it’s darned cold in both Sacramento and San Francisco, especially when the Canada Express manages to flood down over both of them). The mid-summer “dip” in the tracking error matches a rise in the blue line mid summer. Even in cold periods the Central Valley can get hot in July and August.
I do find it interesting that this line matches, more or less, my experience the last couple of years. A “very cold” spring compared to “average”; about August it’s nearly normal, just a bit cool; then we head into winter and November is just a bear. So this pattern is reflecting what I remember of the last cold PDO phase and what I’m experiencing now in this newest cold phase.
Conclusion
OK, this is the “issue” that I’d hinted about back during the Spherical Cow posting days. I’ve spent a while sitting on it, and I’m no closer to making a formal published paper out of it; as life requires that I make money for a living, it’s time to let other folks have a go at it. I think there is plenty here to get published, but it would need more work with better data and a larger sample of stations. If someone does take on that task, a footnote of mention would be appreciated.
10 F of tracking error in these charts is interesting, but it’s just way more than is “needed” to create an “AGW signal” that is in keeping with the 1/2 C that is touted. If we have 6000 stations dropped and 1000 kept, and need all of 1/2 C, (call it about 1 F) then all we really ought to need is about 1 – 2 F at most from about 1/2 of those 6000 stations. By my estimate, I’ve got about a factor of 10 of attenuation to introduce before I get the “desired” size… An “embarrassment of riches” of a sort. Folks could easily have been seduced by the “perfection” of their correction techniques and not noticed that they were 10% shy of perfect.
Also of note is that it is not just The Reference Station Method that will fail due to this effect. Any of the “Grid / Box” anomaly creation codes that use one set of thermometers in the past and a different set in the present will have a similar failure. A box of ‘cold excursion’ thermometers in the past, compared to a set of ‘warm excursion’ thermometers now, when the cold set is more volatile down and the present set is locked in warm with low volatility, will also have a spurious warming offset. That’s what you get when you do your anomalies at the end, as GIStemp does. And that is why both the dT/dt method and the simple “average of all data for a given thermometer in a given month vs a given reading” method give similar (and, IMHO, more accurate and trustworthy) results. They avoid this error by only comparing a given place to itself, and always comparing it to a representative set of times: “nearly now” for very short lived records, and ‘full cycles’ for very long lived records. Never “a bit now” vs “a cold volatility enhanced sample of something” in another time. That some codes “double dip”, with both a Reference Station Method data fabrication AND a “Box of Cold Excursion thermometers vs Warm and Flat set now”, just makes them that much more likely to “get it wrong”. And they do.
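For contrast, a minimal sketch of the “compare a thermometer only to itself” idea, in the spirit of the simple per-station, per-month averaging described above (the data layout and sample records are hypothetical, and this is not the actual dT/dt code):

```python
from collections import defaultdict

# (station_id, year, month, monthly mean in F) -- made-up sample records
records = [
    ("RENO", 1965, 1, 28.0),
    ("RENO", 1966, 1, 25.0),
    ("RENO", 1998, 1, 36.0),
    ("SFO",  1998, 1, 50.0),
]

# Long-term baseline per station and calendar month, from that station's own data only.
totals = defaultdict(lambda: [0.0, 0])
for sid, _, month, temp in records:
    totals[(sid, month)][0] += temp
    totals[(sid, month)][1] += 1
baseline = {key: s / n for key, (s, n) in totals.items()}

# Each reading becomes an anomaly against its own station's baseline, so one
# station's volatility is never blended into another station's record.
anomalies = [(sid, yr, round(temp - baseline[(sid, m)], 2))
             for sid, yr, m, temp in records]
print(anomalies)
```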
In my opinion, a large part of why they get it wrong is that they ignore the impact of location volatility on their processes. I’ve also noticed a bias toward wanting more “stable” thermometers in some writings about the goals of thermometers for climate research. As though removing the volatile thermometers from the present record would give more accurate and more stable results. To the extent folks believed that, they will have introduced an error when comparing to a volatile set of thermometers in the past. The motive may have been pure, but the result will be an added error term due to uncompensated volatility change in the aggregate set.
Another way to validate your theory is to add the high latitude and high altitude stations back to the GHCN data sets and re-compute the global temperature, assuming the data for the dropped stations exist somewhere.
I think you may have hit upon the “missing link” of what has been happening to the temperature record. I would encourage you to try to work on a paper please at least in your spare time.
Well, after the re-write I think it’s a bit clearer. Though the lack of comments has me wondering if it is still a bit ‘dense’ or if folks just look at the length and move on. It’s nice to know that at least a couple of folks have read it through and “get it”.
I’d love to get a “paper” out of this, but there is a lot to do and I’m just not getting enough time on this issue. If my “day job” was professor somewhere I could do it; but since I have to make my own living by my own trading each day, this has to fit into “spare time” and that is in short supply. So what I want doesn’t matter much… Maybe some kind soul will write it up, do the detail work, and put me on as second author ;-)
At any rate, with the idea “out there” folks can now start kicking it around and see what merit it has. While I think it’s important, only time will tell.
I think it’s a great article EM, but w/ 2 kids and 2 jobs it’s hard to fit this in. I’m hoping to have a day over the labor day weekend to go over some of your more recent posts.