I don’t remember where I picked up the pointer to this paper. I have a vague feeling it was at Watts Up With That… but who knows.
It is a look at the statistical basis for finding “trends” in the USCRN temperature data. In particular, they look at the use of “monthly averages” as a way to find “seasonality” to remove (so an underlying non-seasonal trend can be found) and find it lacking. They also look at the assumption of independence and lack of persistence in the data. The finding there is that the data do have strong indicia of persistence. This means that there is a high probability of finding false trends by the use of the typical Ordinary Least Squares fit (the typical means used in “climate science” to find trends).
The implication here is that finding trends in the way it is done, in the data from which it is done, will find false trends.
OK, with that, the paper and some other backing material links (bolding mine):
SSRN is the “Social Science Research Network”. I find it interesting that the paper was published there. One presumes the typical “Climate Science” outlets were not interested in work critical of their fantasies…
Seasonality and Dependence in Daily Mean USCRN Temperature
Sonoma State University
April 12, 2016
A study of daily mean temperature data from five USCRN stations in the sample period 1/1/2005-3/31/2016 shows that the seasonal cycle can be captured with significantly greater precision by dividing the year into smaller parts than calendar months. The enhanced precision greatly reduces vestigial patterns in the deseasonalized and detrended residuals. Rescaled Range analysis of the residuals indicates a violation of the independence assumption of OLS regression. The existence of dependence, memory, and persistence in the data is indicated by high values of the Hurst exponent. The results imply that decadal and even multi-decadal OLS trends in USCRN daily mean temperature may be spurious.
Number of Pages in PDF File: 14
Keywords: global warming, climate change, USCRN, OLS trends, Hurst exponent, time series
I was particularly interested in the paper in that it uses several locations where I have “connections”. I grew up near Redding (north of Red Bluff north of Chico, near where I lived) and have driven through it on many a hot summer day on the way from home to Medford, Oregon, where my Aunt and Uncle had a farm. Dad was from near Des Moines Iowa, also in the paper. So it goes…
You can download the whole paper here:
Or hit the abstract link to open the PDF in a browser. (Button in the lower right).
One sidebar note: I downloaded the paper to my tablet. Read it. Then downloaded it to my desktop computer to make this posting. Then attempted to test the “open in browser” link (that downloads to your browser after a ‘do what?’ prompt for my browser). At that third? download, I got a “pop-up” claiming they had detected unusual download activity for my IP address. It then presented a “log in” box. By clicking the “anonymous download” tab, I was back to a normal ‘do what?’ dialog box… But be advised, multiple downloads from one IP (as from a school with a proxy server…) will have only the first couple get through without the manual intervention…
Some teaser bits:
This work is a critical evaluation of the methodology employed in the study of temperature trends. Most of the variance in temperature is contained in the diurnal and seasonal cycles with a very small portion, usually less than 5%, that can be attributed to long term trends (Munshi, A Robust Test for OLS Trends in Daily Temperature Data, 2015). To detect these trends, the short term variances are removed from the data by computing daily means and by removing the seasonal cycle which is described in terms of the twelve months in the Julian calendar (Box, 1994) (Shumway, 2011). The deseasonalized residuals are then examined with OLS regression for trends. It should be noted that the Julian calendar is a man-made device and the use of calendar months to define the seasonal cycle is arbitrary. There is no reason to believe that nature is synchronized with the calendar. Also, the use of OLS regression requires the assumption that the residuals are Gaussian and independent. Small changes in OLS trends minutely examined for clues about climate change may be spurious if these assumptions are violated. The so called Hurst phenomenon of dependence and persistence has been identified in all aspects of nature including surface temperature (Hurst, 1951) (Koutsoyiannis, 2003) (Koutsoyiannis D. , 2002) (Munshi, The Hurst Exponent of Surface Temperature, 2015) (Barnett, 1999).
Basically, the “climate science methodology” assumes independence and non-persistence in the data when it is known to have dependence and persistence. This, as the universal nebulous ‘they’ say, is bad.
The primary research question in this work is whether the temperature data at the USCRN stations contain evidence of dependence and memory. It is important for that purpose to model the seasonal cycle precisely to minimize vestigial seasonal patterns that remain in the data as they may interfere with the test for dependence. Thus, a secondary research question is the length of the optimal unit of time for modeling the seasonal cycle. We use a trial and error procedure starting with the calendar month and halving it to 15-days and then halving it again to 8 days. For each measure of time we carry out a regression of temperature against dummy coded time. The smaller unit of time is considered better if it yields a higher value of Adjusted R-squared without incurring a rise in the fraction of x-variables that are not statistically significant. We also examine the residual plot for vestigial patterns. The Redding station is used somewhat arbitrarily as the test case. The results are summarized in Figure 2. The residual plots from top to bottom are for calendar months, 15 days, and 8 days.
The first “oopsy” they find in the standard “climate science” methodology is in the use of monthly averages. That always bothered me, but for different reasons. (One is related: Seasons change at different months in different places. How can you use ONE month-set for a season in both Phoenix and Calgary? Or Nome…)
It is clear that the reduction in the unit of time from calendar months greatly improves the seasonal cycle model. The vestigial patterns seen in the calendar month model do not appear in the other two models. Also, a significant improvement is made in the Adjusted R-squared value. The choice between the 15-day model and the 8-day model is more difficult but since the 8-day model offers a slightly higher Adjusted R-squared value, we choose to carry out our analysis with the 8-day model.
In short, using monthly averages gives spurious patterns…
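To get a feel for the "smaller bins fit better" idea, here is a minimal sketch of my own, not the paper's code. It uses made-up data (a sine-wave season plus Gaussian noise), and 30-day bins stand in for calendar months. Regressing on a full set of dummy-coded bins gives the bin mean as the fitted value, so plain averaging is enough. The function name `adjusted_r2_for_bins` and all the numbers are assumptions for illustration.

```python
import math
import random

def adjusted_r2_for_bins(temps, day_of_year, bin_days):
    """Adjusted R-squared of a seasonal model with one dummy per bin of `bin_days` days.

    With a full set of bin dummies the OLS fitted value is simply the bin mean.
    """
    bins = [(d - 1) // bin_days for d in day_of_year]
    sums, counts = {}, {}
    for b, t in zip(bins, temps):
        sums[b] = sums.get(b, 0.0) + t
        counts[b] = counts.get(b, 0) + 1
    means = {b: sums[b] / counts[b] for b in sums}
    grand_mean = sum(temps) / len(temps)
    ss_total = sum((t - grand_mean) ** 2 for t in temps)
    ss_resid = sum((t - means[b]) ** 2 for b, t in zip(bins, temps))
    n, k = len(temps), len(means)  # k parameters: intercept plus (k - 1) dummies
    return 1.0 - (ss_resid / (n - k)) / (ss_total / (n - 1))

# Synthetic stand-in for a station record (NOT real USCRN data):
# 11 years of daily means, a sinusoidal season plus Gaussian noise.
rng = random.Random(42)
day_of_year = [(i % 365) + 1 for i in range(11 * 365)]
temps = [15.0 + 10.0 * math.sin(2 * math.pi * (d - 100) / 365) + rng.gauss(0, 3.0)
         for d in day_of_year]

results = {bin_days: adjusted_r2_for_bins(temps, day_of_year, bin_days)
           for bin_days in (30, 15, 8)}
for bin_days, r2 in results.items():
    print(f"{bin_days:>2}-day bins: adjusted R^2 = {r2:.4f}")
```

On a smooth synthetic season like this one, the finer bins leave less of the seasonal curve behind in the residuals, which is the same direction of effect the paper reports for Redding. Real station data will of course behave less tidily.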
A key assumption is that of independence. It implies that the residual for each day in the sample period evolves randomly independent of what came before. In other words, the residuals do not have memory. This assumption can be tested with Rescaled Range analysis (Hurst, 1951) (Mandelbrot-Wallis, 1969) (Koutsoyiannis, 2003). It uses the theoretical relationship that in a truly random and independent Gaussian series, the Range of cumulative deviations from the mean (R) divided by the standard deviation (S) is related to the sample size (N) according to the equation R/S = √N (Hurst, 1951). This relationship can be generalized and written as H = ln(R/S)/ln(N) where H is the Hurst exponent with a theoretical value of H=0.5 for pure Brownian motion or Gaussian noise with no memory (Hurst, 1951) (Mandelbrot-Wallis, 1969). Higher values of H indicate memory and persistence in the time series. Even a small tendency for persistence can create patterns in random data that may appear as a phenomena of nature whereas in fact they are artifacts of dependence. A video demonstration of this effect is posted online (Munshi, Demonstration of persistence in a time series, 2016).
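The H = ln(R/S)/ln(N) recipe quoted above is simple enough to sketch. This is my own single-window reading of it, not the paper's calibrated procedure. The function name `hurst_rescaled_range` is made up, and the "persistent" series is an AR(1) process with a coefficient of 0.6 that I picked purely for illustration. Both series are built from the same random draws so the comparison is apples to apples.

```python
import math
import random

def hurst_rescaled_range(series):
    """Single-window Hurst estimate: H = ln(R/S) / ln(N)."""
    n = len(series)
    mean = sum(series) / n
    running, lowest, highest, sum_sq = 0.0, 0.0, 0.0, 0.0
    for x in series:
        dev = x - mean
        running += dev                      # cumulative deviation from the mean
        lowest = min(lowest, running)
        highest = max(highest, running)
        sum_sq += dev * dev
    r = highest - lowest                    # R: range of cumulative deviations
    s = math.sqrt(sum_sq / n)               # S: standard deviation
    return math.log(r / s) / math.log(n)

rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(4000)]   # independent Gaussian series

# Assumption for illustration: an AR(1) series (each value keeps 60% of the
# previous one) driven by the same noise, as a simple stand-in for "memory".
persistent, prev = [], 0.0
for e in noise:
    prev = 0.6 * prev + e
    persistent.append(prev)

h_white = hurst_rescaled_range(noise)
h_persistent = hurst_rescaled_range(persistent)
print(f"independent series: H = {h_white:.3f}")
print(f"persistent series:  H = {h_persistent:.3f}")
```

For a finite independent series this estimate tends to land a little above 0.5 rather than exactly on it, which is presumably why the paper calibrates against a synthetic Gaussian series instead of comparing to 0.5 directly.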
So a Hurst Exponent of 1/2 is the goal (think of it as a fair coin toss), while numbers larger have memory (and smaller have anti-memory…). So what does the temperature data have? The paper has several sites, several graphs, and roughly the same finding for each:
Figure 5 shows a good fit of the regression model with Adjusted R²=0.794. The graphical depiction of the fit in the lower left panel of Figure 5 shows some random extreme values particularly in the winter lows. The overall trend in temperature across the sample period is computed as 0.034C per year but since the corresponding p-value=0.091 > α=0.001, the data do not provide evidence of a long term trend. The residuals in Figure 6 show some evidence of patterns not captured by the model. Rescaled Range Analysis reveals a Hurst exponent of H=0.649, a value much greater than the calibrated value of H=0.52 for a synthetic Gaussian series. It indicates persistence in the data and a violation of the OLS assumption of independence. All data and computational details are available in an online data archive (Munshi, USCRNpaperArchive, 2016).
That is the finding for “Muleshoe”. I chose to use it because I like the name “Muleshoe” ;-)
In other words, the data are strongly dependent, and you can not find a valid trend in them with an OLS fit. The whole foundation of every “trend” analysis I’ve seen by “climate scientists” is a least squares regression against averages of temperatures, usually based on monthly averages as found in the USHCN and GHCN. In short, their statistical methods are bogus, even if they DO show a trend. The ‘trend’ is most likely a result of data dependence, and the analysis method, not a real trend in temperatures.
Then, down about page 12-14, they do something fun. They make a random (Gaussian) set of data, bias it with a 10% dependence setting, and graph the result. It looks remarkably like normal temperature data from darned near anywhere. They then find a ‘trend’ in the data via the usual ‘climate science’ method, even though there is ZERO trend in the original data and only mild dependence added.
The shape of the persistent series in the bottom frame of Figure 15 is revealing. It shows that even in a case where there is no trend in the long term, decadal and multi-decadal trends can be found. Although the Hurst exponents observed in the USCRN temperature data are much lower (Figure 17), they indicate that we should not expect temperature data to be independent and Gaussian and that we should allow for the possibility of spurious decadal and multi-decadal patterns in the data.
So much for “settled science”…
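The "no trend in, trend out" effect is easy to play with at home. What follows is a rough Monte Carlo sketch of my own and not the paper's simulation. I don't know exactly how they implemented their dependence setting, so I use a plain AR(1) process (an assumption, and a short-memory one at that) with zero underlying trend. It counts how often a naive OLS t-test, which assumes independent residuals, declares the slope "significant". The function names and the coefficient 0.6 are invented for the example.

```python
import math
import random

def ols_slope_t(y):
    """t-statistic of the OLS slope of y against time index 0..n-1 (assumes independence)."""
    n = len(y)
    x_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((v - (intercept + slope * i)) ** 2 for i, v in enumerate(y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope / std_err

def false_trend_rate(phi, n_points, n_trials, rng):
    """Fraction of trendless AR(1) series whose naive OLS slope has |t| > 1.96."""
    hits = 0
    for _ in range(n_trials):
        prev, series = 0.0, []
        for _ in range(n_points):
            prev = phi * prev + rng.gauss(0, 1)   # no trend term anywhere
            series.append(prev)
        if abs(ols_slope_t(series)) > 1.96:
            hits += 1
    return hits / n_trials

rng = random.Random(2016)
rate_white = false_trend_rate(0.0, 500, 200, rng)        # independent data
rate_persistent = false_trend_rate(0.6, 500, 200, rng)   # data with memory
print(f"false 'trends', independent data: {rate_white:.0%}")
print(f"false 'trends', persistent data:  {rate_persistent:.0%}")
```

With independent data the false alarm rate should sit near the nominal 5%. Add memory and it climbs well above that, even though every series was built with exactly zero trend.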
Here is their “Summary and Conclusions” section:
Daily mean temperature data from five USCRN stations in the sample period 1/1/2005-4/3/2016 are examined for seasonality and dependence. It is found that greater accuracy and precision in the deseasonalization of the data can be achieved by using 8-day intervals instead of calendar months as the unit of time that defines the seasons. The residuals of the temperature data deseasonalized and detrended in this way are tested for dependence and persistence with Rescaled Range analysis. It is found that in all five cases, the Hurst exponent of the residuals is too high to support the usual assumption of independence in the interpretation of OLS trends over short time intervals of one or two decades. Persistence in the data can create spurious OLS trends. This phenomenon is demonstrated with a Monte Carlo simulation. The findings are consistent with previous studies of persistence in temperature data (Barnett, 1999) (Koutsoyiannis, 2003) (Munshi, The Hurst Exponent of Surface Temperature, 2015). All data and computational details used in this work are available in an online data archive (Munshi, USCRNpaperArchive, 2016).
In other words: “Ooops, they did it again”…
Other Useful Links
In figuring out this paper I found some other pages that were useful in sorting out some of the more complicated statistics bits. I’d heard of a “Hurst Exponent” before, but not ever used one, nor spent any brain-time thinking about it.
A peer-reviewed electronic journal. ISSN 1531-7714
Copyright 2002, PAREonline.net.
Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. PARE has the right to authorize third party reproduction of this article in print, electronic and database forms.
Osborne, Jason & Elaine Waters (2002). Four assumptions of multiple regression that researchers should always test. Practical Assessment, Research & Evaluation, 8(2).
Since I can’t reproduce the whole thing readily (graphs, non-Roman character equations) and since I’m citing it under “fair use” anyway, “hit the link” for the full article. The “key bits” are that if you DON’T check for these things, your stats are crap. Near as I can tell, ‘climate science’ knows they don’t pass these tests so are deliberately choosing to make crap.
Four Assumptions Of Multiple Regression That Researchers Should Always Test
Jason W. Osborne and Elaine Waters
North Carolina State University and University of Oklahoma
Most statistical tests rely upon certain assumptions about the variables used in the analysis. When these assumptions are not met the results may not be trustworthy, resulting in a Type I or Type II error, or over- or under-estimation of significance or effect size(s). As Pedhazur (1997, p. 33) notes, “Knowledge and understanding of the situations when violations of assumptions lead to serious biases, and when they are of little consequence, are essential to meaningful data analysis”. However, as Osborne, Christensen, and Gunter (2001) observe, few articles report having tested assumptions of the statistical tests they rely on for drawing their conclusions. This creates a situation where we have a rich literature in education and social science, but we are forced to call into question the validity of many of these results, conclusions, and assertions, as we have no idea whether the assumptions of the statistical tests were met. Our goal for this paper is to present a discussion of the assumptions of multiple regression tailored toward the practicing researcher.
Several assumptions of multiple regression are “robust” to violation (e.g., normal distribution of errors), and others are fulfilled in the proper design of a study (e.g., independence of observations). Therefore, we will focus on the assumptions of multiple regression that are not robust to violation, and that researchers can deal with if violated. Specifically, we will discuss the assumptions of linearity, reliability of measurement, homoscedasticity, and normality.
Linearity is one often ignored. Non-linear systems, like weather and climate, need special handling… Reliability of measurement is, well, the main focus of Anthony Watts at WUWT (and countless others). Homoscedasticity sounds scary, and you can’t say it correctly three times fast ;-) but it is really just the simple idea that the standard deviation of the error terms is not dependent on the X variable. (Say, for example, your electronic thermometer tended to thermal runaway on very hot days, then your error term would be dependent on the measured temperature… or if it tended to cause error when running the humidity test via recycling its own air, as they did…)
See: https://en.wikipedia.org/wiki/Homoscedasticity if you want more…
Then “normality” just means the data fit a normal (“Bell”) curve. They are not skewed or kurtotic. https://en.wikipedia.org/wiki/Kurtosis But are temperature data normally distributed? I would assert that UHI alone makes that untrue. There is a skew to warm days being warmer in the record just via the UHI impact on sunny days.
VARIABLES ARE NORMALLY DISTRIBUTED.
Regression assumes that variables have normal distributions. Non-normally distributed variables (highly skewed or kurtotic variables, or variables with substantial outliers) can distort relationships and significance tests. There are several pieces of information that are useful to the researcher in testing this assumption: visual inspection of data plots, skew, kurtosis, and P-P plots give researchers information about normality, and Kolmogorov-Smirnov tests provide inferential statistics on normality. Outliers can be identified either through visual inspection of histograms or frequency distributions, or by converting data to z-scores.
Bivariate/multivariate data cleaning can also be important (Tabachnick & Fidell, p 139) in multiple regression. Most regression or multivariate statistics texts (e.g., Pedhazur, 1997; Tabachnick & Fidell, 2000) discuss the examination of standardized or studentized residuals, or indices of leverage. Analyses by Osborne (2001) show that removal of univariate and bivariate outliers can reduce the probability of Type I and Type II errors, and improve accuracy of estimates.
Outlier (univariate or bivariate) removal is straightforward in most statistical software. However, it is not always desirable to remove outliers. In this case transformations (e.g., square root, log, or inverse), can improve normality, but complicate the interpretation of the results, and should be used deliberately and in an informed manner. A full treatment of transformations is beyond the scope of this article, but is discussed in many popular statistical textbooks.
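Skew and kurtosis, two of the normality checks mentioned above, are just ratios of moments. A small sketch of mine (not from the article): the function name `skew_and_excess_kurtosis` is made up, and the two samples are synthetic, one Gaussian and one exponential chosen as an obviously lopsided example.

```python
import random

def skew_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis (both near 0 for a normal sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(7)
bell = [rng.gauss(0, 1) for _ in range(5000)]            # normal sample
lopsided = [rng.expovariate(1.0) for _ in range(5000)]   # long right tail

bell_skew, bell_kurt = skew_and_excess_kurtosis(bell)
lop_skew, lop_kurt = skew_and_excess_kurtosis(lopsided)
print(f"normal sample:      skew = {bell_skew:+.2f}, excess kurtosis = {bell_kurt:+.2f}")
print(f"exponential sample: skew = {lop_skew:+.2f}, excess kurtosis = {lop_kurt:+.2f}")
```

Values far from zero on either number are a hint that the bell curve assumption deserves a closer look before trusting the significance tests.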
ASSUMPTION OF A LINEAR RELATIONSHIP BETWEEN THE INDEPENDENT AND DEPENDENT VARIABLE(S).
Standard multiple regression can only accurately estimate the relationship between dependent and independent variables if the relationships are linear in nature. As there are many instances in the social sciences where non-linear relationships occur (e.g., anxiety), it is essential to examine analyses for non-linearity. If the relationship between independent variables (IV) and the dependent variable (DV) is not linear, the results of the regression analysis will under-estimate the true relationship. This under-estimation carries two risks: increased chance of a Type II error for that IV, and in the case of multiple regression, an increased risk of Type I errors (over-estimation) for other IVs that share variance with that IV.
Authors such as Pedhazur (1997), Cohen and Cohen (1983), and Berry and Feldman (1985) suggest three primary ways to detect non-linearity. The first method is the use of theory or previous research to inform current analyses. However, as many prior researchers have probably overlooked the possibility of non-linear relationships, this method is not foolproof. A preferable method of detection is examination of residual plots (plots of the standardized residuals as a function of standardized predicted values, readily available in most statistical software). Figure 1 shows scatterplots of residuals that indicate curvilinear and linear relationships.
At this point the original has a bunch of nice graphs. One shows a very nice “curvilinear relationship” strongly reminiscent of climate cycles… We’ll see if I can suck them in here:
Which of those two looks more like the typical Climate Data to you? Since we already know there is at least a 60 year cycle, and 6, 9 and 18 year lunar tidal cycles, I think the assumption of “linearity” is a fail…
VARIABLES ARE MEASURED WITHOUT ERROR (RELIABLY)
The nature of our educational and social science research means that many variables we are interested in are also difficult to measure, making measurement error a particular concern. In simple correlation and regression, unreliable measurement causes relationships to be under-estimated increasing the risk of Type II errors. In the case of multiple regression or partial correlation, effect sizes of other variables can be over-estimated if the covariate is not reliably measured, as the full effect of the covariate(s) would not be removed. This is a significant concern if the goal of research is to accurately model the “real” relationships evident in the population. Although most authors assume that reliability estimates (Cronbach alphas) of .7-.8 are acceptable (e.g., Nunnally, 1978) and Osborne, Christensen, and Gunter (2001) reported that the average alpha reported in top Educational Psychology journals was .83, measurement of this quality still contains enough measurement error to make correction worthwhile, as illustrated below.
Correction for low reliability is simple, and widely disseminated in most texts on regression, but rarely seen in the literature. We argue that authors should correct for low reliability to obtain a more accurate picture of the “true” relationship in the population, and, in the case of multiple regression or partial correlation, to avoid over-estimating the effect of another variable.
So if we have, for example, “covariate” changes from, say, lunar tidal or solar UV changes, and we do not allow for that, and attribute all the change to CO2, we are making an error. If we even just have covariate UHI at the airports (that now dominate GHCN) over the decades as they changed from grass fields to hard black tarmac and concrete for miles, we are making an error.
Reliability and simple regression
Since “the presence of measurement errors in behavioral research is the rule rather than the exception” and “reliabilities of many measures used in the behavioral sciences are, at best, moderate” (Pedhazur, 1997, p. 172); it is important that researchers be aware of accepted methods of dealing with this issue. For simple regression, Equation #1 provides an estimate of the “true” relationship between the IV and DV in the population:
It then has some nice formulas and graphs that I’m not going to copy here. Hit the link…
As Table 1 illustrates, even in cases where reliability is .80, correction for attenuation substantially changes the effect size (increasing variance accounted for by about 50%). When reliability drops to .70 or below this correction yields a substantially different picture of the “true” nature of the relationship, and potentially avoids a Type II error.
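Since I did not copy their Equation #1, here is the textbook correction for attenuation (usually credited to Spearman), which I am assuming is the formula they mean: divide the observed correlation by the square root of the product of the two reliabilities. The function name and the sample numbers are mine, not theirs.

```python
import math

def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Standard correction for attenuation: r_true = r_obs / sqrt(r_xx * r_yy)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

r_obs = 0.40                                        # hypothetical observed correlation
r_true = correct_for_attenuation(r_obs, 0.80, 0.80)
gain = r_true ** 2 / r_obs ** 2 - 1.0               # relative change in variance accounted for
print(f"observed r = {r_obs:.2f}, corrected r = {r_true:.2f}")
print(f"variance accounted for rises by {gain:.0%}")
```

With both reliabilities at .80, an observed r of 0.40 corrects to 0.50, and the variance accounted for goes up by roughly half, which is in line with the "about 50%" the article quotes for its Table 1.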
ASSUMPTION OF HOMOSCEDASTICITY
Homoscedasticity means that the variance of errors is the same across all levels of the IV. When the variance of errors differs at different values of the IV, heteroscedasticity is indicated. According to Berry and Feldman (1985) and Tabachnick and Fidell (1996) slight heteroscedasticity has little effect on significance tests; however, when heteroscedasticity is marked it can lead to serious distortion of findings and seriously weaken the analysis thus increasing the possibility of a Type I error.
This assumption can be checked by visual examination of a plot of the standardized residuals (the errors) by the regression standardized predicted value. Most modern statistical packages include this as an option. Figure 3 show examples of plots that might result from homoscedastic and heteroscedastic data.
Figure 3. Examples of homoscedasticity and heteroscedasticity
They have three images: one with a constant width scatter left to right that is homoscedastic, one with a ‘skinny middle’ and one with a ‘skinny end’ illustrating heteroscedasticity. I’m just going to copy one here:
Then look at any scatter plot of temperature data over time and watch the “range” reduce dramatically, left to right, just like that graph. I think it is patently obvious that the data from the 1700s and early 1800s are not “homoscedastic” with the data from the CRN today.
Since they want the whole thing reproduced, I’m going to paste the rest here including contact info and references, despite my having little interest in them. The first two blocks are of interest.
Ideally, residuals are randomly scattered around 0 (the horizontal line) providing a relatively even distribution. Heteroscedasticity is indicated when the residuals are not evenly scattered around the line. There are many forms heteroscedasticity can take, such as a bow-tie or fan shape. When the plot of residuals appears to deviate substantially from normal, more formal tests for heteroscedasticity should be performed. Possible tests for this are the Goldfeld-Quandt test when the error term either decreases or increases consistently as the value of the DV increases as shown in the fan shaped plot or the Glejser tests for heteroscedasticity when the error term has small variances at central observations and larger variance at the extremes of the observations as in the bowtie shaped plot (Berry & Feldman, 1985). In cases where skew is present in the IVs, transformation of variables can reduce the heteroscedasticity.
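The idea behind the Goldfeld-Quandt test named above can be sketched in a few lines: sort the data by the predictor, drop the middle, fit a line to each end, and compare the leftover variances. This is a simplified illustration of mine, with synthetic data and made-up function names. It only computes the variance ratio and leaves out the F-distribution lookup that a real test would use to get a p-value.

```python
import random

def ols_sse(xs, ys):
    """Sum of squared residuals from a straight-line OLS fit of ys on xs."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def goldfeld_quandt_ratio(xs, ys, n_drop):
    """Ratio of residual variance in the high-x group to the low-x group.

    The middle `n_drop` observations are discarded; the two groups have equal
    size, so their degrees of freedom cancel in the ratio.
    """
    pairs = sorted(zip(xs, ys))
    n_group = (len(pairs) - n_drop) // 2
    low, high = pairs[:n_group], pairs[len(pairs) - n_group:]
    return ols_sse(*zip(*high)) / ols_sse(*zip(*low))

rng = random.Random(3)
xs = [rng.uniform(1, 10) for _ in range(300)]
flat_ys = [2 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]      # constant scatter
fan_ys = [2 + 0.5 * x + rng.gauss(0, 0.3 * x) for x in xs]   # scatter grows with x

ratio_flat = goldfeld_quandt_ratio(xs, flat_ys, n_drop=100)
ratio_fan = goldfeld_quandt_ratio(xs, fan_ys, n_drop=100)
print(f"constant-scatter data: variance ratio = {ratio_flat:.2f}")
print(f"fan-shaped data:       variance ratio = {ratio_fan:.2f}")
```

A ratio near 1 is what homoscedastic data should give. The fan-shaped set produces a ratio far above 1, which is the numeric version of the "skinny end" picture.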
The goal of this article was to raise awareness of the importance of checking assumptions in simple and multiple regression. We focused on four assumptions that were not highly robust to violations, or easily dealt with through design of the study, that researchers could easily check and deal with, and that, in our opinion, appear to carry substantial benefits.
We believe that checking these assumptions carry significant benefits for the researcher. Making sure an analysis meets the associated assumptions helps avoid Type I and II errors. Attending to issues such as attenuation due to low reliability, curvilinearity, and non-normality often boosts effect sizes, usually a desirable outcome.
Finally, there are many non-parametric statistical techniques available to researchers when the assumptions of a parametric statistical technique is not met. Although these often are somewhat lower in power than parametric techniques, they provide valuable alternatives, and researchers should be familiar with them.
Berry, W. D., & Feldman, S. (1985). Multiple Regression in Practice. Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-050). Newbury Park, CA: Sage.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Nunnally, J. C. (1978). Psychometric Theory (2nd ed.). New York: McGraw Hill.
Osborne, J. W., Christensen, W. R., & Gunter, J. (April, 2001). Educational Psychology from a Statistician’s Perspective: A Review of the Power and Goodness of Educational Psychology Research. Paper presented at the national meeting of the American Education Research Association (AERA), Seattle, WA.
Osborne, J. W. (2001). A new look at outliers and fringeliers: Their effects on statistic accuracy and Type I and Type II error rates. Unpublished manuscript, Department of Educational Research and Leadership and Counselor Education, North Carolina State University.
Pedhazur, E. J., (1997). Multiple Regression in Behavioral Research (3rd ed.). Orlando, FL:Harcourt Brace.
Tabachnick, B. G., Fidell, L. S. (1996). Using Multivariate Statistics (3rd ed.). New York: Harper Collins College Publishers
Tabachnick, B. G., Fidell, L. S. (2001). Using Multivariate Statistics (4th ed.). Needham Heights, MA: Allyn and Bacon
Jason W. Osborne, Ph.D
ERLCE, Campus Box 7801
Poe Hall 608,
North Carolina State University
Raleigh NC 27695-7801
Note that even the “social sciences” have a clue about the importance of this. “Climate science” not so much…
In my opinion, the statistical methods used to find a “trend” in the data, and the nature of the data used, in “climate science” invalidate their findings of a trend. The data violate at least 4 fundamental tests for Least Squares fit trend lines, and the methods used (monthly and daily averaging) make for spurious artifacts.
Known cycles, and known variations in data quality, render their results (in bogus 1/100 C precision) at best a gross error, at worst a hideous farce. Moral of the story? If you don’t have an expert in statistical analysis on your temperature analysis team, you are screwing up.
But that’s just my opinion…
Ain’t Statistics (or as we used to call it in class, Sadistics) fun?