Original image from the LIDAR working group at Hohenpeissenberg including a closer view of the observatory.
PApars.f logs, and a Slice of Pisa
In wandering through the PApars.f program in STEP2 (a key early program in the zones / anomaly / grid box process) I figured out what one of the log files was good for.
It’s good for answering the question “What ‘rural’ Station was used to adjust this Urban station history?”.
That file is named PApars.GHCN.CL.1000.20.log and it is typically found in the work_files directory at the end of a run. The first 5000 or so lines are not very interesting, mostly being just a list of stations and a count:
rural station: 1 id: 200460003 #ok 44 rural station: 2 id: 200690003 #ok 58 rural station: 3 id: 202740000 #ok 29
But it’s the later bit that is interesting. It includes the specifics as to which of these rural stations was used to “correct” what urban station.
I’ll be using it from time to time to look at particular stations and what they did / do / had done to them. In particular, on my ToDo list, is to answer the question from Ellie on the “Airports as UHI correction” thread about what stations were being modified by 2 airports on the edges of Iceland.
For now, though, we are going to look at what stations modified Pisa, Italy. Over on WattsUpWithThat, there was a posting that looked at Italy and various stations there, with “blink comparisons” of before and after GIStemp “correction”. For Pisa, the very early data drop by about 1.75C. Way beyond anything that is reasonable. The nagging question is “Why?”. It is very hard to find “where” and harder to get “why” out of it; but at this point I’m fairly certain the “where” is in PApars.f and the “why” is the choice of “Reference Stations” and the method by which they are applied.
We are mostly going to look at the issue of Which Stations in this posting today. I’ll come back to it in a few days with an update on the method issue (friends over for dinner in an hour, so I’ll have to complete this tonight / tomorrow.)
OK, so what do we have now?
Who “Corrects” Pisa? Would you believe Switzerland?
This is the correction log for Pisa Italy. I won’t go into what the PApars.f and related bits of code do at this time (back later), but I will describe what this log says. The first record says that the URBAN station is “161580004″ and that is Pisa. It uses a comparison radius of 500 km, so it is one of the better behaved ones (you get either 1000 km or half that if there are lots of reference stations). I’ve added the names for the reference stations, along with their country codes. You can see the years of that station and the overlap.
Now, one Really Interesting Thing. See that overlap of 129 years for Hohenpeissenb and 126 years for SAENTIS? How do you get 126 years of “overlap” from a Pisa record that starts in 1949? Yes, this matters to how the code chooses to apply stations for “corrections”. This is one Giant Dig Here! One hopes that it is just mis-printing the length of the SAENTIS record as “overlap” and not really treating it as an overlap, but the code does say that it looks at ‘overlap’ to decide who gets top billing (and top billing is also greatest impact. The first records averaged in to a reference average sets the “tone”; all subsequent records get ‘adjusted’ such that they do not move this mean, then get averaged in: so they can change the “tilt” but not the “mean”. I think this is a source of error, but need to Dig Here to show it.).
This might also explain the inordinate attention paid to Hohenpeissenberg in the STEP0 process, complete with getting a special copy of the data that is ‘more complete’. Might this make it “overlap” with more stations and thus use a cold German site in more “adjustments”? And rank it in the more influential “first position” more often? Pure speculation, but worth a good Dig Here!
Notice also that stations in Switzerland, Germany, Italy, and France are used to “correct” Pisa. Gotta think there is a bit of weather variation between those places…
So we have a Sea Coast moderated record from Pisa that is being adjusted by records from inland Switzerland and the Aviano Military Air base. OK, that has to do something, but Lord Only Knows what… And since things like the jet stream and AMO can shift historic relations between the areas that far apart, having a 20 year overlap that you use to set a “baseline” then judging any delta from that to be UHI confounds UHI with normal climate oscillators. Another Dig Here.
I added an A to the country code for sites that are airports. Supposedly pristine rural locations…) Notice that Pisa is adjusted by a couple of airports, too. My favorite being the Aviano record. IIRC that is a major military Air Base for Italy. Nice “rural” station /sarcoff>. This picture gives a nice idea what goes on there and I like the way the rain water is kicking up off the runway…
For comparison, you can look at the more or less “raw” Pisa GHCN graph here:
and compare the before and after adjustment (and much closer scale) here:
That is a figure from the WUWT posting about Italy.
Notice that the start of the graph gets about 1.75C nocked out of it… So somewhere in this adjustment process, the 60 years of Pisa history gets a 1.75C added tilt just from that process. 2C per century. Rather a lot, and not in the “raw” data.
Pisa from PApars.GHCN.CL.1000.20
*** urb stnID: 161580004 # rur: 31 ranges: 1949 2008 Radius: 500. LONGEST rur range: 1880-2008 129 109620002 617 HOHENPEISSENB add stn 2 range: 1883-2008 126 066800003 646 SAENTIS data added: 126 overlap: 126 years add stn 3 range: 1880-1985 106 067190010 646 ST. BERNARD SWITZERLA data added: 106 overlap: 106 years add stn 4 range: 1887-2008 102 111460002 603 SONNBLICK data added: 102 overlap: 102 years add stn 5 range: 1880-1960 81 067530010 603 ST. GOTTHARD SWITZERLA data added: 81 overlap: 81 years add stn 6 range: 1880-1980 69 144470000 609 HVAR data added: 69 overlap: 69 years add stn 7 range: 1880-1943 64 112340010 603 OBIR AUSTRIA data added: 64 overlap: 64 years add stn 8 range: 1951-2008 58 109610003 617 ZUGSPITZE data added: 58 overlap: 58 years add stn 9 range: 1933-1985 53 067300000 646 JUNGFRAUJOCH data added: 53 overlap: 53 years add stn 10 range: 1951-2007 32 162800000 623A PONZA data added: 32 overlap: 32 years add stn 11 range: 1951-1981 31 160660000 623A MILANO/MALPEN data added: 31 overlap: 31 years add stn 12 range: 1951-1980 30 161460000 623 PUNTA MARINA data added: 30 overlap: 30 years add stn 13 range: 1951-1980 30 160400000 623 TARVISIO data added: 30 overlap: 30 years add stn 14 range: 1951-1980 29 165060000 623 GUARDIAVECCHI data added: 29 overlap: 29 years add stn 15 range: 1954-2007 29 161790000 623 FRONTONE data added: 29 overlap: 29 years add stn 16 range: 1951-1980 29 161420010 623 RIFREDO MUGELLO data added: 29 overlap: 29 years add stn 17 range: 1951-1979 29 161290000 623 ISOLA DI PALM data added: 29 overlap: 29 years add stn 18 range: 1951-1979 29 161190010 623 PASSO DEI GIOVI data added: 29 overlap: 29 years add stn 19 range: 1955-2007 28 162240000 623 VIGNA DI VALL data added: 28 overlap: 28 years add stn 20 range: 1951-1978 27 161970020 623 ISOLA DI PIANOSA data added: 27 overlap: 27 years add stn 21 range: 1953-1980 27 161160010 623 GOVONE data added: 27 overlap: 27 years add stn 22 range: 1952-1977 24 165380010 625 MACOMER data added: 24 overlap: 24 years add stn 23 range: 1951-1974 24 165200020 623 ISOLA ASINARA data added: 24 overlap: 24 years add stn 24 range: 1954-1977 24 162800010 623A TORRE OLEVOLA AERO data added: 24 overlap: 24 years add stn 25 range: 1954-1976 23 162940010 623 CAPRI data added: 23 overlap: 23 years add stn 26 range: 1960-2007 23 162450000 623A PRATICA DI MA data added: 23 overlap: 23 years add stn 27 range: 1951-1975 23 161490010 623 SASSO FELTRIO data added: 23 overlap: 23 years add stn 28 range: 1951-1973 23 160360010 623A AVIANO data added: 23 overlap: 23 years add stn 29 range: 1951-1974 21 161970010 623 ISOLA GORGONA data added: 21 overlap: 21 years add stn 30 range: 1961-1980 20 162340000 623 GUIDONIA data added: 20 overlap: 20 years add stn 31 range: 1949-1968 20 075860010 615 MONT VENTOUX FRANCE data added: 20 overlap: 20 years
So there they are. These are the stations that are used to “modify” Pisa and the years during which they have data that overlap Pisa. By looking at each overlap period, and the average of the records that overlap in that period, we ought to be able to figure out which stations caused the Pisa data points to move. It will also likely take some “code time” looking at what the different parts of PApars.f do to the data in that averaging. That’s what I’ll be looking at about 1am to 4am tonight when everyone else is sleeping and I can “have a bit of a think” about it…
If you want to “get ahead of me” in the chase, feel free. You have the pointers above, the log file, and here is the source code…
UPDATE: 1 Sept 2009
This update has to do with a particular thing that PApars.f does that I think may be bogus and part of the cause for the large temperature shifts in the early years for some sites, like Pisa.
First, the “human readable” form, then the “computer language support”.
The program claims to “remove the station bias” by taking the interval of overlap of a new station, relative to the average of prior adjusted stations, and finding the average distance from that average, then moving the new data such that this “mean” will not change. Basically, you have a bunch of “rural” stations and you don’t want to shift that mean when you “add one”, you just want it’s “tilt” to have an impact. That is, if an old year is cold, it will pull an old year down, but some newer year will have to pull a newer year up to keep the mean stable.
Sounds sort of OK… but I do wonder about all these averaging of averages of offsetterized averages of anomalies… (the data we are dealing with here are “anomalies” not actual temperatures, which adds is own degree of indirection and confusion).
So I go looking at the code for a basic “sanity check” and want to pay particular attention to the “edge case” of the first record. Just HOW is it “de-biased” to be put into this “average rural anomaly reference” record? What I found is that, as near as I can tell, it isn’t.
That’s right, the “first station” IS the “Average” at the start of the series.
ALL subsequent “added station” data are ‘adjusted’ to ‘remove bias’ by referencing them to a ‘mean’ based initially on a single station that was not ‘unbiased’.
A disclaimer: I’m still working through this particular bit of code, and I’ve been ‘short slept’ a few nights; so I might have missed something here. That is why I’m putting this update here. I’d like to get “more eyes” on it (and some with less caffeine and more sleep ;-) Basically, I’d like someone else to “sanity check” this assertion and maybe even point out some bits I might have missed.
With that said: If this observation stands up to scrutiny, it means that every single “first ‘rural’ station” in every single ‘urban adjustment’ is a cherry picked bias. Not Good. The implications of this are why I’m rushing this point to “public” scrutiny rather than stewing on it for a few days (weeks?) as I sort it out.
Since Hohenpeissenberg is a very long record in Europe, it would bias a great number of other stations (the ‘first record’ slot goes automatically to the ‘longest lived’ record within the 1000 km radius (or 500 km if lots of stations qualify). And that might explain the extraordinary effort put into getting a longer Hohenpeissenburg record. Nothing like a little Cold German edge of Swiss Alps cold to make the past colder, yet be “averaged out” in newer records as a bolus of new thermometers move the recent data a bit higher… but the “mean” stays the same.
OK, The Code:
C**** Start with the station with the longest time record IS=ISOFI(1) IOFF=MFSR(IS)-I1SR(IS) nuseid(is)=nuseid(is)+1 C**** First update of full data and weight arrays DO 160 M=I1SR(IS),ILSR(IS) c if(M+IOFF.gt.IYRM0) stop 244 AVG(M+IOFF)=RDATA(M) IF(RDATA(M).LT.XBAD) WT(M+IOFF)=WTI(1) IF(RDATA(M).LT.XBAD) IWT(M+IOFF)=1 160 CONTINUE
The line right after “DO 160″ is a commented out “if” statement that does nothing now that it’s been turned into a comment with the leading “c”. We set a couple of variables and increment the “number used” counter then start a “DO” loop from first to last year of the adjustment average.
Notice that AVG(M+IOFF) = RDATA(M)
We just stuff the present record anomaly data into the AVG bucket. No adjustment of any kind. Rather than keeping the two arrays in sync, one is offset via “IOFF”. The WT , WTI, IWT and other hard to read variants are various buckets for holding weighting flags. The weighting is adjusted by radius. This posting is not looking at weighting here, though, we’re talking about BIAS.
And it sure looks to me like the first station is just accepted as RIGHT without addressing the issue of “is Hohenpeissenberg anomaly colder than or biased relative to Pisa?”
Then we go off to “add in the new stations”:
C**** Add in the remaining stations DO 190 I=2,IS0 IS=ISOFI(I) IOFF=MFSR(IS)-I1SR(IS) write(*,'(a,i5,a,i5,a,i4,i6,i10)')'add stn',i,' range:', * MFSR(IS)+IYOFF,'-',ILSR(IS)+ioff+IYOFF,LENis(i),idr(is)
We DO the 2nd to the limit of ‘nearby rural’ stations (IS0) writing out their log entry on the way.
C**** Extend the new data into a full series DO 170 M=1,IYRM 170 DNEW(M)=XBAD
Pre-stuff the NEW Data array with bad flags, to be overwritten only where we have a good data item.
DO 180 M=I1SR(IS),ILSR(IS) c if(M+IOFF.gt.IYRM0) stop 259 180 DNEW(M+IOFF)=RDATA(M)
I prefer to use a NNN Continue to end a loop, but if you end on a statement, it does execute that statement, so the “commented out” if statement inside the loop does nothing, but the target at 180 does. For those records inside the limits of the average range, it stuffs the RDATA item for that station into the DNEW array for that year position).
NF1=MFSR(IS) NL1=ILSR(IS)+IOFF C**** Shift new data, then combine them with current mean CALL CMBINE (AVG(1),WT,IWT, DNEW,NF1,NL1,WTI(I), * IDR(IS),NSM,NCOM) write(*,*) 'data added: ',nsm,' overlap:',ncom,' years' IF(NSM.EQ.0) GO TO 190 nuseid(is)=nuseid(is)+1 190 CONTINUE
For some reason, the folks who wrote this code love to change the names of things mid stream. So the variables NF1 AND NL1 (that I think means First and Last) are stuffed from other variables, then handed to the CALL CMBINE (along with the New Data array and weighting factors). When we return, we print out a record that says we added that station to the average and increment the number of used IDs by one nuseid(is)=nuseids(is)+1 (And here you can see the mixing of f77 ALL CAPS code with newer F90 lower case… clearly an ‘update’ happened at some point).
So what does CMBINE do? It is a subroutine that combines the two records by “removing the bias” from the new data relative to the existing average. But on the first call, the average is only the initial station, unadjusted for “bias”, all future additions must conform to the “mean” it sets. What happens if it is biased?
SUBROUTINE CMBINE (AVG,WT,IWT, DNEW,NF1,NL1,WT1,ID,NSM, NCOM) C**** C**** Bias of new data is removed by subtracting the difference C**** over the common domain. Then the new data are averaged in.
and a bit further down:
DO 10 N=NF1,NL1 IF(AVG(N).GE.XBAD.OR.DNEW(N).GE.XBAD) GO TO 10 NCOM=NCOM+1 SUM=SUM+AVG(N) SUMN=SUMN+DNEW(N) 10 CONTINUE
If either the “average” or the “new” data are a bad flag, just skip it and jump to 10, the end of the DO loop. Otherwise, increment the number of items counter NCOM by one and add the Average for each year to a sum of the averages (first pass through, this will be ONLY the first rural station record, unadjusted!) and add the New data to a new data sum.
IF(NCOM.LT.NLAP) RETURN BIAS=(SUM-SUMN)/FLOAT(NCOM)
If we don’t have enough overlap (count greater than 20) give up and return. Otherwise, compute the BIAS that is simply the difference of the two summed series divided by the count. The new data are referenced to the AVERAGE to measure bias, but the very first record IS the “benchmark” on the first pass. It is not adjusted for “bias”, it sets the bias benchmark. So the 2nd record will be adjusted to conform to the mean of that bias, then that average becomes the standard for the next record (with both the 1st record, and the 2nd one adjusted to the first one) reflecting any “bias” present in the first record).
C**** Update period of valid data, averages and weights DO 20 N=NF1,NL1 IF(DNEW(N).GE.XBAD) GO TO 20 WTNEW=WT(N)+WT1 AVG(N)=(WT(N)*AVG(N)+WT1*(DNEW(N)+BIAS))/WTNEW WT(N)=WTNEW IWT(N)=IWT(N)+1 NSM=NSM+1 20 CONTINUE
Then, for each of the years, IFF the NEW data is not a missing / bad data flag, we adjust the weighting factors (that are based on the distance of the station from the Urban station), and compute a “new” AVG with the DNEW record ‘adjusted to remove bias’ but the AVG left alone (it, presumably, already being free of bias from prior passes through here; but in fact carrying forward an accumulating bias). Except in the case of the first record. THAT record was accepted As Is.
We then increment some counters and return the result
OK, this is rather crufty and someone complicated code, written in a painful style, with lots of I i 1 L l confusions possible and lots of 0Oo confusions possible and COMMON blocks that might be changing things somewhere else a bit unseen and….
So I’m asking for some help. I’m asking for a ‘sanity check’ on what I think is going on and what I suspect I’ve found. I’m not ready to shout Eureka! until I’ve been “desk checked” by someone else to make sure this isn’t just too much coffee and not enough sleep talking…
But sure looks like a “Eureka!” moment is right around the corner to me…