The Program: fromexcel.f90
There was a request that some of the actual code be put up where folks could see it. OK, a quick “wc */*/*” kicked out about 80,000 lines of code… Not something I’m going to get pasted into WordPress real quick… So I picked one of the shortest programs I could find in the “linux” directory. The program that turns Excel pages into FORTRAN output. This looks to be used to turn Chinese temperature spreadsheets into data files, but could be used on any temperature spreadsheet, so “who knows”.
My general assessment of it is that Tim Mitchell has generally good “style”. It is readable and he even “pretty prints” with things lined up for both looks and readability. An indication of someone who “keeps a tidy mind” and a very good habit. The use of “implicit none” to prevent the default (“implicit”) declaration and typing of variables indicates a cautious approach to FORTRAN defaults and a desire to have ‘accidental declaration of variables’ flagged for proper handling.
It starts with an identification block: Who wrote it. What it does. Date written. And includes the assumptions the program makes so you know what to feed it. All very good stuff. It is clearly written in the “f90” dialect (comments start with a “!” instead of the depricated “C”. Lower case. Use of “::” and other f90 constructs.
The program is clearly structured, with a nice understandable series of subroutine calls. Each subroutine is clearly demarked with a comment bar. Each subroutine is short and clear. There is a notable absence of comments in the subroutines describing the reason they are set out as a subroutine, the function of the subroutine, and how the method used was chosen. Being fairly simple routines, this is somewhat understandable, but a note or two would be “nice to have”.
There is fairly little stated as to why one would choose one input value over another; in the ‘select’ routine, for example. The various “use [filename}” directives are left opaque.
The inclusion of the proper compiler directive as a comment showing the other source files needed to build the runnable program is also a very nice touch.
Bounds and validity checking is minimal, but probably OK for a personally run “hand tool”. There is a functional, but slightly hokey, call to the Unix environment to run “word count – wc” on the file to find out the lines to process before quitting. Yeah, it works, but… is there some reason not to just read the file one line at a time and when you get the last one, do your ‘EOF – End Of File’ branch handling? Then there is the “year check” that tells you to edit the source code if it finds fault. Somewhat tacky.
But finally, there is the question of: Why FORTRAN? Is it really the better choice for this than AWK, Perl, Python or some other string oriented language? I like FORTRAN and do think the folks who toss rocks at it often do not understand the convenience it offers for many things, especially math oriented and I/O functions. But is this really the best tool? But at the end of the day, if it works with a couple of hundred lines, a lot of it white space and comments, maybe it is an OK tool for the job. Especially if the rest of the system is in FORTRAN and you want to keep the number of languages used to a minimum (a virtue all too often ignored, IMHO!).
So at the end of the day, I’d do maintenance on Tim’s code any day. Yeah, it could use a bit more “why did I do this” and “where does that input come from” and “where does the output go?” comments (or an associated fromexcel.readme) for the context and runtime environment. But frankly, compared to the kind of stuff I’ve been swallowing in GIStemp, this is pure honey in comparison.
Oh, and in the “subroutine load” we have a couple of lines that run off the right edge. Real Programmers ™ can use the “view source” option of their browser and see what’s off the edge. The rest of use can enjoy the pleasant formatting. Unfortunately, WordPress themes to not all have “preformatted” translated to a box with a scroll bar to “see it all” and I’m not feeling like experimenting with new themes right now… Just ralize that in f90 you can go past position 72 while in f77 you can not. Tim is clearly embracing the features of f90 and cares not for f77 compatibility.
UPDATE; Worth a read: http://wattsupwiththat.com/2009/11/25/climategate-hide-the-decline-codified/
Source Code Listing
chiefio$ cat fromexcel.f90
! fromexcel.f90 ! program to convert raw data files from Excel spreadsheet to CRU ts format ! written by Dr. Tim Mitchell (Tyndall Centre) on 03.06.03 ! assumes one row of information per variable/station/year combination ! assumes that all relevant info is on each line (including meta data) ! f90 -error_limit 5 -o ./../cruts/fromexcel time.f90 filenames.f90 ! crutsfiles.f90 grimfiles.f90 ./../cruts/fromexcel.f90 program FromExcel use Time use FileNames use CRUtsFiles use GrimFiles implicit none real, pointer, dimension(:) :: Lat,Lon,Elv real, dimension(12) :: fRaw integer, pointer, dimension(:,:,:) :: Proc integer, pointer, dimension(:,:) :: Info integer, pointer, dimension(:) :: YearAD,Code,Beg,End character (len=20),pointer,dimension(:) :: Name character (len=13),pointer,dimension(:) :: Cty character (len= 9),pointer,dimension(:) :: Local integer, parameter :: DataMissVal = -9999 real, parameter :: MissVal = -999.0 real :: Factor,RawMissValA,RawMissValB,RawMulti integer :: AllocStat,ReadStatus, YearAD0,YearAD1 integer :: QSource integer :: iCode,iLat,iLon,iElv,iYear integer :: NYear,NMonth,NLine,NEntry integer :: XYear,XMonth,XLine,XEntry character (len=80) :: Variable,GivenFile,LoadFile,SaveFile,LineFormat character (len= 4) :: Suffix !******************************************************************************* call Intro call Select call Specify call Load call Save call Close contains !******************************************************************************* subroutine Intro open (99,file="/cru/mikeh1/f709762/scratch/log/log-fromexcel.dat", & status="replace",action="write") print* print*, " > ***** FromExcel: convert raw Excel to CRU ts *****" print* end subroutine Intro !******************************************************************************* subroutine Select print*, " > Select the file to load: " do read (*,*,iostat=ReadStatus), GivenFile if (ReadStatus.LE.0.AND.GivenFile.NE."") exit end do LoadFile = LoadPath (GivenFile," ") print*, " > Select source (0=help): " do read (*,*,iostat=ReadStatus), QSource if (ReadStatus.LE.0.AND.QSource.EQ.0) then print*, " > 1. Jian China" end if if (ReadStatus.LE.0.AND.QSource.GT.0) exit end do print*, " > Select variable (form: '.???'): " do read (*,*,iostat=ReadStatus), Suffix if (ReadStatus.LE.0) call CheckVariSuffix (Suffix,Variable,Factor) if (ReadStatus.LE.0.AND.Variable.NE."undefined") exit end do print*, " > Select the file to save (.cts): " do read (*,*,iostat=ReadStatus), GivenFile if (ReadStatus.LE.0.AND.GivenFile.NE."") exit end do SaveFile = SavePath (GivenFile,".cts") end subroutine Select !******************************************************************************* subroutine Specify if (QSource.EQ.1) then LineFormat = "(5i7,f6.1,11f7.1)" RawMissValA = -100.0 ; RawMissValB = 9999.0 ; RawMulti = 1.0 NEntry = 1000 ; YearAD0=1801 ; YearAD1=2020 end if NYear=YearAD1-YearAD0+1 ; NMonth=12 allocate (YearAD(NYear),Proc(NYear,NMonth,NEntry), & Code(NEntry),Name(NEntry),Cty(NEntry),Local(NEntry), & Lat(NEntry),Lon(NEntry),Elv(NEntry), & Beg(NEntry),End(NEntry), stat=AllocStat) if (AllocStat.NE.0) print*, " > ##### alloc #####" Lat=MissVal ; Lon=MissVal ; Elv=MissVal ; Beg=MissVal ; End=MissVal Proc=DataMissVal ; Code=MissVal Name="UNKNOWN" ; Cty="UNKNOWN" ; Local=" nocode" do XYear=1,NYear YearAD(XYear)=XYear-1+YearAD0 end do if (QSource.EQ.1) then Cty="CHINA" end if end subroutine Specify !******************************************************************************* subroutine Load print*, " > Loading..." call system ('wc -l ' // LoadFile // ' > trashme-fromexcel.txt') open (3,file='trashme-fromexcel.txt',status="old",access="sequential", & form="formatted",action="read") read (3,"(i10)"), NLine ! get number of lines close (3) call system ('rm trashme-fromexcel.txt') open (2,file=LoadFile,status="old",access="sequential",form="formatted",action="read") if (QSource.EQ.1) then do XLine = 1, Nline read (2,LineFormat), iCode,iLat,iLon,iElv,iYear,(fRaw(XMonth),XMonth=1,NMonth) XEntry=0 ; iCode=iCode*10 do XEntry=XEntry+1 if (Code(XEntry).EQ.MissVal) then Code(XEntry) = iCode Elv(XEntry) = real(iElv)/10.0 Lat(XEntry) = (mod(real(iLat),100.0)/60.0) + int(real(iLat)/100.0) Lon(XEntry) = (mod(real(iLon),100.0)/60.0) + int(real(iLon)/100.0) Beg(XEntry) = iYear write (99,"(i7,3f8.2)"), Code(XEntry),Lat(XEntry),Lon(XEntry),Elv(XEntry) end if if (Code(XEntry).EQ.iCode) then End(XEntry) = iYear XYear=iYear-YearAD0+1 if (XYear.LT.1.OR.XYear.GT.NYear) then print*, " > ***** Year range is bad. Reset in source code. " else do XMonth=1,NMonth if (fRaw(XMonth).NE.RawMissValA.AND.fRaw(XMonth).NE.RawMissValB) & Proc(XYear,XMonth,XEntry) = nint(fRaw(XMonth)*RawMulti/Factor) end do end if write (99,"(i4,12i5)"), iYear,(Proc(XYear,XMonth,XEntry),XMonth=1,NMonth) end if if (Code(XEntry).NE.iCode.AND.XEntry.EQ.NEntry) & print*, " > ***** NEntry is set too small. Reset in source code. " if (XEntry.EQ.NEntry) exit if (Code(XEntry).EQ.iCode) exit end do end do end if close (2) end subroutine Load !******************************************************************************* subroutine Save ! call MakeStnInfo (Info,Code,Code,Lat,Lon,Elv,1,Data=Proc,YearAD=YearAD) call MakeStnInfo (Info,Code,Code,Lat,Lon,Elv,1,YearAD0=Beg,YearAD1=End) call SaveCTS (Info,Local,Name,Cty,CallFile=SaveFile,YearAD=YearAD,Data=Proc) end subroutine Save !******************************************************************************* subroutine Close print* close (99) end subroutine Close !******************************************************************************* end program FromExcel
I’ve put up some more – relatively general – code/data comments.
http://www.di2.nu/200911/25.htm
I think you should probably put the REAL/INT/FLOAT squaring comment up as a separate post because it is in fact a critical (albeit minor) example of the code (lack of) quality and deserves being highlighted on its own.
fromexcel really is, as far as I can see, about a couple of dozen lines of perl turned into a couple of hundred lines of fortran with the key being a subroutine something like this
REPLY: [ You are right. It’s turning into one of those “things people need a lot of talking about to understand” topics…
I always loved the elegance of Perl… Oh, and folks, Francis has a library of the code up at:
http://di2.nu/foia/cru-code/
So if you see any particular program you want to comment on, feel free to put a comment here with a pointer to the particular program in question. Like so:
http://di2.nu/foia/cru-code/linux/cruts/fromexcel.f90
Could be done as a few dozen Perl lines! (I’m not as good at Perl as FrancisT !)
-ems ]
Here is the hockey stick code from a Briffa file:
It has been a long time since I did anything with Fortran. Off topic but, isn’t this thing screaming for a real relational database?
REPLY: [ Absolutely! Same as GIStemp. The bulk of the “processing” is just data loading and input preening (or ought to be preened, but isn’t…). Then some minor munging, and some printout. All that “loading / gluing” ought to be into a database. If I were writing this ‘product’, this is exactly what I would do. Then the “process” becomes simple report writing and your “whatifs” are minor report variations. -ems ]
Greg F – funny you should say that. I said the exact same thing. I have had some experience with the various capacity management tools out there – which can process literally terabytes of log files and extract trends not dissimilarly to how CRU TS produces them – and oddly enough they all use an oracle database
I noticed that Johnny at WUWT (EP
(I think I just hit a wrong key, sorry) I noticed this a.m. that Johnny at WUWT (2:08:44 — EPA CO2…11/25/09) wrote: “they hard-coded the hockey stick into the program which draws the curve! And with the fudge factor they can alter how much the curve bends like they want it to…” Then he goes to Willis Eschenbach (18:59:05) who identifies the program file FOI2009/FOI A/documents/harris-treebriffa_sep98_e.pro.
At the end of the discussion Johnny states: here it is in IDL not plain Fortran (debreuil (04:25:44)….and quotes the code(?) as “Apply a VERY ARTIFIC[I]AL correction for decline.”
Can you make sense of this in everyday English so we who understand very little about programming can get it? Is this a smoking gun smack dab in the middle of the code? Does it work everytime they run “the code”?
REPLY: [ I am “in the moment” of the holiday crush right now, but in a day or two I’ll see if I can ‘decode’ it… unless someone else does it first… maybe someone in one of those strange countries NOT having a holiday right now ;-) -ems ]
Judging by fromexcel.f90, it looks to me like the so-called “Excel” format is really just an ASCII text file. It probably doesn’t even have delimiters as I see no provision for the definition of a delimiter. This is not really an Excel format. It looks like it is a simple fixed-width text file with carriage return/line feeds or maybe just line feeds as record marks.
The record format is defined as (5i7,f6.1,11f7.1). Now I don’t know Fortran from Chinese, but I would interpret this as five consecutive seven-digit integers, a number (floating point decimal 6.1 as in 123456.0), and 11 numbers (floating point decimal 7.1 as in 1234567.0).
If the source file were a real Excel format, they would need to use some fairly sophisticated code for handling the raw file. They would have to reverse-engineer the file format, which is not a trivial task. Or they would have to use a system utility to render the columns and rows, but this would make them a prisoner of a particular vendor’s software.
One thing that I find odd is the use of the variables YearAD0 and YearAD1. These variables have been hardcoded to the values 1801 and 2020 respectively.
Here are the lines where these variables occur:
! definition of variable type
integer :: AllocStat,ReadStatus, YearAD0,YearAD1
! assignment of value
NEntry = 1000 ; YearAD0=1801 ; YearAD1=2020
! assignment of value; NYear = 2020 – 1801 + 1 = 220
NYear=YearAD1-YearAD0+1 ; NMonth=12
! assignment of values to an array YearAD
do XYear=1,NYear
YearAD(XYear)=XYear-1+YearAD0
end do
The first iteration would look like this:
do XYear=1,220
YearAD(1)=1-1+1801
end do
This would load array YearAD with the years ranging from 1801 to 2020.
It looks like this particular subroutine has a year 2020 problem. Whoever is responsible for maintaining the code will have to update this subroutine when the year hits 2021. This is not a good practice but maybe it’s an artifact of having to use Fortran.
I’m not sure why they even had to do this. Ideally, the programmer should just take the years that are in the data and process only those years. Maybe the Chinese have invented a time machine and can now measure temperatures in the future. You got me.
I could be mistaken of course as I am not a Fortran guru.
These statements in the Specify subroutine are redundant:
if (QSource.EQ.1) then
Cty=”CHINA”
end if
I would have put Cty=”CHINA” in the same conditional statement a few lines above it. It’s not a big deal, but it would shorten the program by three lines (including the extra blank line).
It looks like variables are global in Fortran, which is nice. It’s a shame that arrays have to be formally defined but oh well. It’s an old language.
Harold,
I think its a CSV format. Not a fixed width one. But maybe I’m misunderstanding the code since I don’t speak fortran
FrancisT, it sounds like Fortran assumes that commas are the delimiters. This behavior is probably built in to the system subroutines open or read. The open statement looks like it has some parameters that might control it (form=”formatted”). Maybe, the programmer can set the delimiter by specifying another “form” type.
Just out of curiosity, what languages do you use? I program mainly in derivatives of the language Pick BASIC, which plays nice with strings and arrays.
Just out of curiosity, what languages do you use? I program mainly in derivatives of the language Pick BASIC, which plays nice with strings and arrays.
Javascript and perl currently (and PHP / python sometimes). I’m just a web guy right now but once upon a time I wrote code in C, various assemblers and a bunch of other languages including VB. In fact since I have a BA in Computer Science from a prestigious university not terribly far from UEA I have even been forced to write things in BCPL and ML (no you don’t want to know). Fortunately I’ve forgotten them now.
An item of interest is that the above (IMHO pretty decent) code was produced by Tim Mitchell, while most of Harry’s initial problems (as discussed in the now famous lab notebook) have to do with trying to figure out what in blazes “Tim” did when coding HADCRUT2.
From Harry’s comments, I infer that these are the same person. I’m not certain because there are references to both “Tim M” and also “Tim O”, which has to be Tim Osborn, but I think most of Harry’s frustration is with Tim M’s code and docs.
So we have pretty clean code and some nasty stuff, probably from the same body. It’s hard to know what to conclude. Time pressures maybe?
Re the ARTIFICIAL and fudge verbiage that’s been found in the harris and osborn subdirectories, it would seem to touch nothing in CRU’s data products. Those who see conspiracy will have to look elsewhere.
This section of code will generate an error in Year 2021:
XYear=iYear-YearAD0+1
if (XYear.LT.1.OR.XYear.GT.NYear) then
print*, ” > ***** Year range is bad. Reset in source code. ”
else
Here is the code that processes the monthly data:
Proc(XYear,XMonth,XEntry) = nint(fRaw(XMonth)*RawMulti/Factor)
RawMulti is 1.0 as stated in the program. We don’t know the value of Factor as it is determined by a call to the subroutine CheckVariSuffix:
if (ReadStatus.LE.0) call CheckVariSuffix (Suffix,Variable,Factor)
A google search for CheckVariSuffix shows three hits, including this blog entry. I can’t find it in the Linux files folder listed as a separate entity.
It’s nice to see that Fortran has 3-D arrays.
Yo, Doctor Smith! Please email phone number.
REPLY: Done. -ems
Many thanks to FrancisT for loading the files in such a user friendly format.
Now, we know the frustrations of those trying to get raw data and source code out of Briffa, Mann and the rest of the team. So I went back to first principles and chose a raw data file at random, which is this one:
foia/briffa-treering-external/belfast/masters/eastmid.dat
The file is described as:
COMMENT – DMB 14-JUN-88 INPUT FROM LAXTON LITTON AND SIMPSON 13-FEB-87 AD882 TO AD1981
It’s basically tree ring widths. Pasted it into Excel and plotted a simple line chart. Result looked just like the temp charts we had in school before Mann’s hockey stick became the textbook chart, ie, up and down with a linear trend.
I’m no programmer, but if the code that turns Briffa’s raw data into a reconstruction is in there, you have replicabilty of a sort.
Good luck
Pingback: Climategate: Just Say Nopenhagen «
My guess is the answer is no, but has there been any sign of a design document for either the GISS or CRU code? Real software starts with functional requirements and a design document. The design doc gives the overall view of what the code is meant to achieve. A design doc would list the phases (e.g. raw->adjust->gridded->single global temp) and the algorithms to be used in each phase.
REPLY: [ The GIStemp code clearly shows that it “evolved” of “just growed”. STEP0 is even described as being a glue on after STEP1 and thus the name. You can see the evolution of the code as the different compiler specs come out and new language features are used. IMHO: It was never “designed”, it was an R&D “hand tool” that, once it produced the desired results, was “made production”. -ems ]
Real software does need functional requirements and a design document, but as one who does this for a living as a consultant I can tell you that these docs quickly go stale as the program evolves and the real design emerges. In this case, had they even included them, I wouldn’t trust them as far as I could chuck them.
What would be far more useful than stale functional specs would be a set of unit tests that exercise the code to prove out the functions’ respective intents, ie. push this data “X” in, get result “Y”.
In fact, unit tests would be quite useful in the analysis of this code as it would allow the construction of functional documentation as to what the code is doing and where potential errors exist. A quick poke-round the ‘net reveals, shockingly, that a FORTRAN unit test framework actually exists…
http://sourceforge.net/projects/fortranxunit/
I’m surprised Tim Mitchell didn’t use ‘make’ as his build tool.
REPLY: [ GIStemp uses inline calls to the compiler. Yes, it recompiles the code every time, then throws the binaries away. They treat it like interpreted BASIC… Seems like not using “make” is a consistent theme. No idea why… -ems ]
Build the Tower,
there are more than the current temperature product that makes up the AGW scam. Getting rid of the MWP and “proving” the current fast rise in temps are the 2 ends of the Paleo part.
Currently, covering the decline in some proxies modern period also, conveniently, reduces the MWP. The code that has been looked at is on point to this issue as are a number of the e-mails.
It IS a smoking gun in this respect.