CRUt – fromexcel.f90 program listing

The Program: fromexcel.f90

There was a request that some of the actual code be put up where folks could see it. OK, a quick “wc */*/*” kicked out about 80,000 lines of code… Not something I’m going to get pasted into WordPress real quick… So I picked one of the shortest programs I could find in the “linux” directory. The program that turns Excel pages into FORTRAN output. This looks to be used to turn Chinese temperature spreadsheets into data files, but could be used on any temperature spreadsheet, so “who knows”.

My general assessment of it is that Tim Mitchell has generally good “style”. It is readable and he even “pretty prints” with things lined up for both looks and readability. An indication of someone who “keeps a tidy mind” and a very good habit. The use of “implicit none” to prevent the default (“implicit”) declaration and typing of variables indicates a cautious approach to FORTRAN defaults and a desire to have ‘accidental declaration of variables’ flagged for proper handling.

It starts with an identification block: Who wrote it. What it does. Date written. And includes the assumptions the program makes so you know what to feed it. All very good stuff. It is clearly written in the “f90” dialect (comments start with a “!” instead of the depricated “C”. Lower case. Use of “::” and other f90 constructs.

The program is clearly structured, with a nice understandable series of subroutine calls. Each subroutine is clearly demarked with a comment bar. Each subroutine is short and clear. There is a notable absence of comments in the subroutines describing the reason they are set out as a subroutine, the function of the subroutine, and how the method used was chosen. Being fairly simple routines, this is somewhat understandable, but a note or two would be “nice to have”.

There is fairly little stated as to why one would choose one input value over another; in the ‘select’ routine, for example. The various “use [filename}” directives are left opaque.

The inclusion of the proper compiler directive as a comment showing the other source files needed to build the runnable program is also a very nice touch.

Bounds and validity checking is minimal, but probably OK for a personally run “hand tool”. There is a functional, but slightly hokey, call to the Unix environment to run “word count – wc” on the file to find out the lines to process before quitting. Yeah, it works, but… is there some reason not to just read the file one line at a time and when you get the last one, do your ‘EOF – End Of File’ branch handling? Then there is the “year check” that tells you to edit the source code if it finds fault. Somewhat tacky.

But finally, there is the question of: Why FORTRAN? Is it really the better choice for this than AWK, Perl, Python or some other string oriented language? I like FORTRAN and do think the folks who toss rocks at it often do not understand the convenience it offers for many things, especially math oriented and I/O functions. But is this really the best tool? But at the end of the day, if it works with a couple of hundred lines, a lot of it white space and comments, maybe it is an OK tool for the job. Especially if the rest of the system is in FORTRAN and you want to keep the number of languages used to a minimum (a virtue all too often ignored, IMHO!).

So at the end of the day, I’d do maintenance on Tim’s code any day. Yeah, it could use a bit more “why did I do this” and “where does that input come from” and “where does the output go?” comments (or an associated fromexcel.readme) for the context and runtime environment. But frankly, compared to the kind of stuff I’ve been swallowing in GIStemp, this is pure honey in comparison.

Oh, and in the “subroutine load” we have a couple of lines that run off the right edge. Real Programmers ™ can use the “view source” option of their browser and see what’s off the edge. The rest of use can enjoy the pleasant formatting. Unfortunately, WordPress themes to not all have “preformatted” translated to a box with a scroll bar to “see it all” and I’m not feeling like experimenting with new themes right now… Just ralize that in f90 you can go past position 72 while in f77 you can not. Tim is clearly embracing the features of f90 and cares not for f77 compatibility.

UPDATE; Worth a read: http://wattsupwiththat.com/2009/11/25/climategate-hide-the-decline-codified/

Source Code Listing

chiefio$ cat fromexcel.f90

! fromexcel.f90
! program to convert raw data files from Excel spreadsheet to CRU ts format
! written by Dr. Tim Mitchell (Tyndall Centre) on 03.06.03
! assumes one row of information per variable/station/year combination
! assumes that all relevant info is on each line (including meta data)
! f90 -error_limit 5 -o ./../cruts/fromexcel time.f90 filenames.f90 
!       crutsfiles.f90 grimfiles.f90 ./../cruts/fromexcel.f90

program FromExcel

use Time
use FileNames
use CRUtsFiles
use GrimFiles

implicit none

real, pointer, dimension(:)             :: Lat,Lon,Elv
real, dimension(12)                     :: fRaw

integer, pointer, dimension(:,:,:)      :: Proc
integer, pointer, dimension(:,:)        :: Info
integer, pointer, dimension(:)          :: YearAD,Code,Beg,End

character (len=20),pointer,dimension(:) :: Name
character (len=13),pointer,dimension(:) :: Cty
character (len= 9),pointer,dimension(:) :: Local

integer, parameter      :: DataMissVal = -9999
real, parameter         :: MissVal = -999.0

real :: Factor,RawMissValA,RawMissValB,RawMulti

integer :: AllocStat,ReadStatus, YearAD0,YearAD1
integer :: QSource
integer :: iCode,iLat,iLon,iElv,iYear
integer :: NYear,NMonth,NLine,NEntry
integer :: XYear,XMonth,XLine,XEntry

character (len=80) :: Variable,GivenFile,LoadFile,SaveFile,LineFormat
character (len= 4) :: Suffix

!*******************************************************************************

call Intro
call Select
call Specify
call Load
call Save
call Close

contains

!*******************************************************************************

subroutine Intro

open (99,file="/cru/mikeh1/f709762/scratch/log/log-fromexcel.dat", &
                status="replace",action="write")
print*
print*, "  > ***** FromExcel: convert raw Excel to CRU ts *****"
print*

end subroutine Intro

!*******************************************************************************

subroutine Select

print*, "  > Select the file to load: "
do
        read (*,*,iostat=ReadStatus), GivenFile
        if (ReadStatus.LE.0.AND.GivenFile.NE."") exit
end do
LoadFile = LoadPath (GivenFile,"    ")

print*, "  > Select source (0=help): "
do
  read (*,*,iostat=ReadStatus), QSource
  if (ReadStatus.LE.0.AND.QSource.EQ.0) then
    print*, "  >  1. Jian China"
  end if
  if (ReadStatus.LE.0.AND.QSource.GT.0) exit
end do

print*, "  > Select variable (form: '.???'): "
do
  read (*,*,iostat=ReadStatus), Suffix
  if (ReadStatus.LE.0) call CheckVariSuffix (Suffix,Variable,Factor)
  if (ReadStatus.LE.0.AND.Variable.NE."undefined") exit
end do

print*, "  > Select the file to save (.cts): "
do
        read (*,*,iostat=ReadStatus), GivenFile
        if (ReadStatus.LE.0.AND.GivenFile.NE."") exit
end do
SaveFile = SavePath (GivenFile,".cts")

end subroutine Select

!*******************************************************************************

subroutine Specify

if (QSource.EQ.1) then
  LineFormat = "(5i7,f6.1,11f7.1)"
  RawMissValA = -100.0 ; RawMissValB = 9999.0 ; RawMulti = 1.0
  NEntry = 1000 ; YearAD0=1801 ; YearAD1=2020
end if

NYear=YearAD1-YearAD0+1 ; NMonth=12 

allocate (YearAD(NYear),Proc(NYear,NMonth,NEntry), &
          Code(NEntry),Name(NEntry),Cty(NEntry),Local(NEntry), &
          Lat(NEntry),Lon(NEntry),Elv(NEntry), &
          Beg(NEntry),End(NEntry), stat=AllocStat)
if (AllocStat.NE.0) print*, "  > ##### alloc #####"

Lat=MissVal ; Lon=MissVal ; Elv=MissVal ; Beg=MissVal ; End=MissVal
Proc=DataMissVal ; Code=MissVal 
Name="UNKNOWN" ; Cty="UNKNOWN" ; Local="   nocode"

do XYear=1,NYear
  YearAD(XYear)=XYear-1+YearAD0
end do

if (QSource.EQ.1) then
  Cty="CHINA"
end if

end subroutine Specify

!*******************************************************************************

subroutine Load

print*, "  > Loading..."

call system ('wc -l ' // LoadFile // ' > trashme-fromexcel.txt')
open  (3,file='trashme-fromexcel.txt',status="old",access="sequential", &
                form="formatted",action="read")
read  (3,"(i10)"), NLine                        ! get number of lines
close (3)
call system ('rm trashme-fromexcel.txt')

open  (2,file=LoadFile,status="old",access="sequential",form="formatted",action="read")    
if (QSource.EQ.1) then
  do XLine = 1, Nline
    read (2,LineFormat), iCode,iLat,iLon,iElv,iYear,(fRaw(XMonth),XMonth=1,NMonth)
    XEntry=0 ; iCode=iCode*10
    do
      XEntry=XEntry+1 
      
      if (Code(XEntry).EQ.MissVal) then
        Code(XEntry) = iCode 
        Elv(XEntry)  = real(iElv)/10.0
        Lat(XEntry)  = (mod(real(iLat),100.0)/60.0) + int(real(iLat)/100.0)
        Lon(XEntry)  = (mod(real(iLon),100.0)/60.0) + int(real(iLon)/100.0)
        Beg(XEntry)  = iYear
        write (99,"(i7,3f8.2)"), Code(XEntry),Lat(XEntry),Lon(XEntry),Elv(XEntry)
      end if
      
      if (Code(XEntry).EQ.iCode) then
        End(XEntry)  = iYear
        XYear=iYear-YearAD0+1
        if (XYear.LT.1.OR.XYear.GT.NYear) then
          print*, "  > ***** Year range is bad. Reset in source code. "
        else
          do XMonth=1,NMonth
            if (fRaw(XMonth).NE.RawMissValA.AND.fRaw(XMonth).NE.RawMissValB) &
              Proc(XYear,XMonth,XEntry) = nint(fRaw(XMonth)*RawMulti/Factor)
          end do
        end if
        write (99,"(i4,12i5)"), iYear,(Proc(XYear,XMonth,XEntry),XMonth=1,NMonth)
      end if
      
      if (Code(XEntry).NE.iCode.AND.XEntry.EQ.NEntry) &
                print*, "  > ***** NEntry is set too small. Reset in source code. "

      if (XEntry.EQ.NEntry) exit
      if (Code(XEntry).EQ.iCode) exit
    end do
  end do
end if
close (2)

end subroutine Load

!*******************************************************************************

subroutine Save

! call MakeStnInfo (Info,Code,Code,Lat,Lon,Elv,1,Data=Proc,YearAD=YearAD)
call MakeStnInfo (Info,Code,Code,Lat,Lon,Elv,1,YearAD0=Beg,YearAD1=End)
call SaveCTS (Info,Local,Name,Cty,CallFile=SaveFile,YearAD=YearAD,Data=Proc)

end subroutine Save

!*******************************************************************************

subroutine Close

print*
close (99)

end subroutine Close

!*******************************************************************************

end program FromExcel

Advertisements

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons' are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in CRUt and tagged , , , , , . Bookmark the permalink.

20 Responses to CRUt – fromexcel.f90 program listing

  1. FrancisT says:

    I’ve put up some more – relatively general – code/data comments.

    http://www.di2.nu/200911/25.htm

    I think you should probably put the REAL/INT/FLOAT squaring comment up as a separate post because it is in fact a critical (albeit minor) example of the code (lack of) quality and deserves being highlighted on its own.

    fromexcel really is, as far as I can see, about a couple of dozen lines of perl turned into a couple of hundred lines of fortran with the key being a subroutine something like this

    sub load {
    my ($infile, $outfile) = @_;
    open (INFILE, “$outfile”) or die “Error …”;
    while () {
    my @data = split(/,/, $_);
    my $processed_data = ”;
    #process the data line as required and append to $processed_data
    print OUTFILE $processed_data
    }

    REPLY: [ You are right. It’s turning into one of those “things people need a lot of talking about to understand” topics…

    I always loved the elegance of Perl… Oh, and folks, Francis has a library of the code up at:

    http://di2.nu/foia/cru-code/

    So if you see any particular program you want to comment on, feel free to put a comment here with a pointer to the particular program in question. Like so:

    http://di2.nu/foia/cru-code/linux/cruts/fromexcel.f90

    Could be done as a few dozen Perl lines! (I’m not as good at Perl as FrancisT !)

    -ems ]

  2. NikFromNYC says:

    Here is the hockey stick code from a Briffa file:

  3. Greg F says:

    It has been a long time since I did anything with Fortran. Off topic but, isn’t this thing screaming for a real relational database?

    REPLY: [ Absolutely! Same as GIStemp. The bulk of the “processing” is just data loading and input preening (or ought to be preened, but isn’t…). Then some minor munging, and some printout. All that “loading / gluing” ought to be into a database. If I were writing this ‘product’, this is exactly what I would do. Then the “process” becomes simple report writing and your “whatifs” are minor report variations. -ems ]

  4. FrancisT says:

    Greg F – funny you should say that. I said the exact same thing. I have had some experience with the various capacity management tools out there – which can process literally terabytes of log files and extract trends not dissimilarly to how CRU TS produces them – and oddly enough they all use an oracle database

  5. pyromancer76 says:

    I noticed that Johnny at WUWT (EP

    (I think I just hit a wrong key, sorry) I noticed this a.m. that Johnny at WUWT (2:08:44 — EPA CO2…11/25/09) wrote: “they hard-coded the hockey stick into the program which draws the curve! And with the fudge factor they can alter how much the curve bends like they want it to…” Then he goes to Willis Eschenbach (18:59:05) who identifies the program file FOI2009/FOI A/documents/harris-treebriffa_sep98_e.pro.

    At the end of the discussion Johnny states: here it is in IDL not plain Fortran (debreuil (04:25:44)….and quotes the code(?) as “Apply a VERY ARTIFIC[I]AL correction for decline.”

    Can you make sense of this in everyday English so we who understand very little about programming can get it? Is this a smoking gun smack dab in the middle of the code? Does it work everytime they run “the code”?

    REPLY: [ I am “in the moment” of the holiday crush right now, but in a day or two I’ll see if I can ‘decode’ it… unless someone else does it first… maybe someone in one of those strange countries NOT having a holiday right now ;-) -ems ]

  6. Harold Vance says:

    Judging by fromexcel.f90, it looks to me like the so-called “Excel” format is really just an ASCII text file. It probably doesn’t even have delimiters as I see no provision for the definition of a delimiter. This is not really an Excel format. It looks like it is a simple fixed-width text file with carriage return/line feeds or maybe just line feeds as record marks.

    The record format is defined as (5i7,f6.1,11f7.1). Now I don’t know Fortran from Chinese, but I would interpret this as five consecutive seven-digit integers, a number (floating point decimal 6.1 as in 123456.0), and 11 numbers (floating point decimal 7.1 as in 1234567.0).

    If the source file were a real Excel format, they would need to use some fairly sophisticated code for handling the raw file. They would have to reverse-engineer the file format, which is not a trivial task. Or they would have to use a system utility to render the columns and rows, but this would make them a prisoner of a particular vendor’s software.

  7. Harold Vance says:

    One thing that I find odd is the use of the variables YearAD0 and YearAD1. These variables have been hardcoded to the values 1801 and 2020 respectively.

    Here are the lines where these variables occur:

    ! definition of variable type
    integer :: AllocStat,ReadStatus, YearAD0,YearAD1

    ! assignment of value
    NEntry = 1000 ; YearAD0=1801 ; YearAD1=2020

    ! assignment of value; NYear = 2020 – 1801 + 1 = 220
    NYear=YearAD1-YearAD0+1 ; NMonth=12

    ! assignment of values to an array YearAD
    do XYear=1,NYear
    YearAD(XYear)=XYear-1+YearAD0
    end do

    The first iteration would look like this:

    do XYear=1,220
    YearAD(1)=1-1+1801
    end do

    This would load array YearAD with the years ranging from 1801 to 2020.

    It looks like this particular subroutine has a year 2020 problem. Whoever is responsible for maintaining the code will have to update this subroutine when the year hits 2021. This is not a good practice but maybe it’s an artifact of having to use Fortran.

    I’m not sure why they even had to do this. Ideally, the programmer should just take the years that are in the data and process only those years. Maybe the Chinese have invented a time machine and can now measure temperatures in the future. You got me.

    I could be mistaken of course as I am not a Fortran guru.

  8. Harold Vance says:

    These statements in the Specify subroutine are redundant:

    if (QSource.EQ.1) then
    Cty=”CHINA”
    end if

    I would have put Cty=”CHINA” in the same conditional statement a few lines above it. It’s not a big deal, but it would shorten the program by three lines (including the extra blank line).

    It looks like variables are global in Fortran, which is nice. It’s a shame that arrays have to be formally defined but oh well. It’s an old language.

  9. FrancisT says:

    Harold,

    I think its a CSV format. Not a fixed width one. But maybe I’m misunderstanding the code since I don’t speak fortran

  10. Harold Vance says:

    FrancisT, it sounds like Fortran assumes that commas are the delimiters. This behavior is probably built in to the system subroutines open or read. The open statement looks like it has some parameters that might control it (form=”formatted”). Maybe, the programmer can set the delimiter by specifying another “form” type.

    Just out of curiosity, what languages do you use? I program mainly in derivatives of the language Pick BASIC, which plays nice with strings and arrays.

  11. FrancisT says:

    Just out of curiosity, what languages do you use? I program mainly in derivatives of the language Pick BASIC, which plays nice with strings and arrays.

    Javascript and perl currently (and PHP / python sometimes). I’m just a web guy right now but once upon a time I wrote code in C, various assemblers and a bunch of other languages including VB. In fact since I have a BA in Computer Science from a prestigious university not terribly far from UEA I have even been forced to write things in BCPL and ML (no you don’t want to know). Fortunately I’ve forgotten them now.

  12. Build The Tower says:

    An item of interest is that the above (IMHO pretty decent) code was produced by Tim Mitchell, while most of Harry’s initial problems (as discussed in the now famous lab notebook) have to do with trying to figure out what in blazes “Tim” did when coding HADCRUT2.

    From Harry’s comments, I infer that these are the same person. I’m not certain because there are references to both “Tim M” and also “Tim O”, which has to be Tim Osborn, but I think most of Harry’s frustration is with Tim M’s code and docs.

    So we have pretty clean code and some nasty stuff, probably from the same body. It’s hard to know what to conclude. Time pressures maybe?

    Re the ARTIFICIAL and fudge verbiage that’s been found in the harris and osborn subdirectories, it would seem to touch nothing in CRU’s data products. Those who see conspiracy will have to look elsewhere.

  13. Harold Vance says:

    This section of code will generate an error in Year 2021:

    XYear=iYear-YearAD0+1

    if (XYear.LT.1.OR.XYear.GT.NYear) then
    print*, ” > ***** Year range is bad. Reset in source code. ”
    else

    Here is the code that processes the monthly data:

    Proc(XYear,XMonth,XEntry) = nint(fRaw(XMonth)*RawMulti/Factor)

    RawMulti is 1.0 as stated in the program. We don’t know the value of Factor as it is determined by a call to the subroutine CheckVariSuffix:

    if (ReadStatus.LE.0) call CheckVariSuffix (Suffix,Variable,Factor)

    A google search for CheckVariSuffix shows three hits, including this blog entry. I can’t find it in the Linux files folder listed as a separate entity.

    It’s nice to see that Fortran has 3-D arrays.

  14. charles the moderator says:

    Yo, Doctor Smith! Please email phone number.

    REPLY: Done. -ems

  15. Many thanks to FrancisT for loading the files in such a user friendly format.

    Now, we know the frustrations of those trying to get raw data and source code out of Briffa, Mann and the rest of the team. So I went back to first principles and chose a raw data file at random, which is this one:

    foia/briffa-treering-external/belfast/masters/eastmid.dat

    The file is described as:

    COMMENT – DMB 14-JUN-88 INPUT FROM LAXTON LITTON AND SIMPSON 13-FEB-87 AD882 TO AD1981

    It’s basically tree ring widths. Pasted it into Excel and plotted a simple line chart. Result looked just like the temp charts we had in school before Mann’s hockey stick became the textbook chart, ie, up and down with a linear trend.

    I’m no programmer, but if the code that turns Briffa’s raw data into a reconstruction is in there, you have replicabilty of a sort.

    Good luck

  16. Pingback: Climategate: Just Say Nopenhagen «

  17. climatebeagle says:

    My guess is the answer is no, but has there been any sign of a design document for either the GISS or CRU code? Real software starts with functional requirements and a design document. The design doc gives the overall view of what the code is meant to achieve. A design doc would list the phases (e.g. raw->adjust->gridded->single global temp) and the algorithms to be used in each phase.

    REPLY: [ The GIStemp code clearly shows that it “evolved” of “just growed”. STEP0 is even described as being a glue on after STEP1 and thus the name. You can see the evolution of the code as the different compiler specs come out and new language features are used. IMHO: It was never “designed”, it was an R&D “hand tool” that, once it produced the desired results, was “made production”. -ems ]

  18. Chris R. Chapman says:

    Real software does need functional requirements and a design document, but as one who does this for a living as a consultant I can tell you that these docs quickly go stale as the program evolves and the real design emerges. In this case, had they even included them, I wouldn’t trust them as far as I could chuck them.

    What would be far more useful than stale functional specs would be a set of unit tests that exercise the code to prove out the functions’ respective intents, ie. push this data “X” in, get result “Y”.

    In fact, unit tests would be quite useful in the analysis of this code as it would allow the construction of functional documentation as to what the code is doing and where potential errors exist. A quick poke-round the ‘net reveals, shockingly, that a FORTRAN unit test framework actually exists…

    http://sourceforge.net/projects/fortranxunit/

  19. Eric Jablow says:

    I’m surprised Tim Mitchell didn’t use ‘make’ as his build tool.

    REPLY: [ GIStemp uses inline calls to the compiler. Yes, it recompiles the code every time, then throws the binaries away. They treat it like interpreted BASIC… Seems like not using “make” is a consistent theme. No idea why… -ems ]

  20. kuhnkat says:

    Build the Tower,

    there are more than the current temperature product that makes up the AGW scam. Getting rid of the MWP and “proving” the current fast rise in temps are the 2 ends of the Paleo part.

    Currently, covering the decline in some proxies modern period also, conveniently, reduces the MWP. The code that has been looked at is on point to this issue as are a number of the e-mails.

    It IS a smoking gun in this respect.

Comments are closed.