I’ve done a first pass through the GCM (General Circulation Model) Model E code.
If “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk” then Model E is structured so that the Elephant can play Mozart while on parade…
(History of the saying in a short posting here by Steven Goddard.)
Never The Same Code Twice
First off, it follows in the tradition of GISStemp where every time you run the program it is recompiled. This means it is never the same code twice.
Compilers are very complicated beasts. They make decisions. LOTS of decisions. Sometimes they have bugs in them. Sometimes they do things in a different order. They pull in bits of code from various libraries and link it all together, so if the libraries change, the product changes.
Historically, libraries were ‘static linked’ so that your program, when run, was a constant and unchanging product. You can still set static linking if desired. In the last few decades, dynamic linking has become more popular. The program only loads the bits of the libraries it needs when it runs. This makes programs smaller in memory footprint, and it means an update to a library doesn’t require recompiling all the programs that use it (unless the change is a big one and changes the calling sequence or parameters). The downside is that changes to the library can cause changes, or failures, in the programs that use it. In the Microsoft world that’s the source of the DLL (Dynamic Link Library) failures some may have run into.
While dynamic libraries are a benefit, they do add a little bit of risk of change. Generally manageable since libraries are strongly QA tested and any bugs usually surface fast as they are also widely used. So generally folks are OK with that. For those who are not, you can static link to the old libraries if desired.
Model E goes way past that.
Not only does it give you lots of potential changes from dynamic libraries and compiler runs, it lets you tune and prune compiler flags, linker flags, and more. Turning on or off various sections of the code, adding, changing, or removing bits of it as you like it. Then your “run” is compiled and executed.
I’d originally expected there to be a “run deck” of parameters for any given run, and some relatively static input files of historical data. I’d not expected a lot of “fiddling with the model code” on every run. Essentially the whole model is parameters. When someone says “I ran Model E” and got FOO results, what they really ought to be saying is “I tuned Model E to get FOO results”.
This may well be “normal” in what passes for “Climate Science”, but it is not the way to get stable and reliable results via modeling actual natural processes as defined by observations of nature.
Model, which Model?
You can get a copy of the Model E code from NASA/GISS via this link (and the link in it) that describes the model.
https://www.giss.nasa.gov/tools/modelE/
Your first clue that this is a “Moving River” (as in “You can never cross the same river twice.”) comes in that text. Bolding mine.
The current incarnation of the GISS series of coupled atmosphere-ocean models is available here. Called ModelE,
[…]
Model versions used for the CMIP5 and CMIP6 simulations are available via the nightly snapshots of the current code repository, including the frozen ‘AR5_branch’. However, users should be aware that these snapshots are presented ‘as is’ and are not necessarily suitable for publication-quality experiments.
I note in passing the use of “experiments” to describe computer model fantasies… I have nothing against models as tools. They can be very useful. But they are NOT experiments. They are a computerized fantasy that is best used to “inform our ignorance”. Compare where the model diverges from reality to see where you have it wrong. That’s where you learn something.
Sidebar: I ran a supercomputer center for 7+ years mostly doing plastic modeling for making dies. The model was right about 90% of the time saving lots of money on dies. ONE fluid at ONE temperature in ONE fixed space, 10% wrong. That from a well tested widely used program designed for exactly that use, not something hacked up on each run with code changes. At $1/4 Million per set of dies, we were VERY motivated to get 100% right.
The link to the “AR5 Branch” is here:
https://simplex.giss.nasa.gov/snapshots/modelE2_AR5_branch.2017.01.18_08.50.02.tgz
I note in passing that the one I downloaded to do this examination was named:
modelE2_AR5_branch.2016.12.28_08.50.01.tgz
so it looks like that frozen AR5 branch changes daily or weekly…
Or maybe it’s one of these…
GISS Model E Source Code Snapshots
The following versions of Model E source code are available for public download:
modelE2_AR5_branch.2017.01.18_08.50.02.tgz – the version of the code used for AR5 simulations.
modelE2_AR5_v2_branch.2017.01.18_08.50.02.tgz – updated and improved version of AR5 code (still under development).
modelE1_patches.2017.01.18_08.50.02.tgz – the version of the code used for AR4 simulations.
modelE2.2017.01.18_08.50.02.tgz – the latest code from the development branch. This code is always under development by definition. It may be unstable or not even compile at all. Use at your own risk.
modelE2_planet_1.0.2017.01.18_08.50.02.tgz – version 1.0 of ROCKE3D model as it is currently used in our planetary simulations. (Still under development, functionality may change at later time.)
So their “AR5 Branch” of archived code is actually 2 versions, one under development, and a bunch of patches related to AR4, maybe, and they have a “development branch” too… Decisions decisions… (And I left out their ‘spectral binning’ code from further down the page…)
GISS Model E Tools
The following tools are available for public download:
Spectral_Binning_GISSGCM.2017.01.18_08.50.02.tgz – Scripts to generate ModelE radiation input files from a raw stellar spectrum.
So you can automate the creation of the stellar parameter fudge… Wonder if it handles the recent cool down of the sun with shift of 10% or so of the UV vs IR levels??…
That’s just for starters…
Down In The Weeds
When you unpack the code, you get this directory structure:
chiefio@Headend:/Climate/modelE2_AR5_branch$ ls
Makefile  aux   config  doc   init_cond  templates
README    cmor  decks   exec  model      tests
At some future time I’m going to describe the technical side of it more, but now I’m just looking at the control knobs.
Typically you start with reading the README file, and a look at the Makefile (that builds any given bit of software in the *Nix world). It looks fairly clean at this point with only one Makefile… but when you dig into it, that changes. First, though, some sizes.
chiefio@Headend:/Climate/modelE2_AR5_branch$ du -ks *
4       Makefile
4       README
1076    aux
92      cmor
132     config
36      decks
440     doc
224     exec
272     init_cond
15996   model
1360    templates
136     tests
Most of the code is in the “model” directory. Yet “aux” and “templates” are fairly large too. 1 MB (1076 kB) for AUX and 1.3 MB for templates. Those will be doing something big. Then init_cond will be setting some initial state too.
What does the README say? It says essentially don’t trust the README… and here’s some untrustworthy clues:
NOTE: The full documentation on how to run the model is in doc/HOWTO.html.
This summary is not complete, and not necessarily up-to-date either. PLEASE READ THE FULL DOCUMENTATION – IT REALLY WILL MAKE YOUR LIFE EASIER!
The directory tree of the modelE has the following structure:
modelE
  |
  |-/model     (the source code for GCM model)
  |
  |-/aux       (auxiliary programs such as pre- and post-processing)
  |
  |-/exec      (various scripts needed to compile and setup the model)
  |
  |-/doc       (directory for documentation)
  |
  |-/decks     (directory for rundecks)
       |
       |-.R    (rundeck for the run )
       |
       |-/_bin (directory for binaries for )
       |
       |-/     (link to directory where you setup and run )
       |
       |-.R
       ................
Configuring the model on your local system

Intended working directory is directory modelE/decks. In this
directory type "gmake config". This will create a default ~/.modelErc
file in your home directory. This should be edited so that run output,
rundeck libraries etc. can be properly directed, and so that the
compile options (multiple processing, compiler name, NetCDF libraries
etc.) can be set appropriately.

Compiling and running the model.

All rundecks should be created inside this directory and all "make"
commands should be run from there. The following is a typical example
of how to compile and setup a run with the name "my_run":

cd decks                    # go to directory decks
gmake rundeck RUN=my_run    # create rundeck for "my_run"

You will need to edit the rundeck in order to choose a configuration
that is appropriate. Once that is done…

gmake gcm RUN=my_run        # compile the model for "my_run"
gmake setup RUN=my_run      # run setup script for "my_run"

Make sure that you create the rundeck with "gmake rundeck …" before
running any other commands for this run, otherwise the Makefile will
not understand you. You can skip "gmake gcm …" and just do "gmake setup…"
in which case gcm will be compiled automatically.

Another command you want to run (after creating the rundeck) is:

gmake aux RUN=my_run

This will compile auxiliary programs in /aux. All the binaries (both for
model and for auxiliary programs) are put into /decks/my_run.bin .

The following is a list of targets currently supported by Makefile:
config – copy the default .modelErc setup to your home directory.
rundeck – create new rundeck
depend – create dependencies for specified rundeck
gcm – compile object files and build executable for specified rundeck
aux – compile standard auxiliary programs
auxqflux- compile auxiliary programs for computing qflux
auxdeep – compile auxiliary programs for setting deep ocean
setup – do setup for specified rundeck
clean – remove listings and error files
vclean – remove object files, .mod files and dependencies
newstart- remove all files in the run directory
exe – compile gcm and put executable into RUN directory
htmldoc – create web-based documentation for this RUN

If you run "gmake" without arguments it will print a short help.
The use of ‘gmake’ to run your model is a big whack with the Clue Stick that this thing does a rebuild on each run. Toward the bottom I’ve bolded a few bits where that is confirmed. Every rundeck a new model is built. We also see that ‘exec’ is where the real making of the program is done. So what’s in exec?
chiefio@Headend:/Climate/modelE2_AR5_branch$ ls exec
CommandEntry.pm          modelErc                 runpmE
CommandPool.pm           pdE                      setup_e.pl
RegressionTools.pm       pproc_dep.pl             setup_e_new.pl
clone                    protect_ccp_options.pl   sfmakedepend
comp_mkdep.pl            r_to_mk.pl               split72.pl
compare_module_file.pl   regression.py            sswE
convertToFreeFormat.py   regressionTests.pl       sumacc_nc
editRundeck.sh           regression_tests.sh      unlock
gcmdoc.pl                runE
get_input_data           runE_new
A mix of Perl scripts (.pl) and shell scripts (.sh) and Python (.py) and more. These folks can’t even keep a standard for what language to use. I know, I know, right tool for the job and all that. BUT it makes for a maintenance nightmare as you must be fluent in 4 languages right out the gate. (The good news is that they are not that different. The bad news is that they have pernicious differences in some details…)
I don’t know what a .pm file type is, so I looked inside the RegressionTools one. It looks like a bunch of parameter setting. (Is pm short for parameters?)
Here’s a sample from the top of that file:
#package RegressionTools;
use CommandEntry;

my $standardFlags = "SETUP_FLAGS=-wait";
my $HYCOM = "E1fzhyc";

my $extraFlags;
$extraFlags{SERIAL} = "";
$extraFlags{MPI} = "ESMF=YES NPES=\$npes";
$extraFlags{OPENMP} = "EXTRA_FFLAGS=-mp MP=YES NPROCS=\$npes";
$extraFlags{SERIALMP} = "EXTRA_FFLAGS=-mp"; # Intel kludge for matching serial and OpenMP.

$extraFlags{E1M20} ="";
$extraFlags{E1F20} ="";
$extraFlags{E1oM20} ="";
$extraFlags{E001tr} ="";
$extraFlags{$HYCOM} ="EXTRA_FFLAGS+=-DCHECK_OCEAN_HYCOM";

my $numLinesCMP;
$numLinesCMP{E1M20} = 120;
$numLinesCMP{E1F20} = 120;
$numLinesCMP{E1oM20} = 110;
$numLinesCMP{E001tr} = 123;
$numLinesCMP{$HYCOM} = 138;

sub setModuleEnvironment {
  # require perlModule;
  require "$ENV{MODULESHOME}/init/perl";
  module (purge);
  module (load, "comp/intel-10.1.023", "lib/mkl-10.1.2.024", "mpi/impi-3.2.1.009");
Now you can see why I had that preamble about libraries. This thing is setting the library environment as a parameter…
How much of this stuff is there?
chiefio@Headend:/Climate/modelE2_AR5_branch/exec$ wc -l *.pm
  157 CommandEntry.pm
  148 CommandPool.pm
  221 RegressionTools.pm
  526 total
526 lines. IF only 1/2 of them actually set parameters, that’s over 250. If 5 wiggles the trunk, what does 5 x 5 x 5 x 2 do?
But that’s a crude guess at best. For one thing, the CommandEntry.pm file isn’t all parameter setting. It runs some scripts inside it that do a variety of things, some obscure at this stage. Some snips from it:
First a bit of Perl that looks to be reading in command line arguments (parameters):
#!/usr/bin/perl
package CommandEntry;

sub new {
  my $proto = shift;
  my ($args) = @_;
  my $class = ref($proto) || $proto;
  my $self  = {};

  $self -> {COMMAND} = " ";
  $self -> {NUM_PROCS} = 1;
  $self -> {QUEUE} = LOCAL;   # or BATCH
  $self -> {DEPENDENCY} = ""; # place holder for list of references to other commands
  $self -> {STATUS} = PENDING;
  $self -> {STDOUT_LOG_FILE} = "LOG";
  $self -> {STDERR_LOG_FILE} = "ERR";

  # override defaults with arguments
  while ( my ($key, $value) = each(%$args) ) {
    $self -> {$key} = $value;
  }
  bless ($self, $class);
  return $self;
}
But further down it has an embedded ‘here document’ shell script (bash):
my $script = <<EOF;
#!/bin/bash
#PBS -l select=$nodes:ncpus=$ncpus:proc=neha
#PBS -l walltime=$walltime
#PBS -W group_list=a940a
#PBS -N regression
#PBS -j oe
##PBS -o foo
#PBS $queueString

cd \$PBS_O_WORKDIR
. /usr/share/modules/init/bash
module purge
module load comp/intel-10.1.023 lib/mkl-10.1.2.024 mpi/impi-3.2.1.009
$commandString
EOF
and not only is it working off of a command string, but it fiddles with the libraries too.
Yet even that isn’t the end of it. Looking in regressionTests.pl we find a bunch more environment setting. First off, it calls those other bits above so we know where they get executed:
use Env;
use CommandPool;
use CommandEntry;
use RegressionTools;
Then for some reason it makes an http reference when it is opening a log file…
nightlyTests.log");">
open(LOG,">nightlyTests.log");
# Make the LOG hot: <http://www.thinkingsecure.com/docs/tpj/issues/vol3_3/tpj0303-0002.html>
{
  my $ofh = select LOG;
  $| = 1;
  select $ofh;
}
Then it gets down to setting a bunch of environment variables (think of them as parameters that change where the program run gets parts of its context…)
$ENV{CVS_RSH}="ssh"; $env{CVSROOT}="simplex.giss.nasa.gov:/giss/cvsroot"; my $scratchDir = $ENV{NOBACKUP}."/regression_scratch"; $env{SCRATCH_DIRECTORY}=$scratchDir; $env{REFERENCE_DIRECTORY}="$env{SCRATCH_DIRECTORY}/modelE"; $env{BASELINE_DIRECTORY}="$ENV{NOBACKUP}/modelE_baseline"; $env{RESULTS_DIRECTORY} = $ENV{NOBACKUP}."/regression_results"; @modelErcVariables = (DECKS_REPOSITORY, MP, CMRUNDIR, EXECDIR, OVERWRITE, OUTPUT_TO_FILES, VERBOSE_OUTPUT, SAVEDISK, GCMSEARCHPATH, COMPILER, COMPILER_VERSION, BASELIBDIR, ESMF_BOPT, MPIDISTR, NETCDFHOME);
Now from my POV, the PITA (Pain In The Ass…) is that they have things like “simplex.giss.nasa.gov” coded into this thing. To run it elsewhere I will likely need to find and change those things. OK, nice that it is set in a parameter setting bit, but PITA to have it set at all. Typically you give a program a working directory path and leave it alone.
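As a side note for anyone thinking of porting it: the more usual pattern is to pull that sort of site-specific path out of the environment at run time rather than bake it into the scripts. A minimal Fortran sketch of the idea (my illustration, not ModelE code; the GCMSEARCHPATH name is just borrowed from the list above):

! Hypothetical illustration, not ModelE code: read a site-specific
! path from the environment instead of hard-coding it in the source.
program showpath
  implicit none
  character(len=256) :: searchpath
  integer :: plen, stat

  call get_environment_variable("GCMSEARCHPATH", searchpath, plen, stat)
  if (stat /= 0 .or. plen == 0) then
    print *, 'GCMSEARCHPATH not set; falling back to current directory'
    searchpath = '.'
  end if
  print *, 'Input files would be read from: ', trim(searchpath)
end program showpath

Then moving the whole thing to a different site is a change to the environment, not an edit of the setup code.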
Now the good thing is it looks like they are actually using a CVS repository to capture changes of state. (If only that would do that for actual temperature data…) but the bad thing is they have it coded into the setup programs.
$env{DECKS_REPOSITORY}="$scratchDir/decks_repository";
$env{MP}="no";
$env{CMRUNDIR}="$scratchDir/cmrun";
$env{EXECDIR}="$scratchDir/exec";
$env{OVERWRITE}="YES";
$env{OUTPUT_TO_FILES}="YES";
$env{VERBOSE_OUTPUT}="YES";
$env{SAVEDISK}="$scratchDir/savedisk";
$env{GCMSEARCHPATH}="/discover/nobackup/projects/giss/prod_input_files";
$env{BASELIBDIR}="/usr/local/other/baselibs/ESMF222rp3_NetCDF362b6_10.1.017_intelmpi/Linux";
And we have yet more library fiddle… Just what are those libraries and where do I get a copy? Who knows… Yet Another Dig Here in getting this set up and run. Hopefully the docs cover it.
There’s a whole lot more of that kind of thing in that particular directory, but I’m going to move on to another bit as these weeds are already too deep.
AUX making Inputs
In the “aux” directory there are a lot of files, but the ‘make’ ones caught my eye first.
chiefio@Headend:/Climate/modelE2_AR5_branch$ ls aux/make*
aux/make_surface_transient_Alkenes_C90.sh
aux/make_surface_transient_BCB.sh
aux/make_surface_transient_BCII.sh
aux/make_surface_transient_CH4_C90.sh
aux/make_surface_transient_CO_C90.sh
aux/make_surface_transient_NH3.sh
aux/make_surface_transient_NOx_C90.sh
aux/make_surface_transient_OCB.sh
aux/make_surface_transient_OCII.sh
aux/make_surface_transient_Paraffin_C90.sh
aux/make_surface_transient_SO2.sh
So we’ve got 11 different programs “make”ing a set of transient parameters for specific gasses. Lots of knobs here.
Looking in the last one (SO2) a sample:
#!/bin/bash
module avail tool/idl
module load tool/idl-6.4
ulimit -s 6000000
ulimit -v unlimited

res='C90'
spec='SO2'

for year in 1850
do
  cd /discover/nobackup/dgueyffi/modelE/aux
  dmget /archive/g08/dmkoch/ftp/AR5/IPCC_emissions_${spec}_anthropogenic_${year}_0.5x0.5*nc
  cp /archive/g08/dmkoch/ftp/AR5/IPCC_emissions_${spec}_anthropogenic_${year}_0.5x0.5*nc .
  dmget /archive/g08/dmkoch/ftp/AR5/IPCC_emissions_${spec}_ships_${year}_0.5x0.5*nc
  cp /archive/g08/dmkoch/ftp/AR5/IPCC_emissions_${spec}_ships_${year}_0.5x0.5*nc .
  ./remap.pl -par ncregrid-ijl.par -in IPCC_emissions_${spec}_anthropogenic_${year}_0.5x0.5*nc -out IPCC_emissions_${spec}_anthropogenic_${year}_C90_Dec_2009.nc
  ./remap.pl -par ncregrid-ijl.par -in IPCC_emissions_${spec}_ships_${year}_0.5x0.5*nc -out IPCC_emissions_${spec}_ships_${year}_C90_Dec_2009.nc
  cd /gpfsm/dnb53/gfaluveg/AR5_emissions/v5_anthro
  rm -f AR5.bat
  ./make_AR5_program_${res}.ksh ${spec} ${year} anthropogenic C90_Dec_2009 dom ind wst awb tra
  echo ".com convert_${spec}.pro" >> ./AR5.bat
  echo ".run convert_${spec}.pro" >> ./AR5.bat
  #now run the idl batch file:
  [...]
Hoo boy… so it grabs a bunch of stuff from “/archive”. But we don’t have a “/archive”. Looks like their “usual dodge” of release the model but not the parameter files driving it; at least at first glance. “We’ll see” as I dig into it in the coming weeks (months? years?).
But wait, there’s More!
Looking at aux/regrid.par (the .par got me wondering if it was more parameters…) we find:
filedir=/discover/nobackup/projects/giss/prod_input_files/
regridfile=remap288-180C90-90.nc
imsource=288
jmsource=180
ntilessource=1
imtarget=90
jmtarget=90
ntilestarget=6
format=ij
nfields=1
title=yes
Yup. MORE parameters. For one it points to “Yet Another Missing Directory” in that /discover top level qualifier. Then sets about setting more parameters. How many .par files are there?
chiefio@Headend:/Climate/modelE2_AR5_branch$ ls aux/*.par
aux/ICOxref.par        aux/ncregrid.par           aux/regridNOxnatural.par
aux/ICgastracers.par   aux/regrid.par             aux/regridc1x1.par
aux/ICsulfate.par      aux/regrid1x1Q.par         aux/regride2x2.5.par
aux/ncregrid-ijl.par   aux/regridCH4natural.par
Hopefully they are only used in the post-processing… yet even that determines the results as presented to the world.
About That Makefile
Seems that when you look into it, there’s a lot more to do than that top level Makefile. In the config directory, we find:
chiefio@Headend:/Climate/modelE2_AR5_branch$ ls config
ESMF.default.mk      compiler.gfortran.mk   machine.IRIX.mk     mpi.mpich2.mk
base.mk              compiler.intel.mk      machine.IRIX64.mk   mpi.mpt.mk
compiler.IRIX.mk     compiler.intel_7.mk    machine.Linux.mk    mpi.mvapich.mk
compiler.IRIX64.mk   compiler.nag.mk        machine.OSF1.mk     mpi.mvapich2.mk
compiler.Lahey.mk    compiler.pgi.mk        mpi.IRIX64.mk       mpi.openmpi.mk
compiler.OSF1.mk     compiler.xlf.mk        mpi.LAM.mk          rules.mk
compiler.absoft.mk   machine.AIX.mk         mpi.SCALI.mk
compiler.g95.mk      machine.Darwin.mk      mpi.intel.mk
Now the really good thing is that this ought to have a custom .mk make file for most any kind of machine you want to run this on. But it does show that the “make” is going to be modestly complicated.
Looking in the compiler.g95.mk file as one example:
F90 = g95
FMAKEDEP = $(SCRIPTS_DIR)/sfmakedepend
CPPFLAGS += -DCOMPILER_G95
FFLAGS = -cpp -fno-second-underscore -O2 -fendian=big
F90FLAGS = -cpp -fno-second-underscore -O2 -ffree-form -fendian=big
LFLAGS =

# uncomment next two lines for extensive debugging
# the following switch adds extra debugging
ifeq ($(COMPILE_WITH_TRAPS),YES)
FFLAGS += -fbounds-check -ftrace=full -freal=nan -fpointer=invalid
LFLAGS += #-lefence
endif
We’ve got compiler flags set to adjust the compiler behaviour. Normally this would only interest the developer of a code, but since every run is a compile, these can adjust how the code behaves when run.
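To make that concrete, here is a tiny sketch (generic Fortran, not ModelE) of one way a compiler flag changes the arithmetic itself: the precision of the default REAL is a compile-time property, and a flag such as gfortran’s -fdefault-real-8 silently promotes it.

! Tiny illustration (not ModelE code): what "default REAL" means
! is decided at compile time, and flags can change it.
program kindcheck
  implicit none
  real :: x = 1.0
  print *, 'KIND of default REAL      :', kind(x)
  print *, 'machine epsilon for it    :', epsilon(x)
  print *, 'decimal digits it carries :', precision(x)
end program kindcheck

Built normally it reports roughly 7 decimal digits; built with the default REAL promoted to 8 bytes it reports about 15. Same source file, different arithmetic, which is exactly why a model that is recompiled for every run inherits its numerical behaviour from the build options.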
Similarly, the machine.Linux file sets things for that environment:
# Linux - specific options
CPP = /lib/cpp -P -traditional
CPPFLAGS = -DMACHINE_Linux
ECHO_FLAGS = -e
cpp is the C Pre-Processor and it has flags to change how it works.
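For readers who haven’t met it: cpp runs over the source before the Fortran compiler sees it, so a -D definition at build time decides which lines of the model even exist in the executable. A minimal sketch, borrowing the real TRACERS_ON name from the model’s CPP options but with an invented body:

! Hypothetical sketch (not ModelE code) of preprocessor-selected source.
! Compile with:    gfortran -cpp -DTRACERS_ON demo.F90
! or without -D:   gfortran -cpp demo.F90
#ifdef TRACERS_ON
program demo
  implicit none
  print *, 'Built WITH the tracer code (TRACERS_ON defined)'
end program demo
#else
program demo
  implicit none
  print *, 'Built WITHOUT the tracer code'
end program demo
#endif

Two compiles of the same file give two different programs; the long list of #define lines in the obio.txt snippet further down is driving exactly this mechanism across the whole model.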
Then you get to the MPI Message Passing Interface setting blocks where you can choose and tune which kind of MPI you are using.
Here’s the one for OpenMPI:
ifneq (${MPIDIR},)
FFLAGS += -I${MPIDIR}/include
F90FLAGS += -I${MPIDIR}/include
LIBS += -L${MPIDIR}/lib
endif

# try to work around memory leak
CPPFLAGS += -DMPITYPE_LOOKUP_HACK

#LIBS += -lcprts -limf -lm -lcxa -lunwind -lrt -ldl \
#-lfmpi -lmpi -lstdc++ -threads
##LIBS += -lcprts -limf -lm -lcxa -lunwind -lrt -ldl \
##-lmpi_f90 -lmpi_f77 -lmpi -lstdc++ -threads
LIBS += -lmpi_f77 -lmpi -lmpi_cxx -lstdc++
ifneq ($(shell uname),Darwin)
LIBS += -lrt
endif
#LIBS += -lmpi -lmpigc3 -lmpigc4 -lmpigf -lmpigi -lmpiic4 -lmpiic -lmpiif -threads
That “try to work around memory leak” isn’t encouraging…
In any case, this is relatively normal for setting up an environment to port and compile a program. What is unusual about it is that it is needed for every run.
But that isn’t enough… they have a certain degree of ‘cross linking’ between the various make files. You find that in rules.mk and base.mk:
base.mk:
# standard makefile to be use in each "component" subdir

.PHONY: FORCE

#get defaults from ~/.modelErc
HOMEDIR = $(wildcard ~)
MODELERC ?= $(HOMEDIR)/.modelErc
sinclude $(MODELERC)

#some of modelE definitions
#SCRIPTS_DIR = $(GISSCLIM_DIR)/scripts
#MOD_DIR = $(GISSCLIM_DIR)/mod
#LIB_DIR = $(GISSCLIM_DIR)/lib

#hack : make sure that ESMF_Interface is set
#ESMF_Interface = .
[...]
and rules.mk:
#
# this file contains rules shared by Makefiles in "model" and "aux"
#

.PHONY:

ifdef MOD_DIR
VPATH = $(MOD_DIR)
endif
#VPATH = $(MODULES_CMPLR_OPT)
#VPATH = /usr/local/unsupported/esmf/1.0.3-551/include

###### Some user customizable settings: ########

# EXTRA_FFLAGS specifies some extra flags you want to pass
# to Fortarn compiler, like
#     -g        - include debugging information
#     -listing  - create listings (.L)
EXTRA_FFLAGS =

# EXTRA_LFLAGS specifies some extra flags you want to pass
# to linker. Currently needed as a hack to compile hybrid MPI/OpenMP
# code to pass "-openmp" to linker
EXTRA_LFLAGS =

# hack to force compilation errors to be written to ERR files rather than scren
ifeq ($(OUTPUT_TO_FILES),YES)
  COMP_OUTPUT = > $*.ERR 2>&1 || { r=$$? ; cat $*.ERR ; exit $$r ; }
  LINK_OUTPUT = > $(RUN).ERR 2>&1 || { r=$$? ; cat $(RUN).ERR ; exit $$r ; }
else
  COMP_OUTPUT =
  LINK_OUTPUT =
endif
[...]
Oh, it says there’s another batch of make files in ‘model’… Oh Joy…
chiefio@Headend:/Climate/modelE2_AR5_branch/model$ ls tracers/
Makefile  Tracers_mod.F90
chiefio@Headend:/Climate/modelE2_AR5_branch/model$ ls solvers
Makefile  TRIDIAG.f
chiefio@Headend:/Climate/modelE2_AR5_branch/model$ ls shared
CONST.f                  Makefile                    StringUtilities_mod.F90
Dictionary_mod.F90       PARAM.f                     SystemTimers_mod.F90
FileManager_mod.F90      PARSER.f                    UTILDBL.f
GenericType_mod.F90      Parser_mod.F90              array_bundle.f
Geometry_mod.F90         RunTimeControls_mod.F90     cubic_eq.f
JulianCalendar_mod.F90   RunTimeControls_mod.m4F90   stop_model.F90
KeyValuePair_mod.F90     SYSTEM.f
chiefio@Headend:/Climate/modelE2_AR5_branch/model$ ls -l Makefile
-rw-r--r-- 1 chiefio chiefio 3400 Dec 30  2015 Makefile
chiefio@Headend:/Climate/modelE2_AR5_branch/model$ ls profiler
Makefile                TimeFormatUtilities_mod.F90   Timer_mod.F90
ProfileReport_mod.F90   TimerList_mod.F90
ReportColumn_mod.F90    TimerPackage_mod.F90
[...]
and several other directories with their own Makefile in them. Yet even that isn’t enough complexity. Seems you can choose what make files get used sometimes.
Looking in ‘doc’, we have this in Ent.txt
Working with Ent DGTEM (Dynamic Global Terrestrial Ecosystem Model)
===================================================================

This document describes how to checkout, compile and run modelE coupled to Ent DGTEM.
[...]

Compiling the code
==================

Since in this new setup some of the code is located in subdirectories the Makefile had to be modified to be able to work with subdirectories. Basically a completely different makefile was written for this purpose. To switch to this new makefile you have to specify NEW_MAKEFILE_FORMAT=YES in your ~/.modelErc file.

This new makefile expects two additional sections to be present inside the rundeck. Section "Components:" lists the names of the subdirectories (from now called "Components") containig the code which has to be compiled with the model. The section "Component Options:" contains the flags which have to be passed to each component (i.e. the flags which have to be set when local makefile is executed inside corresponding subdirectory). Each component (subdirectory) contains a makefile which in most cases is trivial: it contains the list of source files and includes standard template makefiles from config/ directory (see model/shared/Makefile for an example). If rundeck should be able to modify the list of source files in the component that can be done by passing corresponding "Component Options:" (see how it is done inside model/Ent/Makefile). Current template rundeck for modelE coupled to Ent is E1M20dv1.R (see it as an example of how to use "Components:" and "Component Options:").

The commands supported by the new makefile are basically the same with some small differences. To compile and set up a typical modelE+Ent run you would do:

# remove old object files if necessary (old "vclean")
make clean_all
# create a rundeck
make rundeck RUN=my_run RUNSRC=E1M20dv1
# build dependencies
make depend RUN=my_run ESMF=YES NPES=1
# compile and do setup
make setup RUN=my_run ESMF=YES NPES=1
So all this environment fiddling ends up happening on each ‘compile and run’ of a job.
Interestingly, they left in some ‘local notes’ in the ‘local’ file: local_info.txt
I’ve bolded a bit where it shows why I talk about libraries and compile flags and such.
Running on Discover (Reto Ruedy)
+++++++++++++++++++

News:
====
[...]

Unresolved bugs:
===============
- occasionally, jobs tend to "hang" for extended period of times,
  usually during i/o processes (creation of PRT files)
     hint: for long runs use "kdiag=13*9," not "kdiag=13*0,"
- serial and parallel versions do not always give the same result
     hint: see next section (Consistency ...)

- intelmpi: may be used on all current and future nodes
- scali mpi: can only run on the relatively few old 4-cpu nodes
- openmpi: may be used on all nodes but is not recommended (use intelmpi)

- compile with: EXTRA_FFLAGS="-DMPITYPE_LOOKUP_HACK" to avoid memory leak
- Bundling openmpi jobs is currently impossible

Compiler issues:
===============
For off-line fortran compilations use: ifort -convert big_endian x.f -o x.exe

Before doing so, load the appropriate modules, currently:
     /etc/profile.d/modules.sh ;           # bash or ksh
or:  source /etc/profile.d/modules.csh ;   # tcsh or csh
and either
     module load comp/intel-9.1.042
     module load mpi/scali-5 ;             # for scali mpi (on borga-e...)
or
     module load comp/intel-9.1.052 ;      # for openmpi
     module load mpi/openmpi-1.2.5/intel-9
or
     module load comp/intel-10.1.017 ;     # for intel mpi
     module load mpi/impi-3.2.011

This is to be put into your startup script (e.g. .profile if your login shell is ksh), so these modules also get loaded when running in batch mode.

"runpbs (=run)" was modified to load the appropriate modules for mpi batch jobs even if those commands are missing in the login script. But you still need to set them appropriately during the "setup" step.
Sheesh… different contexts / compiles give different results, and we swap around MPI libraries at compile time based on what we like or what works today or…
All that BEFORE you get to the massive parameter setting inside the model!
Consistency between serial and parallelized executables:
-------------------------------------------------------
openMP and serial runs give the same results ONLY if using the option
-mp (machine precision) in both cases, i.e. use

    gmake setup|exe RUN=.... EXTRA_FFLAGS=-mp EXTRA_LFLAGS=-mp

Warning: The current version of modelE no longer supports openMP

mpi and serial runs (without the -mp options) give identical results
(but different from serial runs and openMP runs with -mp)
Um, maybe you ought to figure out why the results vary with different MPI context and just fix the code?… Just sayin’…
Having code that is significantly ‘context of run’ or ‘context of build’ sensitive is just not making my “Oooh that’s reliable” warm fuzzy feeling happen…
There’s lots more interesting in that file (and in other files) but this posting is too long already, so most of it will need to wait for “another day”. I’ll just put this interesting snip here as a parting tease:
For serial runs, no options are needed; however if you want results
identical to that produced by an openMP executable, add

    EXTRA_FFLAGS=-mp EXTRA_LFLAGS=-mp

NOTE: When changing options, make sure to use "gmake clean vclean"
to get rid of old object modules and avoid incompatibilities.
Yeah, if you skip a ‘make clean vclean’ you can have incompatibilities…
There are lots of other parameter settings described in other .txt files. Here’s just a sample from the oceans biology one. Note: This is NOT the “oceans”, just the oceans biology obio.txt:
Prescribed and Interactive Ocean Biology in modelE
(by Natassa Romanou, Watson Gregg and Gavin Schmidt)
=========================================

This document describes the steps and methodology for running (non)interactive chlorophyll and interactive carbon cycle etc.

step 1: Check out the model from CVS as usual
        cvs checkout modelE
step 2: Choose which ocean model to use if any at all
step 3: Choose whether you want to do
        i)   chlorophyll-radiation feedbacks
        ii)  real or prescribed chlorophyll and ocean biology
        iii) CO2 gas exchange with the atmosphere
        iv)  prognostic alkalinity
        Then in the decks directory use the appropriate rundeck.

=========================================
Description of CPPs to be used in the rundecks

#define CHECK_OCEAN_HYCOM           ! HYCOM ocean: needed for CMPE002 only
#define HYCOM_RESOLUTION_2deg       ! HYCOM ocean: use 2deg resolution for HYCOM ocean
#define CHECK_OCEAN                 ! RUSSELL ocean: needed to compile aux/file CMPE002
#define TRACERS_OCEAN               ! RUSSELL ocean: ocean tracers activated
#define TRACERS_OCEAN_INDEP         ! RUSSELL ocean: independently defined ocn tracers
#define OBIO_ON_GARYocean           ! RUSSELL ocean: prognostic ocean biology
#define CHL_from_SeaWIFs            ! RUSSELL ocean: read in SeaWIFs
#define TRACERS_OceanBiology        ! ANY ocean: Watson Gregg's ocean bio-geo-chem model
#define OBIO_RAD_coupling           ! ANY ocean: radiation -- ocean biology coupling
#define pCO2_ONLINE                 ! ANY ocean: pCO2_seawater computed online
#define TRACERS_ON                  ! ANY ocean: ATMOS tracers: include tracers code
#define TRACERS_GASEXCH_ocean       ! ANY ocean: special tracers to be passed to ocean
#define TRACERS_GASEXCH_ocean_CO2   ! ANY ocean: special tracers to be passed to ocean
#define CHL_from_OBIO               ! ANY ocean: interactive CHL
#define TRACERS_GASEXCH_CFC_Natassa ! ANY ocean: CFC-11 or CFC-12
#ifdef TRACERS_Alkalinity           ! ANY ocean: prognostic alkalinity
In Conclusion
What are in those rundecks? Rundeck.txt tells us:
Rundeck
===================================================================

A rundeck (a file with an extension .R) is a file which contains a complete description of a particular model run, including the description of model code used and run-time parameters. Directory modelE/templates contains typical rundecks which can be used as examples.

Rundeck structure
================

Rundeck consists of a number of sections describing different aspects of the run. Each section starts with certain keywords (like "Object modules:") and terminates either with "End ..." statement or with the start of a new section. The sections are supposed to follow in a pre-defined order and can't be interchanged, though some unneeded sections can be skipped. The character ! starts a comment. Everything to the right of ! until the end of the line is ignored. Here is the list of rundeck sections in proper order:

  * Run name and comment
  * Preprocessor Options
  * Run Options
  * Object modules
  * Components
  * Component Options
  * Data input files
  * Label and Namelist
  * Run-time parameters
  * Restart and timing parameters
Yup. From preprocessor to compile and choosing object modules to load and more… Decisions decisions. Just what tune to play and how do I want this elephant dancing while playing the trombone?…
* Components
============
Starts with a line

    Components:

This section lists all "Components" (the subdirectories of modelE/model) which have to be compiled with the executable. Keep in mind that each Component will be compiled into a library which will be then linked with the main executable. As a result in a case of a name conflict (more than one subroutine with the same name is present) the preference will be given to the one located in modelE/model directory.

* Component Options
===================
Starts with a line

    Component Options:

This section lists specific options which have to be passed to certain components. For each Component the options are listed on a single line as follows

    OPTS_<Component> = <Opt1>=<X> <Opt2>=<Y>

where Opt1, Opt2 are variables to be set and X, Y are corresponding values. The instruction above is equivalent to setting Opt1=X, Opt2=Y at the start of Makefile.

There is a special debugging option OVERWRITE_FSRCS which can be passed to any component (which uses base.mk). If this option is set to a list of source files, these files will be used when compiling the component (instead of the ones specified inside the component Makefile). The use of this option is discouraged for the production runs though (which should use the source files specified by the component itself). Similar option OVERWRITE_F90SRCS can be used to specify F90 files.
Every run is a unique mix of libraries, compiler options, parameters set, input data choices, model parts chosen to run, and even, if you like, special versions of any given bit of Fortran you might want to swap in…
The choices are endless. The whole thing from soup to nuts is a parameter driven process.
And that’s the whole problem.
I am not well versed in programming and running computers, but it seems the chaotic nature of weather is built into the code too, so that you can have chaos of the second degree.
I just wonder how any scientist can have any confidence in the output from such a program.
Would they ever know what they asked the program to do?
Maybe the GCMs really do come up with unprecedented outputs every time.
This drawing gets more and more relevant: https://wattsupwiththat.files.wordpress.com/2013/09/josh-knobs.jpg?w=720
The “profession” of programming is as much DOCUMENTATION as it is writing code (I hated the former, but saw the value in it).
Trillions being wasted on what amounts to a worthless hunk of letters and numbers.
How about a simple flow chart, something like
|Garbage in| –>|Climate_Kludgifier| –> |Garbage Out| ——->> To MSM.
↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
———————————————————
| Adjustable Parameters + Module Variables |
—————–Fudge Modules———————
Leading to
Ouch. Yet TPTB want to impose a tyranny on us based on a poorly specified model derived from an incorrectly specified question. UGH
Svend
Love that cartoon. I think the best part is the guy gazing down the side of the machine with the controls for the sun. Priceless.
tom’s printout is funny too. Snowflakes are experiencing kernel panic.
I thought .pm in code (especially code that also has regular perl files) is “perl module”.
@Soronel:
Might well be. I stated I don’t know and was guessing. I’ve done some minor installing of Perl, and scanned a book on it a couple of decades ago, but would not claim skill, or even familiarity with most of it. Any clue appreciated.
@Svend:
That cartoon fairly accurately depicts how ModelE is built and run, IMHO.
E.M.
This is still an iterative finite element model, suitable for a finite number of lumped constants, never for compressible fluid dynamics. NOAA at least uses a fluid dynamics model, suitable for an incompressible fluid (ocean). Have you found the stated limitations on parameters that may hint at possible convergence for the GISS model? Even the fourth order Runge-Kutta method for solution to Laplace transforms has severe limitations for convergence.
My opinion is that Jimmie Hanson knew full well that the GISS model could never work ‘as he promoted to the US congress’, but could easily be ‘tuned’ to give ‘his’ desired result. Gavin Schmidt knowingly continues to promote such deception within the US government!
All the best! -will-
Some decades ago, I watched a grad student run a hydrology simulation via a punch card deck (a real run deck) that fed the output back into the input reader until the simulation was completed. His run deck was equally incomprehensible. His explanation was that using the reader was not charged to his account and he wasn’t authorized to use permanent storage. Creating solutions one problem at a time, Mother of invention, and all that.
”
I thought .pm in code (especially code that also has regular perl files) is “perl module”.
”
Yes, definitely true.
Hmmm sounds like they discovered the Machine epsilon problem. With the sort of problems they are working with, the absolute precision of calculations and constants matters. If the compilers use different rounding or calculation precision, or constant value precision differs between two machines, iterative processes will slowly diverge as those rounding/precision errors add up on each loop through the calculation.
I learned about this when I was in about 7th grade when I discovered I got different answers to the same problem on two different calculators. Turned out the constant Pi used by the two calculators differed slightly with one using a few less decimal places in the value of Pi it used when you hit the Pi constant key.
https://en.wikipedia.org/wiki/Machine_epsilon
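For anyone who hasn’t bumped into it before, machine epsilon is easy to see directly. A generic sketch (nothing ModelE-specific): keep halving a number until adding it to 1.0 no longer changes 1.0.

! Generic illustration of machine epsilon, not ModelE code.
program find_eps
  implicit none
  real    :: eps4 = 1.0
  real(8) :: eps8 = 1.0d0

  do while (1.0 + eps4/2.0 > 1.0)
    eps4 = eps4/2.0
  end do
  do while (1.0d0 + eps8/2.0d0 > 1.0d0)
    eps8 = eps8/2.0d0
  end do

  print *, 'single precision epsilon ~', eps4, '  intrinsic EPSILON():', epsilon(1.0)
  print *, 'double precision epsilon ~', eps8, '  intrinsic EPSILON():', epsilon(1.0d0)
end program find_eps

Depending on the compiler, flags, and whether intermediates sit in wider registers, the measured value can even differ from the EPSILON() intrinsic — which is rather the point: everything below that threshold is noise that an iterative model grinds over millions of times.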
Larry Ledwick says: 19 January 2017 at 2:01 am
” Consistency between serial and parallelized executables:… option -mp (machine precision) in both cases .Hmmm sounds like they discovered the Machine epsilon problem.”
The 1970’s ‘calculators’ used all kinds of ‘corrections’ to trigonometric functions to give ‘best estimate to that number of digits’! Oh woah are we..!!
Given your comment on compilers, it would be interesting to run the compile on the same source code several times and see if the resulting code hashes to the same value each time.
Larry Ledwick says: 19 January 2017 at 2:39 am
“Given your comment on compilers, it would be interesting to run the compile on the same source code several times and see if the resulting code hashes to the same value each time.”
Larry,
The machine computation of the same, exact code can lead to the same result. Machine ‘compilers’ do not compute the same exact code; they can only ‘try’ to interpret the ignorance and dyslexia of earthlings incessantly banging away at keyboards. Benevolent monarchs do that well. All other forms of government fail completely! -will-
Hi EM – you wrote “Then for some reason it makes an http reference when it is opening a log file…” but I think the “#” is that it’s commented out?
The actual link goes to an FAQ about buffering of Perl i/o or somesuch…
Not being much of a programmer – I’m an electronics chap – I’m just reading your post and looking for loose ends that might make the thing unravel, on the basis that if I can’t find any it’s more likely to be accurate.
(One of the things about the Climategate data that convinced me that the skeptics were right was the comments in code and that readme file)
I regret I was not paying proper attention to your previous posts so I missed where this model fits with the stuff that they are currently running – is it representative?
@Gareth:
I usually do a very rapid first pass scan, not following all the rabbit holes, but just noting things in passing. Then I’ll go back and “Dig Here!” on things that were most interesting.
The Model II code is ‘way old’ and mostly run by students as a learning toy at this point (since it runs on one PC…). This code is the Model E code. It is THE current work product of NASA GISS for climate modeling (though in several slowly mutating forms, from their page description of it). So yes, very representative. In fact, that AR5 tag is saying this is the code used for the AR5 report runs.
My general impression of it is that it is “Good enough for government work” and will, in fact, run reasonably well if you are careful in the setup. I just grouse about the poor technique in it because it would not be acceptable in my shop. I’d have told the relevant programmers in staff meeting (i.e. in front of peers) that they needed to tighten it up so it ran the same way in all environments.
Now how much do things like libraries and compiler variations matter?
For most codes, hardly at all to none. But models are different. As Larry pointed out, when they iterate over the same code again and again, even the tiniest of errors can accumulate to gigantic proportions. Yet even for more ‘traditional’ code, it can tickle pernicious bugs.
That is why it was “kind of a big deal” when the world went to dynamic libraries (in the ’80s?) and why it is still a big deal on any new OS release that the exact library release level is specified that is known to work well with the rest of system. Look at a Linux release spec, and you find library release numbers listed along with kernel and package release levels that are “compatible”. That means “didn’t make bugs dance in the QA suite like the other libraries did”…
For Model E, it looks like mostly they are just using the particular libraries needed on divergent hardware (as it runs on many platforms in their ‘cluster’) and / or the particular libraries needed for a particular version of MPI chosen. No problem with that. PROVIDED the code was QA checked and shown to produce the same results in all those environments. Yet their messages say it doesn’t. Ooops.
Similarly, most of the time a compiler produces the same code out for the same program in. Yet not always. The compiler makes optimizing decisions based on various flags set and various environment variables and the machine context. So by submitting this to compile and run on divergent systems with divergent environment and divergent flags, you will get different executables. Have they ALL been regression tested? Have they ALL passed a QA check? Heck, even a rough “Sanity Check”? I doubt it… The code is structured to run as a mutating toy world where you run it, see if you like the result, and if not, change things and toss it in again. There is no way to regression test or adequately QA test that.
OK, they are not making an accounting package nor doing a drug trial system. No need to satisfy Government Oversight nor Legal Liability. Got it. But it is still just sloppy compared to how the commercial product world does things. (Yes, I know, science is a seat of the pants sloppy thing… but once you claim to know the control knobs of the world, best to do it right and rigorous…)
So far my general assessment is that the mistakes will be in the assumptions about what is actually the controlling system and the data fed in, with only minor coding errors. They ASSUME CO2 is the master knob, so find it is the master knob. They ASSUME the solar constant is, well, constant and they ignore the differential allotment of UV vs IR, so miss those effects. Etc. etc. That’s why they diverge from reality.
But that ‘not the same unless FOO flag set’ is a worry. That says they have mathematical instabilities in their code. FORTRAN can have very subtle bugs from ‘edge cases’. Underflow and overflow. Fixed limits to precision at the small and large ends. Zero that isn’t really zero… I’ve written FORTRAN a while now and it was my first Computer Language. I was fortunate enough to have it taught to me in the Engineering Department where they were very concerned that we know these things as the codes would be used for things like bridge design and they did NOT want deaths from their graduates. The whole problem set was carefully crafted to guide us into just those traps. Burned fingers not easily forgotten… (After the 2nd problem set, I caught on and started looking for what trap was being set – got A+ after that… That was the point where my Fine Toothed Comb was born for code reviews…)
So do I think there’s a math error or coding error in their code? Certainly. Probably several. It is manifestly sensitive to the environment and does not always produce stable results. “Captain, there be Bugs here!”… but do they matter? Probably not as much as the general structure of what the code is designed to do, which is find CO2 causal. Yet at this point I can’t rule out that their whole exponential growth of temperature output isn’t just an accumulation of very very very tiny errors due to a poor handling of math in some limit case. (Like ‘if zero exit loop’ when the code never hits EXACTLY zero… better is if the value is within 0.00000001 of zero, exit loop. That, BTW, was one of the problem sets…)
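A generic sketch of that exact trap (an illustration of the failure mode, not a quote from the model): step a value toward zero and it will sail right past an exact .EQ. 0.0 test, while an error-band test catches it.

! Generic illustration, not ModelE code: exact-zero tests vs. error bands.
program zero_test
  implicit none
  real    :: x = 1.0
  integer :: i

  do i = 1, 1000
    x = x - 0.1                     ! ten steps "should" land exactly on zero
    if (x .eq. 0.0) then            ! the risky test: exact equality
      print *, 'hit exactly zero at step', i
      stop
    end if
    if (abs(x) .lt. 1.0e-6) then    ! the safer test: within an error band
      print *, 'within 1.0e-6 of zero at step', i, '  x =', x
      stop
    end if
  end do
  print *, 'never got anywhere near zero'
end program zero_test

On most systems the exact test never fires: after ten subtractions x is a tiny residue of order 1e-8, not zero, so only the ABS() error-band test stops the loop.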
Anyway, it is a giant block of code, so a long slog to get through it. Made worse by what looks like lots of ‘hidden dependencies’ on “stuff” from other machines and other directories (those absolute path names starting with / as in /archive) that are not available to the unwashed public. But as usual: We’ll see…
E.M. SMITH says “No need to satisfy Government Oversight nor Legal Liability.”
$Billions are being spent on the output of these models – there should be criminal charges for ‘scientists’ who hold their work as accurate and important but turn out to be YMMV.
Perhaps this is why we hear so much about ‘It’s worse than we thought’ as a simple change of parameters can make the Earth fall into the sun.
Could be I am ignorant, but how would a climate scientist perform a certain run of a climate model to investigate some aspect of climate?
I find it hard to see him fiddling with all those libraries and settings, so how does he order a run, and would he know what else was set by the “runner”?
Uninstalling Obummer…
Perhaps you can toss out all of the slimy bits and put a functional model together that would actually show the actual weather? Did they make it ugly just to hide the fact that it is CO2 that they are correlating to temps?
@Glenn999:
That is my general (high level and maybe over ambitious) goal.
@Gail:
Think you meant the TT page… but yeah… just minutes to go… Pence now V.P.
@Svend:
They make the ‘run deck’ and some of them will do the set up. Others might ask an I.T. helper to do the general set up for machine, libraries, etc… then just do the run deck.
I’m not sure that this thing is worth saving. When you consider the nature and concepts it was created under, it might be better to examine it, learn from it, and then set it aside and start over. Sun and water physics need to be properly allowed for and then modified by secondary factors. The data base of records needs to be solid.
Normally, the formulas embedded in a spreadsheet are set and the data inputs change. In this case the data needs to be set and some of the embedded formulas tweaked to learn of their effect on the output. The ability of the users to find, understand and adjust the control knobs is of importance. The widespread ability to utilize software and hardware to examine cause and effect in weather/climate, along with the use of the Internet, will allow thousands of minds to work on this problem, and do it inexpensively and fairly quickly.
When Hanson and Jones started this effort only a government could afford the cost, and they were nearly ignorant of computer possibilities, believing that super computers could instruct or tell them how weather/climate worked. We know that computers are very simple minded. They will only tell you the answer that you can find for yourself. They just will do it quickly; or if garbage goes in, garbage comes out, and you have to know if you are getting garbage.
I see this as a spread sheet, data base problem, similar to the ones that I created and used to examine and do various problems both in longhand and with the help of desk top computers…pg
@P.G.:
Essentially so.
One good thing from the first blush code scan: Since THEY swap modules, turn source on and off at will, even overlay source as desired, then I can do the same and point any complaints back at ModelE design structure.
At this point, my goal is to find the minimal subset that runs and see what happens. Basically, prune out modules and kick start them, then glue bits back together. Build it back from those stubs needed to make the core “go”.
Not swallowing the whole whale at once, but one chunk at a time…
For your readers,
I am an engineer; we must understand the nature of precision.
Part of the precision problem is this: all numbers in a computer have limited precision. Say I wanted to calculate 2/6*3. If we assume limited precision (integers) then we calculate 2/6 = 0, 0 x 3 = 0. But compilers might be clever; they will know that the intermediate division will be imprecise and will reorder the calculation to be 2*3=6, 6/6=1. So in the case where the compiler doesn’t optimise you get 0, and in the case where it does reordering optimisation you get an answer of 1. This occurs at EVERY precision. To get around this, engineers often force the evaluation order by writing (2*3)/6, because the brackets are always evaluated before * and /. This isn’t the only problem: in floating point, precision is lost for very small numbers. That is, the difference between X and X + delta X is a much larger proportion of X for small values of X.
Engineers go to a lot of trouble to manually reorder calculations so divisions by large numbers NEVER OCCUR.
Now think about this, using gcc I can set an ENVIRONMENT variable called CFLAGS, in CFLAGS I can set -g to get symbols in the code, BUT if I recall correctly -g also disables all optimisation including reordering, which means that merely setting debug mode changes my answer from 1 to zero. This has actually happened to me back in university when writing genetic algorithms, in that case the code worked fine in debug mode but failed with debug turned off.
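The 2/6*3 part is easy to reproduce in any language with truncating integer division; a minimal sketch of the same point in Fortran (the compiler-reordering and debug-flag parts of the story are harder to show in a few lines):

! Generic illustration of evaluation order with integer division.
program order_demo
  implicit none
  integer :: a, b
  a = 2 / 6 * 3     ! left to right: 2/6 = 0, then 0*3 = 0
  b = (2 * 3) / 6   ! forced order:  2*3 = 6, then 6/6 = 1
  print *, 'left-to-right gives', a, '   parenthesised gives', b
end program order_demo

Same arithmetic on paper, two different answers once limited precision and evaluation order get involved.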
A big clue here is that they say the results change on multiprocessor machines, this says to me that there ARE precision sensitivities like this in the code, that they are unhandled, and that they MATTER.
Oh, another issue for the readers who aren’t engineers.
Lets Say I have 5 digits of precision. I write
100*10 + 10/1000 – OK seems reasonable
100 x 10 = 1000.0 to five digit precision
10/1000 = 0.0100 to five digit precision
Adding we get
1000.0 + 0.010 which is 1000.01 – but that’s 6 digits of precision, at 5 digit precision the answer is 1000.0
Now let’s say the code repeats over and over, adding 0.01 to 1000 a million times, which should equal 11000, but due to precision the answer given is 1000.0 – this is what we call divergence. It’s bad. Unless you ensure you don’t add small numbers to big numbers, divergence can be a real problem in computers. Typically simulations or “Models” are of the form X2 = X1 + delta( f(X1) ) where f(X1) is some function of X1 and usually a small increment, so iterating means X can get larger while f(X) is small or even becomes smaller; eventually you’ll diverge.
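That five-digit example maps straight onto 32-bit REALs (about seven decimal digits). A minimal sketch of the same absorption effect, not ModelE code, on a typical compiler without "fast math" style reordering:

! Generic illustration of a small increment being absorbed by a large total.
program absorb_demo
  implicit none
  real    :: total = 10000.0
  integer :: i

  do i = 1, 10000000
    total = total + 0.0001     ! ten million adds of 1.0e-4 "should" add 1000
  end do
  print *, 'expected 11000.0, got', total
end program absorb_demo

Because 0.0001 is less than half a unit in the last place of 10000.0, every single addition rounds straight back to 10000.0 and the total never moves; that is the divergence-by-absorption being described.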
@Bobl
Great simple examples of computational errors using computers. You have to understand the computational methods used by the code at a very deep level to avoid (or minimize) these sorts of issues. By the nature of the math, you cannot avoid it completely any more than you can exactly represent the square root of 2.
Sometimes you can express the problem symbolically so that the actual numerical conversion only happens one time at the final output stage, if you can work out a general solution that applies at all scales, but when doing iterative math loops where you go over and over the same calculation each pass feeding a solution to the next pass through the loop you accumulate those small errors until they become big errors.
It would be interesting to see a high level computer math person do an errors analysis of what they are doing and figuring out the maximum number of passes through the iteration you can run before the errors run away with you.
@Bobl:
Yup! Wonderful examples. Especially loved the ‘with debug’ vs not. Priceless. It is the accumulation of a few thousand of those that makes an experienced programmer valuable.
I once had a program where the NAME of the function call changed the results. Drove me nuts for a week trying to ‘fix it’. Called the vendor: Compiler bug… not me at all… Short term fix? Change that function name to something else until it works… so for a month or so while writing it, every time it failed, I’d swap all my FUNC W to FUNC Z or back. It worked fine every time in the interpreter… (A language that was both interpreted and compiled… so I’d develop in the interpreter fine, but compile and Oh Oh… change all the FUNC W to FUNC Z again…) The compiler used the function name as ‘salt’ in the hash function so sometimes landed it on code space where it ought not to be… but change the salt, no problem…
One of my other favorite ‘gotchas’ is testing a FLOAT for .EQ. zero or 1 or some other INT value. Rarely does a FLOAT exactly equal zero or 1 or “whatever”. One of the problem sets that bit most folks was just such a thing. The better way being something like:
IF ( VAR .LT. 0.000001 ) GO TO ….
Where the exact value of 0.000001 is set at your error limit choice, assessed after thinking about just what is happening down in the maths… I remember helping other students where the test problem had gone from (limit of precision like) 0.0000000000001 to -0.0000000000001 never touching 0.0 and them wondering why their code was wrong… Making it a DOUBLE instead of a FLOAT doesn’t fix the problem, just moves it into longer run times….
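A generic double precision sketch of the same trap (nothing to do with the model’s actual variables): a value computed to “be” 0.3 usually isn’t exactly 0.3, so .EQ. fails where an error band succeeds.

! Generic illustration, not ModelE code: .EQ. on floats vs. an error band.
program eq_demo
  implicit none
  double precision :: x
  x = 0.1d0 + 0.2d0
  if (x .eq. 0.3d0) then
    print *, 'equal (you got lucky)'
  else
    print *, 'NOT equal;  x - 0.3d0 =', x - 0.3d0
  end if
  if (abs(x - 0.3d0) .lt. 1.0d-12) then
    print *, 'but it IS within an error band of 1.0d-12'
  end if
end program eq_demo

The difference is about 5.6E-17, far below anything physical, which is exactly why the ABS()-with-a-band form is the safer habit.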
Yet what do we find in the “Model II” code? Lines like
Now one hopes this is only after having set it to '0.' in the prior line for some cases so it can be zero exactly; but every one of those cases must be checked.
Now most of their comparisons are INTEGER, where that’s OK to do (so you have a variable starting with I through N compared to a value with no trailing decimal point, so 1 not 1. or 1.0 that flags it as a FLOATing point constant) yet through multiple programs there are the odd lines that compare a FLOAT to a fixed value…
Now I’ve not “done the math” to assess the risk in these. This is just a first pass “turn over the rocks” and see what squirms… BUT the existence of lines like that makes me squirm. It seriously needs thinking about to see if this would be better:
or better yet
Where ERRORBAND is set for each particular machine and word size used…
FWIW, on a first pass my assessment is that the programmers of Model II were very workmanlike, but a bit incautious at times, like above. They know the language and the rules well, avoiding most “issues”; but taking risks where I’m not sure they have mitigated them (or even realized them) in the odd lines like those two above. (Do note: It is perfectly FINE to do something like enter a loop and on the first pass have set the value of a variable to “0.” and then test for “0.” to not process in the same way on the first pass. The odds of subsequent passes ever hitting exactly “0.” are very small. What bites is doing any math then testing for “0.” or “1.” to do anything…)
This potential flaw does not get smaller in the Model E code; it gets much larger, with more complicated bits like:
(where mxkprf.f is the program name spit out by 'grep', the Global Regular Expression Print pattern matcher used to search for the .EQ. lines…)
and:
Now I don't know that XL and TEMP are FLOATs (I need to check their declarations), but so far I've not seen them using INT-range names for floats nor float-range names for INTs, so I think it likely a fair assumption at this point. But this is exactly the kind of code that gets you into deep doo. I really, really hope that "DRTSAFE" is something like a Decimal Range Test SAFE error band; that would go a long way toward bringing comfort…
Sea Ice has this same class of issue:
so one hopes it is only an 'if newly entering the code, having set the value to 0. just prior to entry, do the initial set-up; otherwise do something else' test, which is valid if a tiny bit sloppy.
More a worry is:
MODEL_COM.f: IF (STR.EQ.TIMESTR(N)) THEN
Comparing two values for equivalence with .EQ. and no error bounding (one hopes they are declared as CHARACTER strings, as the STR names imply; an exact comparison of CHARACTERs is fine, while an exact comparison of FLOATs is not).
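A toy sketch of the benign case (my own made up declarations and values; I have not yet checked what MODEL_COM.f actually declares):

C     Illustrative only: .EQ. on CHARACTER variables is an exact,
C     safe comparison; no error band is needed.
      PROGRAM STRCMP
      CHARACTER*8 STR, TIMESTR(3)
      INTEGER N
      DATA TIMESTR / 'hour', 'day', 'month' /
      STR = 'day'
      DO 10 N = 1, 3
        IF ( STR .EQ. TIMESTR(N) ) PRINT *, 'Matched entry ', N
   10 CONTINUE
      END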
Then season with math like:
Where you can accumulate the minor errors described above, and/or underflow or overflow the FLOAT ranges, and/or… and then stuff arrays into IF tests like:
and you are just begging for one of those trivial, hard-to-even-see error accumulations and error-band-breaking IF test faults to really screw your pooch.
(No offense intended to pooches…)
Every single bit of the math in this thing needs checking for just those kinds of subtle Aw Shits that happen when doing computer math and logic tests with typed data (or even untyped data in many cases… the computer still has those issues even if the language designer tried to work around them…)
BUT what really frosted my shorts was the note that you get different results if you set mp vs. not, so just make sure to do it… instead of saying "and this is bad", or "and this means we've got precision-sensitive code in there, John, start digging", or… Just "Hey, look, cool. Watch for it and just know the runs will be different, not that it matters…"
IMHO one of the major risks in "Climate Science" is folks with Ph.D.s in sciences who took one class of FORTRAN grammar and think they are programmers now. They may have learned the syntax and grammar, but they don't have the scars yet… and they don't have anyone doing code reviews, QA regression testing, bug reports, etc. etc. to accumulate them… so they just keep ploughing on "Deeper and Higher"…
Bobl says….
Yes, very good examples. Even someone like me, a 'computer challenged' non-engineer, could follow it. Although you still need some science, which unfortunately is not taught in the USA school systems.
@E.M.Smith
Agreed. I find it hard to believe that the equivalence problem is not dealt with properly. Many compilers deal with the comparisons automagically, but that just makes it worse because the programmers get sloppy. It is insidious, though.
Even (in C):
float a
double b
a=22.0/7.0
b=22.0/7.0
if ( a == b) …
Can fail, because a is rounded to float precision and then widened back to 64 (or 80) bits for the comparison, while b keeps the full double precision.
What's worse, some compilers choose when to use software floating point or the FPU. While the results of both calculations are stored as 64 bits, the intermediate results on an FPU are stored at (IIRC) 80 bits. So two instances of the SAME calculation, at the same declared precision, can be different depending on whether the compiler decided to use the FPU or not.
It doesn't even stop at equivalence. Consider:
float a
double b
a=22.0/7.0000000001
b=22.0/7.0
if ( a < b ) …
There is every possibility that the low precision of a, the rounding when a is widened back to 64 bits, or the intermediate-results precision could cause a to come out greater than b, even though the code "defines" that a is less than b. I can't think of a way for a compiler to get this right every time.
I used to teach realtime C and Modula-2 to university students; this was always one of the first things I taught.
Engineers are taught this stuff. The computer science people I studied with sometimes had no idea about it. Self-taught "Climate Scientists" have less than no idea. Maybe things are better now, but I doubt it.
(Oh I omitted semicolons, it’s kinda pseudocode – don’t try running this)
@Gail Coombs
I just thought I'd explain to non-technical readers what happens in real-life programming: no science really, just a little math. In real-time programming things get even worse, because values can change as you read them, or faster than you can read them from memory. The time at which you do a calculation, or worse, the time a calculation takes to do, can change the result. I remember chastising my university real-time computing lecturer over an assignment he gave us that included code. It dealt with a buffer that put each character from a serial port onto the beginning of a list (a singly linked list, for those in the know); to take a character off the list, one had to move through all the characters in the list, from beginning to end, following the links from one to the next.
(Gail, think of this as taking a chain, and then following the chain from where the chain is bolted to the gate to find the last link so you can apply a lock.)
Now think about this: moving from one link to the next takes time, so traversing the full chain takes around n x T(1 link), where n is the number of links in the chain. The delay between characters arriving from the serial port, though, is a constant time, so as you add characters, at some point it takes longer to traverse the list than the period between characters. Removing characters from the list can then never be done as fast as, or faster than, the characters are put on the list; at that point the length of the list grows without end until you run out of memory and BANG, the 747 falls out of the sky.
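A quick back-of-the-envelope sketch of that break-even point (the numbers are made up, purely for illustration: a character arriving every 100 microseconds, and 2 microseconds to follow one link):

C     Illustrative only: once the list is longer than TARR/TLINK
C     links (here 50), walking the whole list to remove one character
C     takes longer than the gap between arrivals, so the backlog can
C     only get worse from there.
      PROGRAM QGROW
      INTEGER N
      REAL TARR, TLINK
      TARR  = 100.0
      TLINK = 2.0
      DO 10 N = 10, 80, 10
        IF ( REAL(N)*TLINK .LE. TARR ) THEN
          PRINT *, 'List length ', N, ': removal keeps up'
        ELSE
          PRINT *, 'List length ', N, ': removal cannot keep up'
        END IF
   10 CONTINUE
      END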
Ah, the joys of realtime programming.
Note that the professor giving the lecture (Head of School in IT, no less) was not very impressed with my observation when I pointed it out in class.
@Bobl:
As I noted before: I was very fortunate to learn FORTRAN and ALGOL in the Engineering Department… In later years, I saw what Computer Science was doing and found it somewhat sad. LOTS of glitz and new ideas and complexity hiding… and complexity ignoring, which leads to indeterminate code and outcomes…
One really MUST know what is going on to have a clue what happened…
FWIW, I'd not been exposed to the GPU / FPU issue in school… Ran into it in a discussion of ARM chips, where the NEON SIMD unit does non-IEEE-compliant math, but you can set the build to soft float for IEEE compliance, or hard float for "close enough"… so even the particular chip set determines your results (NEON vs. other FPU / SIMD hardware), and what compiler flags you set (soft vs. hard float), AND what the OS builder did (set it to run hard float or soft…).
How many folks dumping a FORTRAN run into the new box they got know what the chip set is, what the OS build is set to use of that chip set, and what the default compiler flags do with the chip set? And what all that means to the results of their math and how that interacts with their branching and calculation steps?
As one wag said:
“We give these people computers and we expect them to know how to use them?!?”
BTW, I’ve not yet run into even ONE non-Engineer who wants to think about such things…
The general attitude seems to be “Hey, it compiled, so it must be right!”… Or “This time for sure!”…
IIRC our Burroughs B6700 had an ALU (Arithmetic Logic Unit) but no FPU (also an odd word size and interesting protection bits…), so I can forgive them not teaching FPU issues, I guess… This link has lots of interesting bits of worry in it, but also an interesting B6700 story:
http://catless.ncl.ac.uk/Risks/24.52.html
And folks wonder why I write code like:
If A do FOO; GoToEnd;
elseif B do BAR; GoToEnd;
endif;
if NOT A OR NOT B GoToError;
Print(You Can’t Get Here!);
exit(1);
I’ve had code like that print out “you can’t get here!”…
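For the curious, the same belt-and-suspenders shape in Fortran terms; this is a sketch of mine with made up names and messages, and in real code A and B would come from the actual program logic:

C     Illustrative only: a defensive catch-all branch.
      PROGRAM GUARD
      IMPLICIT NONE
      LOGICAL A, B
C     A and B are stand-ins for the real conditions of the program.
      A = .TRUE.
      B = .FALSE.
      IF (A) THEN
        PRINT *, 'doing FOO'
      ELSE IF (B) THEN
        PRINT *, 'doing BAR'
      ELSE
C       If the design says one of A or B must be true, this branch
C       is 'impossible', so make it loud if it ever runs.
        PRINT *, 'You cannot get here!'
        STOP 1
      END IF
      END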
I never had a need to learn FORTRAN, and most of my programming was in Databus (a very efficient language on the Datapoint machines), versions of Basic on various boxes, but mainly assembler. These gotchas in Fortran, though, may well explain why the code doesn't do what the programmers intended. I've looked at C and C++, but didn't have a need to actually write programs in them (assembler got me to running code a lot quicker, and bug-free), though I found the definition of the languages somewhat sloppy. Since C variants are supposed to be self-documenting, few people put useful comments into the code to say what they intend the code to do, making understanding the intentions (and finding the bugs) a somewhat longer task than it should have been.
In writing code, I first write the comments saying what the subroutine is intended to do, then implement it. This way round, it helps in keeping the logic straight. Once I’m out of the programming fugue, it also makes it somewhat easier to check for bugs – I don’t have to remember what I intended that chunk to do.
With assembler, there weren't many flags to set (if any) and the machine did what I told it to, so if it gave the wrong answer it was my fault. Not so true nowadays, with out-of-order processing and the compiler assigning the core and thread. There are likely ways around that, but these days I no longer write code so it's Somebody Else's Problem. It does however seem to me that there are far more opportunities for the running code to not reflect what the programmer intended, with all the compiler flags and the optimisation passes that delete sections of code the compiler thinks are unnecessary. I'm sure some optimisation runs would delete that bit of code, "Print(You Can't Get Here!)", because it should never run, which would defeat the reason for putting it there.
It seems that the high-level languages automagically convert one data-type to another when needed, rather than complaining about it. I'd rather we were told, and that if a floating-point number changed length or precision it got flagged. There are more words needed if you have to explicitly change the type before using it in a new way, but at least you know it's happened and what difference it will make. There are a lot of gotchas, and it's currently looking like the GCM is falling foul of them in a lot of places.
I won’t be much of a help in debugging the code – too much Fortran experience needed, and it would take quite a while to understand all the possible unintended errors. You only get that experience by hard work for several years, and there’s a lot to learn. At the moment, it looks like you will get this code to run, but it’s so badly-designed that each run will return a different answer. Butterflies are programmed-in….
The world itself has a lot of vortices, and as far as I can see this code doesn’t really model those. This is likely a problem with a cell-based approach anyway, where the development of a vortex inside a cell just won’t be modelled and only the extant vortices that stretch across multiple cells can be considered. If you get the initial conditions a bit wrong, then the model will diverge from reality. It could be a bit difficult to get that right, though, except by using much smaller cells. At ground level, we know that hills produce thermal vortices, so having a “hilliness” number for a large cell won’t help that much – we need to model what each hill does, and also each high-rise building.
With most of the climate models being proprietary and thus the code unpublished, we can’t check any others for bugs or physics errors. We know that they all diverge, so most of them (and probably all of them) must be wrong. Not a good state of affairs when we’re trying to make multi-billion dollar decisions based on the results.
@Simon: Very interesting comments…
FWIW the “can’t get here” experience was not in FORTRAN.
Yes, optimizers would likely remove it, but one can set optimizer level. Usually little or none during devo, then high before production and final regression test.
I, too, found hand assembler easy and clean…
Folks who write C just think it self documents. It doesn’t. IMHO, less so than Pascal, ALGOL, ADA, or COBOL.
Many of the GOTCHAs in FORTRAN exist just as much in other languages: precision, typing, underflow and overflow, .EQ. testing of floats. Just Engineers and FORTRAN folks think about them more… Some, like names starting with I through N self-declaring as INTs, are a feature… I like FORTRAN more than most languages for numeric tasks.
Auto type conversion varies by language. C is loose, so good for low-level and OS writing, with casting when desired and just bit stuffing when needed. Pascal is a straitjacket… Others are in between.
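A tiny made up illustration of both points at once, since it surprises folks who have not been bitten yet: nothing here is declared, so by the default rules KOUNT is an INTEGER (K falls in I through N) and COUNT is a REAL, and the very same assignment quietly gives two different answers.

C     Illustrative only: implicit typing plus silent conversion.
      PROGRAM IMPLIC
      KOUNT = 10.0 / 4.0
      COUNT = 10.0 / 4.0
      PRINT *, 'KOUNT = ', KOUNT, '   COUNT = ', COUNT
      END

KOUNT comes out as 2 (the 2.5 is truncated on assignment, with no warning), while COUNT comes out as 2.5. Handy when you know the rule; a silent trap when you do not.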
That vortex point is one I’ve pondered… watch earth.nullschool and pretty quick you realize this place is all circular motion in 3D … and the models do rectilinear or hexagonal tiles without circular motion. Fundamentally broken at that first design point. The world is made of fractals and vortexes that are not in the model structure. How’s that gonna work…
Just…. wow! I just downloaded and had a look at a couple of the Perl scripts….
There is some very convoluted thinking going on in there… very bad code.
Normally the first thing to do when looking at Perl code is to ensure that warnings and the 'strict' pragma are enabled, so that basic errors are caught (declaring variables and then not using them, declaring the same variable name twice, etc.). Errors and warnings out of the box…
Then there is the logic flow which is a whole nother realm.. :/
@wyzelli:
Interesting experience, isn’t it? :-)
There is a very different ‘style’ as compared to commercial programming standards. More like an erector set with a model gizmo stuck together with odd bits, and some other optional modules laying in the box…
FORmula TRANslator, and yes I am fond of it and BASIC, an ALGOL derivative. Way back when, before the rise of Computer Science, programming was either in the Mathematics Department or the Engineering Department. When I took my programming classes at the University, that was the situation. [Yes, I have been one of those student prototyping programmers. It was fun ;)] And yes, the good practices of the time were drilled into our heads. You’d diagram your code (one way to generate a document to cross-check coded comments) before keypunching a deck or go play with the interactive dumb terminal so your generally small modules were as bug-free as possible (100 to 1000 lines). The first Assembly code I did was 6502 assembly. I’d do the same kind of diagramming for it. Hand assembly, when done well, used to be more efficient than the compiler or interpreter’s generated stuff. I’m sure that present day compilers are fed test assembly script to see if the code generators are working as well as they can.
I like a fast FORTH threaded interpreter! With such you can do ‘anything’ including DRONK, STONED, or STUPID! These should be fundamental FORTH words! Da faster it goes, da faster you can fix da inevitable AW s**t!!
All the best! -will-
@CDQuarles:
There was also, in the early ’70s, a small presence in the Econ departments as they were discovering econometric modeling. But at my school (U.Cal.) you got a Math degree or an Engineering degree for computers. I was an “odd duck” in that I took Engineering classes with lots of computers while being an Econ major… but that whole econometric modeling had caught my attention…
Also there is something about keypunching a deck and the finality of a hole in a card that made a fella want to have it right before booking punch time and buying cards… I miss the old 029 keypunch…
@Will Janoschka
I ran into FORTH in about 1983? and really liked it. On my shelf is a book on creating Threaded Interpretive Languages. At the time I was working on mainframes and they didn’t want to let me install a low level interpreter on the hardware ;-) so I didn’t get to do much with it.
Instead I fell into Unix quasi by accident. Most scripting languages are threaded and interpreted, so I found an outlet there. Since then I've done a LOT of scripting, often in the threaded interpretive model. One trivial example:
I have a set of commands to make more commands.
Note how “cmd” calls “allow”? Yes, trivial example, yet it works nicely…
Now if I’m going to do just about anything, like, say, move directories to an archive, I’ll type:
cmd archive
be tossed into an editor where I can type:
cp -r $1 /My/Archive_dir
ls -l /MyArchive_dir/$1 $1
and then I have the word "archive" in my set of words; after that, to archive a thing I can just type:
archive FOO
and it is done.
(The actual form wads it up into a compressed tar ball, but I left that out so the not-so-*Nix folks don't glaze too much… but "tar cf - $1 | (cd /MyArchive; gzip > $1.targz)" comes close, though many systems now support tar with a compress flag built in. Oh, and I usually do: archive FOO& so it runs as a background task).
So even if not FORTH, my early exposure to the concepts of FORTH drive my daily activity. Just yesterday I added a word to inspect the status of my newly built raid array:
raid: mdadm --detail /dev/md0; echo; cat /proc/mdstat
I’ve saved myself endless hours of typing by compressing commonly used commands and options into single words in a very FORTH like manner…
Were I going to bit twiddle down at the hardware level, I’d want to use FORTH or something like it. (But most likely would end up using C as it is The Way on *Nix boxes…)
It is interesting to note that scripting drives the basic functions of most things *Nix for exactly those kinds of FORTH-like reasons. (Part of why I don't like SystemD: it violates all that.) Just being able to look at (and potentially change) the "words" of *Nix administration is a huge benefit, lost in the SystemD "binary blob" approach.
Can you imagine a machine with a code region that has distinct addressable locations of "machine code" but no other addresses whatsoever? All addresses are on a 'stack' of pointers to an encrypted base of pointers back to the code region for what to do with the values on the other stack? I tried that once; much like going insane! 2+2 = 4 and this is close enough for government work. I believe I will have another beer!
No, I can’t…
;-)
What purpose to encryption inside the machine to then just decrypt…. just pot stirring?
Did once work with an OS with a ‘single level store’ design. Disks, memory, you name it, all just one big address space… and yes, there were, um, “issues”… (what to do when swap space ended up on that disk pack on the shelf over there… or a tape shipped offsite… and just how do you make backup copies of critical data sets?… that might be anywhere in the total store space…)
E.M.Smith says: 5 February 2017 at 1:19 am
“No, I can’t… ;-)
What purpose to encryption inside the machine to then just decrypt…. just pot stirring?”
Yes and no! This was done in a Verifone keyloader for point of sale terminals. DES and 3DES so no one could snoop and find out! I did anyhow!