What Ocean Warming?

I don’t know the origin or back story for this graph.

I just saw it a few times in a discussion in this page at Tallbloke’s place:


And there, the discussion is about the red blobs of seasonal warming, not about the overall trend. Yet when I look at it, the big feature I see is that the ocean is getting colder. The red seasonal heating is shrinking, year over year, and the deep dark blue is spreading and getting colder, year over year.

It is from: http://www.climate4you.com/images/ArgoTimeSeriesTemp59N.GIF

The file name implies this is Argo data. I presume “59N” means it is from 59 degrees North.

Argo Data from 59N? showing ocean cooling year over year

So does anyone know if there is a “back story” to this about the Argo floats showing ocean cooling? Or is this a cherry pick of one slice?


Posted in AGW Science and Background | Tagged , | 2 Comments

Procedural Languages, MySQL, Massively Parallel, and Climate Modeling

This posting will be a bit eclectic. Several threads of the day pulled to one spot.

First off, most of the day was spent reading an SQL book, some of the MySQL Manual, and more. Then an hour went into “relaxing” with a video about Procedural Languages, which turned out to be more interesting than I’d expected.

In the first bit of reading I found out that MySQL has a “Spatial” or Geographic data type, or rather a set of them. It also has a parallel server model. Ponder what that implies for a moment…

Then, in the second bit, not only was there a fun stroll down memory lane (at about the 9 minute mark, where ALGOL and all it begat is discussed), but at about the 50-something minute mark there’s an interesting graph about mutable data vs. shared data: only data that is both mutable and shared needs a control structure to coordinate access. Thus “multi-threaded” and parallel processing is hard and needs some kind of coordination mechanism, but straight procedural code on immutable data or non-shared data does not…
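That mutable-vs-shared point can be shown in a few lines of Python (a toy of my own making, not from the video): only the shared, mutable counter needs the lock; thread-local or immutable data would need no coordination at all.

```python
import threading

counter = 0                  # shared AND mutable: the one case needing coordination
lock = threading.Lock()

def bump(n):
    """Increment the shared counter n times, one locked step at a time."""
    global counter
    for _ in range(n):
        with lock:           # without this, the read-modify-write can lose updates
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is now exactly 40_000, every increment accounted for
```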

Now glue those thoughts together…

What does a database system do? It provides concurrent access to a large collection of data for many independent machines and processes, along with a locking mechanism.

Now say you wanted to build a geographic model of the world. Wouldn’t it be nice to have a data structure designed to represent the surface of a globe, with a coordinate data type? One with parallel data access and locking built in? Wouldn’t it be nice to separate the procedures of finding, presenting, storing, and locking data items from the code that does the computation? Then, with a database that can be distributed over many servers for much faster operation and much higher bandwidth (presuming a switched network, as in the back end of a Beowulf cluster), and a cluster of small computers also on a switch fabric:

One could design a data layout of cells with properties. Each compute engine could just be handed a cell, check the “work needed” flag field, and if it shows new data has arrived (input of new solar heating, or new mass of air from a neighbor, or…) do whatever computations were needed, put the output data in the appropriate cell “inbox”, update the present cell, and move on.

So each cell becomes a database data object, with independent access. Each operation done on it becomes an independent operation done by any worker board that finds the work flag set, all running in parallel. Coordination and locking provided by the combination of the database update locking facility and the “inbox” of data computed from some other cell.

For example, the cell at 35 N x 56 E has a computation that shows 10% of the mass moves SW, so it puts 5% of the mass in the inbox of the cell directly SW, and 2.5% in the inboxes of the cells directly S and W. Then it sets its own inbox to zero (having done the processing on it) and moves on. Then some worker board can “discover” that the cell directly SW has 10% more mass entering and needs computing; so it sets the locked flag, computes things like pressure, mass flow out to other cells, wind speed, changes of enthalpy from altitude changes, whatever. It writes the appropriate stuff into the “inboxes” of the appropriate cells, unsets the lock flag, and moves on.

Essentially, put the data storage, structure, locking and management into a database engine, and then very simplified “per cell” computation into a gaggle of independent computation engines that wander around looking for work to do. (I’d likely make a table in the DB where LRU (Least Recently Used) cells are at the top; when a cell is locked for processing it is moved to the end of the list, so each worker can just pop a cell off the top of the list, lock it, and go check it out / work on it.)
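Here’s a toy, single-machine sketch of that scheme in Python (all the names – `Cell`, `inbox`, the work list, the 10% flow – are my own invention, standing in for the database rows, row locks, and LRU table described above):

```python
import threading
from collections import deque

class Cell:
    """Stands in for one database row: a mass value, an 'inbox' of
    incoming mass from neighbors, and a lock playing the DB row lock."""
    def __init__(self, mass):
        self.mass = mass
        self.inbox = 0.0
        self.lock = threading.Lock()

# A tiny 1-D "globe" of cells, plus an LRU-style work list of cell indexes.
cells = [Cell(100.0) for _ in range(4)]
work = deque(range(len(cells)))

def step(i):
    """Process one cell: absorb its inbox, push 10% of its mass onward."""
    cell = cells[i]
    with cell.lock:
        cell.mass += cell.inbox
        cell.inbox = 0.0
        outflow = cell.mass * 0.10
        cell.mass -= outflow
    neighbor = cells[(i + 1) % len(cells)]
    with neighbor.lock:          # deposit into the neighbor's inbox
        neighbor.inbox += outflow

def worker():
    """Pop cells off the top of the work list until it is empty."""
    while True:
        try:
            i = work.popleft()
        except IndexError:
            return
        step(i)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Mass is conserved: whatever left a cell is sitting in some inbox.
total = sum(c.mass + c.inbox for c in cells)
```

In the real thing each `Cell` would be a database row, the `with cell.lock:` blocks would be the DB’s update locking, and the workers would be independent boards rather than threads.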

With the flow of control and data management all off-loaded from the “climate programming”, those bits of code ought to become much simpler. Furthermore, the only limits to how many independent processors you could have working on the system would be the number of cells (after one processor is dedicated to each cell, you don’t gain anything from another one being locked out…) and the switch fabric bandwidth in aggregate.

No, I don’t know if it would actually be faster; that depends on too much that is not known. But network speeds are now in the Gbit range, and with a switch fabric that gets multiplied by each switch channel (so a 24 port switch can have 12 Gbit-speed transfers happening simultaneously). It ought to scale fairly linearly. At some point the limits of the parallel distributed database engine will start to bite, but I don’t know where that is. There are folks running truly gigantic database engines, though, so that limit is very, very large.

If nothing else, it would be a fun experiment in a different way to do parallel programming solutions.

So that’s the Bright Idea Of The Day. BIOTD? ;-)

The Video

Here’s the 1 hour video on Procedural Programming. It does a nice job of covering the history, and lets you see how central ALGOL was to the “modern” languages. Unfortunately it tosses some barbs at FORTRAN, despite it being a very nice, very workable language that does many things better than any other language; thus its continuing to be the Go-To-Language (pun intended) for Scientific Programming and much of Engineering (and climate models). He also shows a lot of other fun bits of code to demonstrate how “modern” concepts show up in other languages, from Simula to JavaScript. He even manages to toss in some (weak) humor on things like Europeans preferring to start arrays at 0 (so the “1st Floor” is up one flight of stairs from the ground) while Americans like to start arrays with 1, one being the value of the first positive integer…

It isn’t too technical for anyone with some computer background to follow (and those with zero background can marvel at the crap we programmers think about, and wonder why we create a whole new crop of languages every decade… {think “a Ph.D. in Comp. Sci. requires doing something ‘new’”…}). Yet it also has enough tech chops in it to be interesting to more experienced folks.

While much of it was a walk down ‘memory refresher lane’ for me, I learned a couple of new bits. I also found some justification for my feelings about O.O. Programming, as he points out that most of its properties are available in procedural languages if done ‘right’, and the concepts date from then. Just that nobody bothers to do them much ;-) It was also fun to see a couple of languages I haven’t used (Simula, JavaScript) and see how familiar they looked… Seems many of the “new” languages are mostly old languages with a facelift of sorts, or a new concept grafted on.

Some MySQL References

So here are a couple of links to bits about MySQL that started me pondering. I’ve pretty much soaked up most of the language now, and I have a good grasp on how they use keys and indexes – and how they are mushy on the two terms and mix their meanings…

Along with that, I have a nicer appreciation for how they do JOIN; though my old 2007-or-so book claims MySQL doesn’t do “OUTER JOIN”, only “LEFT JOIN” and “RIGHT JOIN” – and I was hoping that had been added since then. This manual page talks about LEFT and RIGHT joins as “outer joins”, but that is different from the “give me all records from both sides, even those with blanks in the join” that the FULL OUTER JOIN in Oracle and Microsoft SQL Server provides in addition to LEFT and RIGHT. As far as I can tell, MySQL still lacks FULL OUTER JOIN, though it can be emulated with a UNION of two LEFT joins.
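The emulation looks like this (sketched with SQLite from Python so it’s runnable anywhere; the `stations` / `readings` tables are invented for illustration, and the same UNION pattern works in MySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE stations(id INTEGER, name TEXT);
CREATE TABLE readings(station_id INTEGER, temp REAL);
INSERT INTO stations VALUES (1, 'Alpha'), (2, 'Bravo');
INSERT INTO readings VALUES (1, 12.5), (3, 9.1);
""")

# No FULL OUTER JOIN needed: UNION a LEFT JOIN with the tables swapped.
# Station 2 has no reading, reading 3 has no station; both rows survive.
rows = cur.execute("""
    SELECT s.id, s.name, r.temp
      FROM stations s LEFT JOIN readings r ON r.station_id = s.id
    UNION
    SELECT r.station_id, s.name, r.temp
      FROM readings r LEFT JOIN stations s ON s.id = r.station_id
""").fetchall()
```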


Since the ability to create a VIEW does a very nice job of hiding the JOIN complexity, and using INDEX fields can be nicely efficient (it pulls indexes into a hidden table for faster processing, and has an Optimizer that figures out the best access strategy), I’m now more willing to factor the GHCN and other temperature data into more fragmented tables (“normalize” the data more).

THE big pain of fully normalized DB design is the need to pull data from 10+ tiny tables Every Single Time you want to knock out a report. The ability to create and store a VIEW means that can be done once, and is then hidden from you when you create your report. Nice. So I’m going to revisit my “toy” Temperature DB layout with the idea in mind of both more, smaller tables and more “VIEW”s of them as one big flat table. Plus add more keys (things that make each row a unique entry and prevent duplicates) and indexes (tables of entries that let you go right to a specific datum from a known value) – even if MySQL treats those as synonyms in some contexts…
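For example (a sketch in SQLite via Python; the table and view names are my own, but the `CREATE VIEW` syntax is the same in MySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE stations(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE readings(station_id INTEGER, year INTEGER, temp REAL);
INSERT INTO stations VALUES (1, 'Alpha'), (2, 'Bravo');
INSERT INTO readings VALUES (1, 2018, 12.5), (2, 2018, 9.1);

-- Write the JOIN once; after this, reports never see it.
CREATE VIEW flat AS
SELECT s.name, r.year, r.temp
  FROM stations s JOIN readings r ON r.station_id = s.id;
""")

# The report query reads the VIEW like one big flat table.
rows = cur.execute("SELECT name, temp FROM flat ORDER BY name").fetchall()
```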

Given that even the existing load of data, largely treated as one large non-keyed, non-indexed flat file, is “plenty fast”, everything else will be faster. So, OK, I’ll tune it up some more and use more of the SQL language features.

Parallel Servers

MySQL has a bunch of different “Storage Engines” you may choose from. These give you methods of storing data optimized for different uses. Mostly they recommend the InnoDB engine (which assures rollbacks, crash safety, and such) but others are included.


Chapter 16 Alternative Storage Engines

Table of Contents

16.1 Setting the Storage Engine
16.2 The MyISAM Storage Engine
16.3 The MEMORY Storage Engine
16.4 The CSV Storage Engine
16.5 The ARCHIVE Storage Engine
16.6 The BLACKHOLE Storage Engine
16.7 The MERGE Storage Engine
16.8 The FEDERATED Storage Engine
16.9 The EXAMPLE Storage Engine
16.10 Other Storage Engines
16.11 Overview of MySQL Storage Engine Architecture

Storage engines are MySQL components that handle the SQL operations for different table types. InnoDB is the default and most general-purpose storage engine, and Oracle recommends using it for tables except for specialized use cases. (The CREATE TABLE statement in MySQL 8.0 creates InnoDB tables by default.)

MySQL Server uses a pluggable storage engine architecture that enables storage engines to be loaded into and unloaded from a running MySQL server.

To determine which storage engines your server supports, use the SHOW ENGINES statement. The value in the Support column indicates whether an engine can be used. A value of YES, NO, or DEFAULT indicates that an engine is available, not available, or available and currently set as the default storage engine.

Further down, the manual has another two chapters covering cluster-oriented options: the NDB Cluster that holds everything in memory – so with huge machines with oodles of memory you can have lots of GoFast – and the (more interesting to me) InnoDB Cluster option.


Chapter 21 InnoDB Cluster

Table of Contents

21.1 Introducing InnoDB Cluster
21.2 Creating an InnoDB Cluster
21.3 Using MySQL Router with InnoDB Cluster
21.4 Working with InnoDB Cluster
21.5 Known Limitations

This chapter covers MySQL InnoDB cluster, which combines MySQL technologies to enable you to create highly available clusters of MySQL server instances.

21.1 Introducing InnoDB Cluster

MySQL InnoDB cluster provides a complete high availability solution for MySQL. MySQL Shell includes AdminAPI which enables you to easily configure and administer a group of at least three MySQL server instances to function as an InnoDB cluster. Each MySQL server instance runs MySQL Group Replication, which provides the mechanism to replicate data within InnoDB clusters, with built-in failover. AdminAPI removes the need to work directly with Group Replication in InnoDB clusters, but for more information see Chapter 18, Group Replication which explains the details. MySQL Router can automatically configure itself based on the cluster you deploy, connecting client applications transparently to the server instances. In the event of an unexpected failure of a server instance the cluster reconfigures automatically. In the default single-primary mode, an InnoDB cluster has a single read-write server instance – the primary. Multiple secondary server instances are replicas of the primary. If the primary fails, a secondary is automatically promoted to the role of primary. MySQL Router detects this and forwards client applications to the new primary. Advanced users can also configure a cluster to have multiple-primaries.


InnoDB cluster does not provide support for MySQL NDB Cluster. NDB Cluster depends on the NDB storage engine as well as a number of programs specific to NDB Cluster which are not furnished with MySQL Server 8.0; NDB is available only as part of the MySQL NDB Cluster distribution. In addition, the MySQL server binary (mysqld) that is supplied with MySQL Server 8.0 cannot be used with NDB Cluster. For more information about MySQL NDB Cluster, see Chapter 22, MySQL NDB Cluster 8.0. Section 22.1.6, “MySQL Server Using InnoDB Compared with NDB Cluster”, provides information about the differences between the InnoDB and NDB storage engines.

The following diagram shows an overview of how these technologies work together:

Now if I’ve got that right, with multiple primaries you can have multiple read/write engines at one time and then they replicate between themselves. At least, I hope that is what it is saying ;-)

Note that this is in MySQL 8.0, and I know it is not in MySQL 5.7. I also don’t know what version is presently available in Debian / Devuan, so it might be something that is “buy it only”…

Geographic Data Types


Chapter 11 Data Types

Table of Contents

11.1 Data Type Overview
11.2 Numeric Types
11.3 Date and Time Types
11.4 String Types
11.5 Spatial Data Types
11.6 The JSON Data Type
11.7 Data Type Default Values
11.8 Data Type Storage Requirements
11.9 Choosing the Right Type for a Column
11.10 Using Data Types from Other Database Engines

MySQL supports a number of SQL data types in several categories: numeric types, date and time types, string (character and byte) types, spatial types, and the JSON data type. This chapter provides an overview of these data types, a more detailed description of the properties of the types in each category, and a summary of the data type storage requirements. The initial overview is intentionally brief. The more detailed descriptions later in the chapter should be consulted for additional information about particular data types, such as the permissible formats in which you can specify values.


11.5 Spatial Data Types

11.5.1 Spatial Data Types
11.5.2 The OpenGIS Geometry Model
11.5.3 Supported Spatial Data Formats
11.5.4 Geometry Well-Formedness and Validity
11.5.5 Spatial Reference System Support
11.5.6 Creating Spatial Columns
11.5.7 Populating Spatial Columns
11.5.8 Fetching Spatial Data
11.5.9 Optimizing Spatial Analysis
11.5.10 Creating Spatial Indexes
11.5.11 Using Spatial Indexes

The Open Geospatial Consortium (OGC) is an international consortium of more than 250 companies, agencies, and universities participating in the development of publicly available conceptual solutions that can be useful with all kinds of applications that manage spatial data.

The Open Geospatial Consortium publishes the OpenGIS® Implementation Standard for Geographic information – Simple feature access – Part 2: SQL option, a document that proposes several conceptual ways for extending an SQL RDBMS to support spatial data. This specification is available from the OGC website at [http://www.opengeospatial.org/standards/sfs].

Following the OGC specification, MySQL implements spatial extensions as a subset of the SQL with Geometry Types environment. This term refers to an SQL environment that has been extended with a set of geometry types. A geometry-valued SQL column is implemented as a column that has a geometry type. The specification describes a set of SQL geometry types, as well as functions on those types to create and analyze geometry values.

MySQL spatial extensions enable the generation, storage, and analysis of geographic features:

Data types for representing spatial values

Functions for manipulating spatial values

Spatial indexing for improved access times to spatial columns

The spatial data types and functions are available for MyISAM, InnoDB, NDB, and ARCHIVE tables. For indexing spatial columns, MyISAM and InnoDB support both SPATIAL and non-SPATIAL indexes. The other storage engines support non-SPATIAL indexes, as described in Section 13.1.15, “CREATE INDEX Syntax”.

A geographic feature is anything in the world that has a location. A feature can be:

An entity. For example, a mountain, a pond, a city.

A space. For example, town district, the tropics.

A definable location. For example, a crossroad, as a particular place where two streets intersect.

Some documents use the term geospatial feature to refer to geographic features.

Geometry is another word that denotes a geographic feature. Originally the word geometry meant measurement of the earth. Another meaning comes from cartography, referring to the geometric features that cartographers use to map the world.

It also has a nice set of functions that work on Spatial data types:


12.16 Spatial Analysis Functions

12.16.1 Spatial Function Reference
12.16.2 Argument Handling by Spatial Functions
12.16.3 Functions That Create Geometry Values from WKT Values
12.16.4 Functions That Create Geometry Values from WKB Values
12.16.5 MySQL-Specific Functions That Create Geometry Values
12.16.6 Geometry Format Conversion Functions
12.16.7 Geometry Property Functions
12.16.8 Spatial Operator Functions
12.16.9 Functions That Test Spatial Relations Between Geometry Objects
12.16.10 Spatial Geohash Functions
12.16.11 Spatial GeoJSON Functions
12.16.12 Spatial Convenience Functions

MySQL provides functions to perform various operations on spatial data. These functions can be grouped into several major categories according to the type of operation they perform:

Functions that create geometries in various formats (WKT, WKB, internal)

Functions that convert geometries between formats

Functions that access qualitative or quantitative properties of a geometry

Functions that describe relations between two geometries

Functions that create new geometries from existing ones

For general background about MySQL support for using spatial data, see Section 11.5, “Spatial Data Types”.

I find that a rather interesting data type!

I didn’t find any kind of “altitude” in the Spatial data types, so it looks like they expect a smooth, single-altitude sphere. So, OK, you need to do the altitude layers as a set of repeated fields inside the cell and do altitude processing on that cell in layers. Got it. Temp_ground, Temp_5km, Temp_10km, Temp_15km, Temp_20km, etc.
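So a cell row might look like this (sketched in SQLite via Python; the column names and temperature values are my own invention, just showing the repeated per-layer fields):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# One row per grid cell; altitude becomes repeated per-layer columns,
# since the spatial types only give a 2-D position on the sphere.
cur.execute("""
    CREATE TABLE cells(
        lat REAL, lon REAL,
        temp_ground REAL, temp_5km REAL, temp_10km REAL,
        temp_15km REAL, temp_20km REAL
    )""")
cur.execute("INSERT INTO cells VALUES (35.0, 56.0, 15.2, -17.5, -50.0, -56.5, -56.5)")
layers = cur.execute("""
    SELECT temp_ground, temp_5km, temp_10km, temp_15km, temp_20km
      FROM cells WHERE lat = 35.0 AND lon = 56.0""").fetchone()
```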

It still moves that whole “lay out a global grid and process through it” out of the flow of control logic of the program and into the data structure.

Also, since you can write modules to extend MySQL, the really hard-core folks could extend the Spatial data type to include an altitude axis…

In Conclusion:

So there you have it. Some “chewing gum for the mind” to work on for a while… I know it’s made my mind a bit tired today, chewing on all of it.

OTOH, I’m much more prepared to make a decent go of getting a proper data structure into the GHCN / USHCN etc. DB I’m building. I’m still not keen on their DATE data type (it is fairly inflexible about the order of YYYY-MM-DD, though the Oracle version supports my preferred MMM-as-letters format), but I may try loading the STATION LAT LONG data into a Spatial data type, just for grins.
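Loading LAT/LONG into a POINT column goes through WKT (“Well-Known Text”) strings fed to ST_GeomFromText(). A sketch of building those strings in Python (the `station_loc` table and station rows are hypothetical; note that for plain Cartesian SRID 0 the WKT order is X Y, i.e. longitude then latitude, and geographic SRIDs in MySQL 8.0 may expect the reverse):

```python
# Build WKT strings that a POINT column could be loaded from, e.g.
#   INSERT INTO station_loc VALUES (name, ST_GeomFromText(wkt));
# (station_loc and the station rows here are invented for illustration.)
stations = [("ALPHA", 59.0, -18.5), ("BRAVO", 35.0, 56.0)]  # name, lat, lon

def to_wkt(lat, lon):
    """WKT uses X Y order, so longitude comes first."""
    return f"POINT({lon} {lat})"

inserts = [
    f"INSERT INTO station_loc VALUES ('{name}', ST_GeomFromText('{to_wkt(lat, lon)}'))"
    for name, lat, lon in stations
]
```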

I’m also ready to take a stab at some keys, indexes and views. But not, I think, tonight… It’s already been a long day slaving over a hot computer ;-)


Posted in GCM, NCDC - GHCN Issues, Tech Bits | Tagged , , , , , , | 3 Comments