Just noodling around some ideas for the Top Level Flow Of Control part of a Cell Based distributed climate model. I’ve already discussed that there isn’t any really good way to lay out cells, so I’m using a layout generated from equally spaced cell centers. Then a general look at the stuff that happens inside each cell.
Here I’m looking at a sort of pseudo-code view of the order in which steps would likely happen (subject to change at any time as actual coding turns up issues with this starting point).
This will change as I think of things I’ve missed, change my mind on some point, and review prior art for models and turn up “issues” to deal with.
For those not familiar with it, pseudo-code is a kind of programming shorthand. It is kind of like a programming language, but not any real language in particular. It is kind of like English, but has a bit more programming-like structure and meaning to the words. The purpose is to let you write out ideas on how to program something without needing to fuss over which particular language construct to use, or whether you end phrases with a “;”, a “.”, an “END”, or nothing in particular. Basically, all the minutiae crap is left out.
So I’d see things as being roughly:
call Build_World (cell_count)
spawn Monitor
call Cells (years, parameters)
write output reports
end
The Monitor is a future program that would track data like what year you are processing, what is the load on systems, is anything stuck, etc. It can only really be worked out after the program itself is done, so at this point it’s just a placeholder.
“Parameters” is just a placeholder for whatever exact parameters are needed, once that’s figured out.
This program just manages the ‘big lumps’. It launches the monitor and creates any summary reports at the end also. Here is where you can pass a “size of world in cell numbers” value and a ‘years to run’ parameter.
call cellgen (cell_count)
call celldatagenerate
output cellmap, celldata_db
end
Generate the actual map of the globe. Figure out what each cell has for neighbors, what angle they are to each other, what Latitude and Longitude, apply global surface types to cells, and generate all the particular data for that cell (eventually to include things like altitude average and vegetation and such).
This is where you create the database of all the data the following calculations will need. For the eventual very big models, this might be run once, and data stored in a database for all subsequent runs. Basically, you may choose to run, or not, celldatagenerate.
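To make the idea concrete, here is a minimal Python sketch of what one record out of celldatagenerate might look like. The function name, field names, and initial values are all assumptions for illustration, not a committed schema:

```python
def make_celldata(cellid, lat, lon, surface_type):
    """Hypothetical initial record for one cell, roughly what
    celldatagenerate might emit: static geometry from cellgen plus
    initialized 'gozinta' buckets for the first iteration."""
    return {
        "cellid": cellid,
        "lat": lat,
        "lon": lon,
        "surface_type": surface_type,  # e.g. "ocean", "sand", "forest" (assumed types)
        "neighbors": [],               # filled by cellgen: (neighbor cellid, bearing in degrees)
        "temp": 288.0,                 # assumed initial surface temperature, Kelvin
        "humidity": 0.5,               # assumed initial relative humidity
        "gozinta": {},                 # inbound wind/heat buckets, keyed by neighbor
        "done_flag": 0,                # cycle counter for the "I'm done" flag idea
    }
```

For the very big models this dictionary would live as a row in the database, written once and read by every subsequent run.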
While JobQue less than limit:
  For Each year in years
    For Each day in year
      For Each hour in day
        For Each cellid in cellmap
          spawn cell (cellid, parameters)
        Next cellid
      Next hour
    Next day
  Next year
EndWhile
end
This is basically just managing the job queue. Don’t want to pump out a million jobs all at once on a 4 core R. Pi now do we? (Crash!) So it’s managing the Job_Queue. In many distributed compute systems there’s automatic queue management, so the details will depend on how that’s implemented.
I see the iteration as being over date (so we know seasonality data to apply) and time (so insolation by time of day is correct).
Then for each Cell#, we spawn a compute process to a compute node to go figure out the values for that particular cell, based on date, time, and the neighbor “gozouta” data that becomes its “gozinta” data.
I’d start with all the “gozinta” data buckets filled with the initialization data from celldatagenerate, then have it iterate over time. Stepping from top of globe to bottom (initially) lets us run with very few cores at the start and know that all the cells are ready to run. Eventually, with massive 1 core / cell and many cells, I’d expect to change this to where each core just watches for its “gozinta” buckets to be filled and the date:time:done flag set. That will be a bit more chaotic on which cells run when, but ought to work OK. (In theory, some areas could “run ahead” of others a little… creating rings of different time periods as the “gozouta”s propagate).
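The bounded-queue loop above can be sketched with a worker pool whose size is the JobQue limit. This is just one way to do it; `run_cells` and `compute_cell` are hypothetical names, and a real version would dispatch to remote nodes rather than local threads:

```python
from concurrent.futures import ThreadPoolExecutor

def run_cells(years, days_per_year, hours_per_day, cellmap, limit, compute_cell):
    """Bounded job-queue sketch for the Cells manager.

    Iterates date and hour in order; within each hour, every cell is
    handed to a pool capped at `limit` workers -- the pool plays the
    role of the JobQue. compute_cell(cellid, year, day, hour) is a
    placeholder for the per-cell process spawned in the pseudo-code.
    list(pool.map(...)) blocks until the whole hour is done, so every
    neighbor's 'gozouta' data is ready before the next hour starts.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        for y in range(years):
            for d in range(days_per_year):
                for h in range(hours_per_day):
                    list(pool.map(lambda c: compute_cell(c, y, d, h), cellmap))
```

On a 4 core R. Pi you’d set `limit` to something like 4 and never have a million jobs in flight at once.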
call insolation (cellid, celldata, year, day, hour)
call surface (cellid, celldata, year, day, hour, insolation, return=temp)
call subsurface (cellid, celldata, year, day, hour, temp, return=temp)
# The subsurface function will look at celldata and season, and
# decide if it does the ocean depths, rocks, sand, permafrost, mud, etc.
call evaporation (cellid, celldata, year, day, hour, temp, wetness, return=temp)
call airlayers (cellid, celldata, year, day, hour, temp, RH, return=temp)
call wind (cellid, celldata, year, day, hour, temp)
call IR (RH, temp, return=temp)
Write Gozouta Flags & Data
end
This is the core work flow of the cell. Just what physics are you going to do, in what order. What are you leaving out, and how much detail do you compute. For example, notice there is no “heat island” around metroplex areas being considered. Eventually, at very high cell numbers, that might matter. For now it will be hidden in the averages of data.
These things have circular dependencies, both laterally to neighbor cells, and vertically into air layers (and ocean layers). But also in the time dimension. I try to capture some of the time aspects inside the individual sub-processes with iterations. (See below).
The general notion is that it all starts with surface heating. So figure how much energy reaches the surface first.
Then, some of that surface layer heat (that I figure is including about a 25 m air layer and buildings) will be subducted down into the ground or water bodies (though net heat must come out of the ground so in winter the heat goes the other way). This is a minor effect process so will likely be done as a plug number initially. But “someday” it ought to be addressed. Especially bottom of the ocean volcanoes…
The surface heat remaining can cause evaporation (or transpiration from vegetation). This is our first bite at the water apple… So figure out, based on surface type and wetness, how much water goes into the air as vapor.
After that, do air layers (with iterations) to distribute the air in vertical winds and create clouds and precipitation and all the rest. Yeah, the big lump…
Once you have that part of the air and water flow worked out, and know your remaining humidity and temperature, move the air laterally as wind, apportioned to your downwind neighbor cells based on angle (but arriving with their initial momentum vector…)
After all that, work out what IR might have left to do, since now you have clouds and humidity and all that data to work it out sort of properly.
Write out any remaining bits of data to the database and set the “I’m done” flags.
At this point, this cell process terminates and the Cells job manager gets an open queue slot to spawn the next cell process for the next CellID. Eventually, as mentioned before, I’d like this to be one process / cpu and at this point just enter a spin/wait state until the “gozinta” values show up and the ‘good to process’ flag rises.
Here’s a brief description of what I see the parts doing:
Looks at cell LAT, LONG, Date and Time and calculates insolation % of TSI impinging on surface. Basically, it’s taking the panel of the cell and figuring the tilt relative to the sun and how many Watts land on that space at the top of the air column. It might make sense to calculate the values in advance as they ought not change much year to year over a few decades. Database lookups might be cheaper.
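A first-cut version of that geometry is straightforward: solar declination from day-of-year, hour angle from local solar time, then the cosine of the zenith angle gives the fraction of TSI landing on the cell’s panel. This is a simplified sketch (no equation of time, no orbital eccentricity), and the function name is mine, not a fixed API:

```python
import math

def insolation_fraction(lat_deg, day_of_year, hour_utc, lon_deg=0.0):
    """Fraction of TSI impinging on a flat cell at the top of the air column.

    Simplified geometric sketch: declination varies sinusoidally over
    the year (zero at the ~day-81 equinox), hour angle comes from a
    crude local solar time. Returns 0.0 when the sun is below horizon.
    """
    decl = math.radians(23.44) * math.sin(2 * math.pi * (day_of_year - 81) / 365.0)
    solar_hour = hour_utc + lon_deg / 15.0            # crude local solar time
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(lat_deg)
    cos_zenith = (math.sin(lat) * math.sin(decl)
                  + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return max(0.0, cos_zenith)                       # night side clamps to zero
```

Since this only depends on cell position, date, and hour, it is exactly the kind of thing that could be precomputed into the database as suggested above.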
Taking into consideration the transmission of the existing air column above the cell (initialized value, then iterated by the model) figure out how much of the Watts incident get absorbed at the surface. Consider surface type (snow, water, vegetation, dirt, sand) from celldata and any surface water from model precipitation and body of water percent in the celldata. This is where all the messy stuff with tree leaves, seasons, albedo change with snow & etc (or static with Sahara sand…) get factored into a theoretical instant surface heating (Watts, eventually temperature).
Initially likely to be a plug number. Eventually, based on the type of subsurface coded for that cell, some amount of surface heat soaks down into the rocks, sand, permafrost, whatever or mixes in for ocean. Think of it as a capacitor with a resistor to the surface. Then, when winter comes, some subsurface heat migrates out to the surface. It’s a ballast function.
Also, precipitation soaks into the dirt or runs off in rivers. Heat goes with it, too, so that’s allowed for (somehow…) in this section.
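The capacitor-with-a-resistor idea can be written down in a couple of lines. This is the plug-number version: one subsurface temperature per cell relaxing toward the surface with some time constant. The function name and the tau value are assumptions, and heat carried by runoff is still left out:

```python
def subsurface_step(t_surface, t_sub, dt_hours, tau_hours=720.0):
    """One time-step of the subsurface ballast.

    'Resistor': heat flux proportional to the surface/subsurface
    temperature difference. 'Capacitor': the subsurface temperature
    integrates that flux. In summer (surface warmer) heat soaks down;
    in winter the sign flips and stored heat migrates back out.
    tau_hours (~30 days here) is a placeholder, eventually set per
    subsurface type (rock, sand, permafrost, ocean mixed layer...).
    Returns (flux into the ground, new subsurface temperature).
    """
    flux = (t_surface - t_sub) / tau_hours
    t_sub_new = t_sub + flux * dt_hours
    return flux, t_sub_new
```

A per-cell tau pulled from celldata is the obvious later enhancement; ocean-bottom volcanoes would show up as an extra source term on `t_sub`.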
Prompt surface water evaporation into Humidity numbers. Needed to properly handle convection, cloud formation, obscuration by clouds formed, precipitation, etc.
The water that didn’t soak in is available to evaporate, as is surface water of rivers, lakes, seas, and oceans.
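A simple bulk-style sketch of that evaporation step: rate proportional to wind speed times the humidity deficit, scaled by how much of the cell is wet. All names, units, and the coefficient are placeholders for illustration, not real tuned values:

```python
def evaporation_rate(wind_speed, q_sat_surface, q_air, wetness, coeff=1.3e-3):
    """Bulk aerodynamic sketch of evaporation from a cell.

    q_sat_surface: saturation humidity at the surface temperature.
    q_air: actual humidity of the air above.
    wetness: fraction of the cell with available surface water
             (rivers, lakes, seas, leftover precipitation).
    Saturated air (no deficit) evaporates nothing; a fully wet ocean
    cell evaporates at the full bulk rate.
    """
    deficit = max(0.0, q_sat_surface - q_air)
    return coeff * wind_speed * deficit * wetness
```

The returned water mass is what feeds the airlayers step as added humidity, and the latent heat it carries comes off the surface temperature.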
For Gozinta Winddata compute new air properties
For cycle in iterations
  For layer in atmosphere
    compute density and vertical displacement
    compute change in humidity, cloud formation, and precipitation
    compute solar heating by layer (UV in stratosphere, etc.) and upwelling IR
    etc.
  Next layer
Next cycle
end
This is likely the hardest one. It will iterate a few times, trying to take into account the change to air density from surface evaporation, and have the moist / warm air rise, forming clouds at the points where RH exceeds capacity to hold water and eventually precipitation (output to surface wetness…). Initially I’d use just a couple of air layers and maybe a 20 minute time step, so 3 iterations inside the hour. But eventually it would need more. Perhaps a lot more.
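Just to show the shape of the iteration (not the physics), here is a toy two-layer version: evaporated moisture enters the bottom layer, a fraction mixes upward each sub-step, and anything over a layer’s holding capacity rains out. Every number and field name here is a placeholder:

```python
def airlayers_step(layers, surface_evap, iterations=3, mix_fraction=0.2):
    """Toy vertical-column sketch of the airlayers routine.

    layers: bottom-to-top list of dicts with 'humidity' (water held)
    and 'cap' (max it can hold before condensing). Three iterations
    matches the '20 minute step inside the hour' idea above.
    Returns total precipitation, to be fed back into surface wetness.
    Water is conserved: evap in = humidity change + precip out.
    """
    precip = 0.0
    layers[0]["humidity"] += surface_evap          # evaporation enters at the bottom
    for _ in range(iterations):
        # crude convection: lift a fraction of each layer's moisture upward
        for i in range(len(layers) - 1):
            lift = mix_fraction * layers[i]["humidity"]
            layers[i]["humidity"] -= lift
            layers[i + 1]["humidity"] += lift
        # condense and rain out anything above each layer's capacity
        for layer in layers:
            excess = layer["humidity"] - layer["cap"]
            if excess > 0.0:
                layer["humidity"] = layer["cap"]
                precip += excess
    return precip
```

The real routine would also carry temperature, density, and per-layer solar heating through the same loop; this only demonstrates the iterate-over-layers-inside-the-hour structure.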
Take the change of air mass in a cell (via incoming cell parameters from neighbor cells) and adjust for humidity and temperature changes (from above routines) and compute how much air mass moves what direction into which cells. Write the Gozouta values to neighbor cells based on direction angle to wind.
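The apportioning-by-angle part might look like this: weight each neighbor by how closely its bearing matches the wind direction, keep only the downwind ones, and normalize so the outgoing mass is conserved. Cosine weighting is just one plausible choice, and the function name is mine:

```python
import math

def apportion_outflow(outflow_mass, wind_dir_deg, neighbor_bearings_deg):
    """Split a cell's outgoing air mass among its downwind neighbors.

    Each neighbor gets a weight of cos(bearing - wind direction),
    clamped at zero so upwind neighbors receive nothing, then the
    weights are normalized so the total written to the neighbors'
    'gozinta' buckets equals the mass leaving this cell.
    """
    weights = [max(0.0, math.cos(math.radians(b - wind_dir_deg)))
               for b in neighbor_bearings_deg]
    total = sum(weights)
    if total == 0.0:                     # calm / degenerate case: nothing moves
        return [0.0] * len(weights)
    return [outflow_mass * w / total for w in weights]
```

The momentum vector mentioned above would ride along as extra fields in the same gozouta record, so the receiving cell knows the arriving air’s direction and speed.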
Finally, at the end, after convection, wind, evaporation, precipitation, etc. have done their thing, figure out what IR might do. Biggest impact will be from variable humidity in the air layers and from clouds.
Likely this will need to be split over Troposphere (little IR outbound) and Stratosphere (lots of IR outbound) with cloud tops in the tropopause a special problem. Not sure how many layers / iterations will be needed.
At this point we’re ready to write the “Done With CellID at TIME” flags and any remaining “gozouta” data that has not already been written for our neighbor cells to be able to compute their next iteration.
At this point I envision that “gozouta” flag being, basically, a date:time stamp saying “I’m done with this step”, but a simple integer “cycle#” might be more efficient. Details for later…
Then we are done with the cell process for this date:hour and this cell instance terminates. The processor is freed to go get another cell to process, or just wait for an updated “gozinta” flag for this cell.
That depends on how you do job assignments. Initially, with only a dozen or so cores, having each cell end and be reissued by the Cells process makes more sense. Otherwise you might have a few thousand jobs in queue all checking “Can I run yet?” and leaving no time to actually run one. Eventually, I’d like to see “one CPU / cell” and the program just stays resident, watching for a gozinta update to run another cycle.
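In the one-CPU-per-cell version, the readiness check is tiny: using the integer cycle# flavor of the done flag, a cell can start cycle N as soon as every neighbor has published cycle N or later. A sketch, with hypothetical names:

```python
def ready_to_run(neighbor_flags, my_cycle):
    """Spin/wait check for a resident per-cell process.

    neighbor_flags: the latest integer cycle# each neighbor has
    published in its 'done' flag. The cell may compute my_cycle once
    every neighbor has finished at least that cycle (their gozouta
    data for it is written). Neighbors running ahead are fine -- this
    is what lets rings of different time periods propagate.
    """
    return all(flag >= my_cycle for flag in neighbor_flags)
```

The resident process would just loop: sleep briefly, re-read the flags, and run the next cycle when this returns true.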
But at first, it’s just “write out data to database and end”.
Which brings up the other potential bottleneck point. One Big Database will be easiest at the start, but more efficient with 1000 to 10000 cores would be each cell with its data resident in memory, and only the interprocess communications going between CPU / SBC units over fast network connectivity. Again, an implementation detail for later as an enhancement / porting to massively parallel hardware.
IF I’m right, this ought to allow for a “one core per cell” compute paradigm with little bottleneck potential. You can basically scale processing power almost linearly with cell number. (Some interprocess communications may still limit at high values depending on how the SBCs communicate)
I envision writing this over an extended period of time (unless other folks wanted to jump in too…) with initial runs being one function at a time on only a few cells. So, for example, a 32 cell world with only surface written, then adding subsurface (perhaps as a fixed value for all cells, later to be enhanced to variable by celldata subsurface type), and eventually adding an “ocean currents” section into subsurface for ocean cells (with layers like the airlayers, with differential IR vs blue vs UV absorption).
Then continuing to add one function at a time.
Once it’s all working OK on, say, 256 cells, crank it up to 6400 on a big cluster and see what happens ;-)
Like I said, this is the first spaghetti on the wall version. Feel free to make suggestions, toss rocks, etc. etc.
Also, anyone wants to take this, or parts of this, and run with it, feel free. The more someone else does, the less I have to do and the faster we get something. Copy Left Attribution and all that.