OpenMP, OpenMPI, MPICH: decisions, decisions

When things are new, a thousand sprouts form. Eventually one comes to dominate and becomes the long lived tree. (Or the monopoly or near-monopoly “industry leader” for companies).

Parallel computing is in just such a stage. It’s been a good 20 years now since it really started taking off, and things are getting mainstream. Eventually the “shakeout” stage will come. We’re already seeing some of it in the “consolidation” of some versions of parallel “extensions” to various languages.

Yet we still have variations of MPI (the Message Passing Interface): OpenMPI, MPICH, … Then there is OpenMP, which is confusingly close to OpenMPI as a printed name. OpenMP handles threads inside a multi-core processor chip with shared memory. OpenMPI / MPICH are oriented toward network-connected systems, so they pass information around in “messages” and don’t expect shared memory. (You can use MPI inside a multi-core machine, but it isn’t as efficient as OpenMP.)

I’ve mostly looked at MPI over the last few years. I figured it was time to look at OpenMP. It runs on compiler pragma statements, so it’s relatively easy to look at some code and have a decent idea what it is doing, even if you don’t know OpenMP in any detail. BUT, to write code and have it work correctly takes a much better understanding. So I went looking for an example.

This was the first one I ran into, and I like it.


Well, it is written for the R. Pi specifically, so any “Pi-isms” would have surfaced.

It is comprehensive. Covers an actual port of code, how to find optimization candidates, and has everything you need to do the same.

It is simple. Despite being comprehensive, the example is very easy to follow and “why” is clear at each step.

The author takes the time to be patient and complete. Not a lot of “magic hand-waving” you must work out.


Not much. There’s a little bit of “accent” to his English. I think it may be Indian Standard English. But if you can’t accept a few odd pronoun usages and one or two number conflicts, you ought not be speaking English anyway as there are at least a half dozen major variants.

It is written in C, which is good for folks writing C, but I’m interested in porting some Fortran codes so “an exercise for the student” in that language. But I can read C fine, so the example was easy to follow. (Most “curly brace” languages are more alike than different, so be it Perl or C or “whatever” it is easy to read if you have read any one; it’s the writing where the niggling differences show up…)

I did have one Ah-Ha moment in the reading. Looking at some C and thinking WT…? There was an issue of multiple threads executing and a potential for a variable to change after you evaluated it but before you actually did work on it.

Realize this example comes AFTER a couple of iterations of complexity; he really works up to it slowly. Don’t be put off by it as too complicated for a beginner, as he doesn’t begin here. Note that we evaluate corr vs bestCorr, then do it again. The comments explain the reason.

#pragma omp parallel for
for (i = 1; i < seekLength; i ++)
{
    double corr;
    // Calculates correlation value for the mixing position corresponding to 'i'
    corr = calcCrossCorr(refPos + channels * i, pMidBuffer, norm);

    // heuristic rule to slightly favour values close to mid of the range
    double tmp = (double)(2 * i - seekLength) / (double)seekLength;
    corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));

    // Check if we’ve found new best correlation value
    if (corr > bestCorr)
    {
        // For optimal performance, enter critical section only in case
        // that best value found. In such case repeat 'if' condition as it's
        // possible that other parallel execution may have already updated the
        // bestCorr value in the mean time
        #pragma omp critical
        if (corr > bestCorr)
        {
            bestCorr = corr;
            bestOffs = i;
        }
    }
}

This is toward the end, when we have already called for a distributed “for” loop with the “omp parallel for” pragma (a compiler directive), and it is demonstrating how to use another pragma, “omp critical”, to restrict execution of shared code to when it is needed and to ONLY one thread at a time.

So the first “if” just checks that we need to “go there” and execute that critical code at all: corr is bigger than bestCorr so far. THEN we execute the critical chunk. BUT we might have been waiting for another thread to update the values, so before we actually DO something we need to check again, to see if the other thread changed it after our first check but before we got our turn.

Relevance to Non-Programmers

Why is this relevant to folks who are NOT writing parallel code?

Well, it illustrates the new kind of bug you can get in parallel conversions. Lots of the climate codes are going to be (or have been…) converted to parallel execution. It is a little bit of an unnatural act for a programmer to put the same test twice in a row. A “newbie” to OpenMP might very well not realize it is needed. I didn’t. So this kind of issue will lead to bugs that may or may not be caught in a complicated climate model, where folks mostly just look at the output to gauge if it is working, and the output changes with each model run to some extent anyway.

One must learn to think in terms of many flying balls at once and that the order in which they land is important to track, and control. Forget to check “the same thing twice” and you can have different, and likely wrong, results.

In Conclusion

So I now know to look at the Makefiles to see what compiler flags are set to invoke OpenMP (in codes like the MPAS model), and I know to look for the #pragma omp statements in the code to see where it is used.

I also know that I need to learn to think in an added dimension: parallel time. At one moment in time I check a value, but at a different moment, when I write a new value, the old value may have changed due to another thread. For that kind of code the use of “critical” matters; but that brings with it the potential that any given thread only gets to run that bit of critical code AFTER some other thread finishes, and AFTER the value checked may have changed (again). So an image of many threads running async, and SOMETIMES needing to write a variable, must also include an image of locking and queuing, then checking on state / variable changes prior to acting.

It’s a different way of picturing the code, the execution, and how the syntax works.

It’s also a different way of introducing pernicious bugs. Bugs hard to catch in code with a wandering non-deterministic output like climate models. Almost by definition, the output of those models is a bit chaotic. How would you spot a little more variation and chaos in codes that already run “never the same way twice”?


So I used MPICH 2 in my earlier tests. It looks like OpenMPI is also available on the Pi. I need to try it out too.

Open MPI represents the merger between three well-known MPI implementations:

FT-MPI from the University of Tennessee
LA-MPI from Los Alamos National Laboratory
LAM/MPI from Indiana University

with contributions from the PACX-MPI team at the University of Stuttgart. These four institutions comprise the founding members of the Open MPI development team.

That “consolidation” stage is underway…

MPICH, formerly known as MPICH2, is a freely available, portable implementation of MPI, a standard for message-passing for distributed-memory applications used in parallel computing. MPICH is Free and open source software with some public domain components that were developed by a US governmental organisation, and is available for most flavours of Unix-like OS (including Linux and Mac OS X).
MPICH derivatives

IBM (MPI for the Blue Gene series and, as one option, for x- and p-series clusters)
Cray (MPI for all Cray platforms)
SiCortex (MPI SiCortex)
Microsoft (MS-MPI)
Intel (Intel MPI)
Qlogic (MPICH2-PSM)
Myricom (MPICH2-MX)
Ohio State University (MVAPICH and MVAPICH2)
University of British Columbia (MPICH2/SCTP, and Fine-Grain MPI (FG-MPI) which adds support for coroutines)

But it isn’t completed just yet…

On my “todo” list now is to test MPICH vs OpenMPI and also to trial OpenMP for multithreaded FORTRAN on the Pi. In theory, a hybrid build with OpenMP inside each board but OpenMPI distributing blocks between boards would be best for my little cluster. It is also going to be very tricky to get that right and to know how to prove it.

Then there is that coarray Fortran option. I think it ought to be in the latest Debian / Devuan Fortran, but I’ve not tried it nor searched for confirmation.

The first open-source compiler which implemented coarrays as specified in the Fortran 2008 standard for Linux architectures is G95. Currently, GNU Fortran provides wide coverage of Fortran’s coarray features in single- and multi-image configuration (the latter based on the OpenCoarrays library).

Since GNU and G95 are both available on Debian, it ought to be there… but sometimes hard to do bits get left out of early ports; especially if someone decides it isn’t important to the target user community. (i.e. “Nobody will need massively parallel FORTRAN on a home education toy system”…) It is easy for the folks doing the work (often for free) to decide to leave some hard and largely unused bit of a port effort to “later” or “the next guy”…

So yet another Dig Here! to see if it really is there and working. YADH? Or is that “yada yada YADH”? ;-)

Subscribe to feed

Posted in GCM, Tech Bits | Tagged , , , , | 38 Comments

Tips – December 2017

About “Tips”:

While I’m mostly interested in things having to do with:

Computer stuff, especially small single board computers
Making money, usually via trading
Weather and climate (“Global Warming” & “Climate Change”)
Quakes, Volcanoes, and other Earth Sciences
Current economic and political events
(often as those last three have impact on money and climate things…)
And just about any ‘way cool’ interesting science or technology

If something else is interesting to you, put a “tip” here as you like.

If there is a current Hot Topic for active discussion, try one of the Weekly Occasional Open Discussion pages here:

You can also look at the list of “Categories” on the right hand side and get an idea of any other broad area of interest.

This ought not to be seen as a “limit” on what is “interesting”, more as a “focus list” with other things that are interesting being fair game as well.

The History:

Note that “pages” are the things reached from links on the top bar just under the pretty picture. “Postings” are reached from the listing along the right side of any given article (posting).

Since WordPress has decided that comments on Pages, like the Old Tips Pages, won’t show up in recent comments, it kind of breaks the value of it for me. In response, I shifted from a set of “pages” to a set of “postings”. As any given Tips Posting gets full, I’ll add a new one.

I have kept the same general format, with the T page (top bar) still pointing to both the archive of Tips Pages as well as the series of new Postings via a link to the TIPS category.

This is the next posting from prior Tips postings. Same idea, just a new set of space to put pointers to things of interest. The most immediately preceding Tips posting is:

The generic “T” parent page remains up top, where older copies of the various “Tips” pages can be found archived. The Tips category (see list at right) marks Tips postings for easy location.


Posted in Uncategorized | Tagged | 29 Comments

W.O.O.D. – 5 Dec 2017

This is another of the W.O.O.D. series of semi-regular
Weekly Occasional Open Discussions.
(i.e. if I forget and skip one, no big)

Immediate prior one here:
and remains open for threads running there
(at least until the ‘several month’ auto-close of comments on stale threads).

Canonical list of old ones here:

So use “Tips” for “Oooh, look at the interesting ponder thing!”
and “W.O.O.D” for “Did you see what just happened?! What did you think about it?”

For this week, I’m deep in ponder over Climate Models and Parallel Computing. I note in passing the CF of mutual “Investigations” (read “Witch Hunts”) in Washington DC and the persecution of anyone with a natural level of sexual “motivation” continues in full flush. Can’t say I have any sympathy for any of them. Swamp rats all, IMHO. Better they are grabbing each other by the “private parts” than going after us.

On local broadcast TV I’ve found CGTN – China Global Television Network. So now I have Yet Another Alternative Bias to offset our native stations’ suckage, biases, and Patrons. They stress commerce and economic advantages more than the crap ours covers. While the US TV had one of the Dims (old sourpuss woman from San Francisco whose name escapes me at the moment – used to hang out with Babs Boxer…) opining that This Time For Sure they would get Trump on the Russian Deal with some kind of process crime (due to his tweets constituting “obstruction of justice”…) the China coverage at the moment from: is saying:

Trump to recognize Jerusalem as Israel’s capital

Senior White House officials have confirmed that US President Donald Trump will formally recognize Jerusalem as the capital of Israel in a speech at 18:00 GMT on Wednesday.

So literally “Tomorrow’s News Today!” ;-)

Though one must again ask: IF RT must register as a “Foreign Agent” then why not CGTN and Al Jazeera and the BBC and DW and…

Oh Well. I can still sift all of them for some hint what our “Leaders” are really doing (as opposed to the Democrat Cheering Section in the DC Pissing Match that is the domestic Network News…)


Posted in W.O.O.D. | Tagged | 20 Comments

I Could Not Agree More


Posted in News Related, Political Current Events | Tagged , , | 16 Comments