MPICH2 Installed on Pi Cluster

This conversation begins in “comments” on a prior thread where I posted my first results after finally getting this to work. See here:

https://chiefio.wordpress.com/2017/01/09/cluster-distcc-first-fire/#comment-77561

That link starts at the “get NFS working right” step (before it is the celebration of getting distcc going; after it is the rest of the MPI testing and the program used). There is also a comment about my “issues” with getting ssh to like me, and the need to install ‘keychain’, then activate it and save the result on each of the machines participating in the MPICH party.

OK, so what did I do and where did I get the formula?

First off, the internet is a revolution in technical workload. Where, in the past, something might take 4 days of reading manual pages and trying things, it now is often a web search, find a ‘close’ example, try it and proceed directly to debugging.

In this case, I modeled off of an Ubuntu MPICH install / test. Ubuntu is based on Debian, which until Jessie was essentially the same as Devuan in not having SystemD, so many old “How To” pages still have the old System V init scripts and methods in them. (So “everything you know is wrong” under systemD becomes “everything older than a month or two ago is still right” under Devuan.)

Here is the page on which I modeled my installation:

https://help.ubuntu.com/community/MpichCluster

It goes through several steps I’d already done, so I was “good to go” with skipping things like putting host names into the host file (though I did tune it up a bit) and things like creating a set of equivalent machines with equivalent logins.

Since SystemD is propagating into more of the world and Ubuntu keeps changing how things are done, this page just might get rewritten into systemd crap, so I’m going to quote heavily from it to preserve the Devuan-friendly content.

The basic interesting ‘trick’ they use is to have the common user id / login name share an nfs mounted home directory across all the systems. You can configure it with separate file systems and separate home directories, but a lot of copying will need to be done ;-)

So I’d set up ‘gcm’ as the user id on my headless nodes. Before that, I’d also created a different user account that took UID 1001 on the headend board, but I’d not used that ID for anything yet. I just went into /etc/passwd and changed its name to ‘gcm’. Ditto in /etc/group. BTW, you can’t use a video monitor unless you are in group video in /etc/group, so I just added that name everywhere ‘pi’ was listed. Probably overkill, but whatever…

In /etc/group:

video:x:44:pi,gcm
sasl:x:45:
plugdev:x:46:pi,gcm
staff:x:50:
games:x:60:pi,gcm
users:x:100:pi,gcm
nogroup:x:65534:
input:x:101:pi,gcm

and anywhere else “pi” is listed as an added member of the group. For /etc/passwd this gets created when you do an “adduser gcm”, but here’s the line anyway:

gcm:x:1001:1001:Global Circulation Model,1,666-6666,666-666-6666:/Climate/home/gcm:/bin/bash

All the 666s are my answers to questions about the work phone number, etc. etc…

I used the /Climate file system on the headend machine for the home directory and NFS exported it. It took a while before I remembered that I also needed to do:

service rpcbind start

/etc/init.d/nfs-kernel-server start

It is “just wrong” that exportfs reports the file systems as exported when they are not, until you do those steps… One site speculates it is a link ordering thing… who knows. Another had worse problems with it under systemD… and a more exotic fix.

So the accounts were built on all three machines. On the headless units I just let adduser build the home directory in /home, so I can still log in with that directory if desired, then edited their /etc/passwd files to change /home/gcm to /Climate/home/gcm. On the headend, I copied the pristine home directory into /Climate as the ‘real’ copy.

I used:

mkdir /Climate/home /Climate/home/gcm

(cd /home/gcm; tar cf - . ) | (cd /Climate/home/gcm; tar xvf -)

That’s old school from when cp was daft. Now I think “cd /home; cp -a gcm /Climate/home/” would do it. The -a (archive) flag is the ‘keep ownership and permissions’ flag; a plain “cp -r” run as root would leave everything owned by root.

The common exported file system is built, exported, mounted, restarted, and tested.

The three machines are all in /etc/hosts. We pick it up there:

Setting Up an MPICH2 Cluster in Ubuntu

This guide describes how to build a simple MPICH cluster in ubuntu.

To understand the guide, a basic knowledge of command line usage and of the principles of MPICH and clustering is assumed.

Here we have 4 nodes running Ubuntu server with these host names: ub0,ub1,ub2,ub3;

1. Defining hostnames in /etc/hosts

I didn’t use their names, so here’s my /etc/hosts instead. It doesn’t matter what names you use, as long as they are consistent.

gcm@Headless1:~ $  cat /etc/hosts
127.0.0.1	localhost Headless1
::1		localhost ip6-localhost ip6-loopback
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters

127.0.1.1	raspberrypi
10.168.168.40	Headend headend
10.168.168.41	Headless1 headless1
10.168.168.42	Headless2 headless2
10.168.168.43	Headless3 headless3

They make a big deal out of NOT having “Headless1” on the loopback line (127.0.0.1) of /etc/hosts, but it looks like I accidentally left it there on one of them.

Here’s the one from the headend machine:

127.0.0.1       localhost
::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters

#127.0.1.1      raspberrypi

10.168.168.40     Headend headend
10.168.168.41     Headless1 headless1
10.168.168.42     Headless2 headless2
10.168.168.43     Headless3 headless3

I’ve not yet built headless3, but put the marker in there anyway… Maybe this weekend…

You can see where I got rid of names other than localhost on 127.0.0.1 in the last one. Maybe it matters (I did have a bit of oddness with some things…), or maybe their warnings are excessive. A tiny ‘dig here maybe’ for someone in the future.

Then they have you install NFS. Well, I’ve already got that in my build script and besides, it’s a different (slightly) command set.

On the headend with the disk (or whatever machine shares out the partition) you need to export the file system. Here’s my /etc/exports entry:

/Climate	10.168.168.0/24(rw,sync,no_root_squash,no_subtree_check)

After that, you run ‘exportfs -a’ (or ‘exportfs -r’ to re-export if NFS is already running), and plain ‘exportfs’ to show what is being exported.

root@Devuan:/# exportfs
/Climate      	10.168.168.0/24
root@Devuan:/# 

To mount it on the headless nodes, you need an entry in /etc/fstab like:

10.168.168.40:/Climate	/Climate	nfs	rw,defaults,nolock,auto,noatime

At this point you ought to be able to log in as ‘gcm’ on any of the three boards and be in the same file system: the real one on the headend and the NFS-mounted one on the headless nodes. That is when an hour or two of wandering in the man page desert “reminded” me of the need to run the two commands listed above. The linked web page says (where my headend is their ub0 and my headless1 is their ub1):

2. Installing NFS

NFS allows us to create a folder on the master node and have it synced on all the other nodes. This folder can be used to store programs. To Install NFS just run this in the master node’s terminal:

omid@ub0:~$ sudo apt-get install nfs-server

To install the client program on other nodes run this command on each of them:

omid@ub1:~$ sudo apt-get install nfs-client

Note: if you want to be more efficient in controlling several nodes using same commands, ClusterSSH is a nice tool and you can find a basic two-line tutorial here.

3. Sharing Master Folder

Make a folder in all nodes, we’ll store our data and programs in this folder.

omid@ub0:~$ sudo mkdir /mirror

And then we share the contents of this folder located on the master node to all the other nodes. In order to do this we first edit the /etc/exports file on the master node to contain the additional line

/mirror *(rw,sync)

This can be done using a text editor such as vim or by issuing this command:

omid@ub0:~$ echo "/mirror *(rw,sync)" | sudo tee -a /etc/exports

Now restart the nfs service on the master node to parse this configuration once again.

omid@ub0:~$ sudo service nfs-kernel-server restart

Note that we store our data and programs only on the master node, and the other nodes will access them with NFS.

4. Mounting /mirror in nodes

Now all we need to do is to mount the folder on the other nodes. This can be done manually each time like this:

omid@ub1:~$ sudo mount ub0:/mirror /mirror
omid@ub2:~$ sudo mount ub0:/mirror /mirror
omid@ub3:~$ sudo mount ub0:/mirror /mirror

But it’s better to change fstab in order to mount it on every boot. We do this by editing /etc/fstab and adding this line:

ub0:/mirror /mirror nfs

and remounting all partitions by issuing this on all the slave nodes:

omid@ub1:~$ sudo mount -a
omid@ub2:~$ sudo mount -a
omid@ub3:~$ sudo mount -a

Well, the short form is: get NFS working and share a common file system between the nodes… I’ve already written about getting NFS working in the Debian install and now on the Devuan cluster, so let me know if you get stuck…

They define a user “mpiu”, but I’d already used “gcm” so stuck with it. They call their shared file system /mirror, I’d already set up /Climate.

5. Defining a user for running MPI programs

We define a user with same name and same userid in all nodes with a home directory in /mirror.

Here we name it “mpiu”! Also we change the owner of /mirror to mpiu:

omid@ub0:~$ sudo chown mpiu /mirror

After that, they install “OpenSSH”. Since ssh was already installed on Devuan, I skipped that step.

This next part got down into the weeds. You might remember that for ‘distcc’ I kludged around the need for keyrings by linking the place where machine IDs are stored to /dev/null. Well, this uses all that security stuff, so here it is (just remember to think ‘gcm’ where they have ‘mpiu’ as the user):

7. Setting up passwordless SSH for communication between nodes

First we login with our new user to the master node:

omid@ub0:~$ su - mpiu

Then we generate an RSA key pair for mpiu:

mpiu@ub0:~$ ssh-keygen -t rsa

You can keep the default ~/.ssh/id_rsa location. It is suggested to enter a strong passphrase for security reasons.

Next, we add this key to authorized keys:

mpiu@ub0:~$ cd .ssh
mpiu@ub0:~/.ssh$ cat id_rsa.pub >> authorized_keys

As the home directory of mpiu in all nodes is the same (/mirror/mpiu) , there is no need to run these commands on all nodes. If you didn’t mirror the home directory, though, you can use ssh-copy-id to copy a public key to another machine’s authorized_keys file safely.

To test SSH run:

mpiu@ub0:~$ ssh ub1 hostname

If you are asked to enter a passphrase every time, you need to set up a keychain. This is done easily by installing… Keychain.

mpiu@ub0:~$ sudo apt-get install keychain

And to tell it where your keys are and to start an ssh-agent automatically edit your ~/.bashrc file to contain the following lines (where id_rsa is the name of your private key file):

if type keychain >/dev/null 2>/dev/null; then
  keychain --nogui -q id_rsa
  [ -f ~/.keychain/${HOSTNAME}-sh ] && . ~/.keychain/${HOSTNAME}-sh
  [ -f ~/.keychain/${HOSTNAME}-sh-gpg ] && . ~/.keychain/${HOSTNAME}-sh-gpg
fi

Exit and login once again or do a source ~/.bashrc for the changes to take effect.

Now your hostname via ssh command should return the other node’s hostname without asking for a password or a passphrase. Check that this works for all the slave nodes.

At first I didn’t install keychain on all the headless nodes. It didn’t work, and I spent a few hours chasing why… Eventually I went back and did “apt-get install keychain” on all the nodes, AND did an initializing “ssh headless1 hostname” (which runs the ‘hostname’ command on headless1) and similarly “ssh headless2 hostname” on the headend, then the same thing on headless1 to headend and headless2, and again on headless2 to headend and headless1. That seemed to preload all the checks on every machine to every machine (presumably by accepting each host’s key into ~/.ssh/known_hosts on that first connection), and the permission / login failures when running MPICH codes ceased.

They then have a step of installing the compilers, but I’ve already done that in my build script, so I skipped it.

Finally we get to the actual install of MPICH2, which is curiously anticlimactic:

10. Installing MPICH2

Now the last ingredient we need installed on all the machines is the MPI implementation. You can install MPICH2 with apt-get by typing:

sudo apt-get install mpich2

Alternatively, MPICH2 can be installed from source as explained in the MPICH installer guide or you can try using some other implementation such as OpenMPI.

To test that the program did indeed install successfully enter this on all the machines:

mpiu@ub0:~$ which mpiexec
mpiu@ub0:~$ which mpirun

Well, I could just add that to the build script… And no, you don’t need to build it from sources unless you want the experience…

The configuration of MPICH is nearly trivial. You list the machines in a file and reference it when you launch mpi. They use the name ‘machinefile’ which I dutifully copied not knowing if it was a ‘special’ name. It isn’t. I’m likely to change the name to something shorter, like maybe “cluster” or even “nodes”…

11. Setting up a machinefile

Create a file called “machinefile” in mpiu’s home directory with node names followed by a colon and a number of processes to spawn:

ub3:4 # this will spawn 4 processes on ub3
ub2:2 # this will spawn 2 processes on ub2
ub1 # this will spawn 1 process on ub1
ub0 # this will spawn 1 process on ub0

My machine file is:

gcm@Headless2:/Climate/home/gcm# cat machinefile 
Headend:2
Headless1:4
Headless2:4

That is for now. I’m of course going to add a ‘Headless3’ when the Orange Pi gets integrated. Then I’ll tune things to see just how many cores of the headend I can use and not bog it down. For now, I’m setting it to one-per-cpu on the slave nodes and 1/2 of the cores on the Master node. Notice you can have different files for different tuning invoked at run time…

Their test program is already in the linked comments on the prior thread, but it is the standard “hello world” that everything runs for its debut… I modified it a bit. On running it, the program started and finished so fast that I couldn’t check it had actually been distributed to the different nodes, despite having active “top” windows running on all of them. So I stuck in a POSIX-compliant ‘pause’ that I picked up from here:

http://stackoverflow.com/questions/4869507/how-to-pause-in-c down in a comment:

Under POSIX systems, the best solution seems to use:

#include <unistd.h>

pause ();

If the process receives a signal whose effect is to terminate it (typically by typing Ctrl+C in the terminal), then pause will not return and the process will effectively be terminated by this signal. A more advanced usage is to use a signal-catching function, called when the corresponding signal is received, after which pause returns, resuming the process.

Note: using getchar() will not work if the standard input is redirected; hence this more general solution.

But in retrospect I likely ought to have used a timer that runs out, as in this comment:

If you want to just delay the closing of the window without having to actually press a button (getchar() method), you can simply use the sleep() method; it takes the amount of seconds you want to sleep as an argument.

#include <unistd.h>
// your code here
sleep(3); // sleep for 3 seconds

References: sleep() manual

My current test case is:

root@Headless2:/Climate/home/gcm# cat mpi_hello.c
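#include <stdio.h>
#include <mpi.h>
#include <unistd.h>   /* pause() */

int main(int argc, char** argv) {
    int myrank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);   /* how many copies were launched */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* which copy this is, 0..nprocs-1 */

    printf("Hello World! from Processor %d of %d\n", myrank, nprocs);
    fflush(stdout);   /* push the line out now; a piped stdout is block-buffered */
    pause ();         /* wait for a signal so the process stays visible in top */

    MPI_Finalize();
    return 0;
}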

You compile it and, hating to type anything long twice, I stuffed the command into a one-line mini-script and made it executable with “chmod +x Makeit”:

root@Headless2:/Climate/home/gcm# cat Makeit
mpicc mpi_hello.c -o mpi_hello

So to do the mpicc command to build the output executable of ‘mpi_hello’ I just type “./Makeit”. Lazy? Hell yes! All *nix is based on the idea that any single character you can compress out of your typing will save you months of time over your computing life…

I even have a ‘run script’ to make it go:

root@Headless2:/Climate/home/gcm# cat Doit
mpiexec -n 10 -f machinefile ./mpi_hello

Where “-n 10” says to make 10 copies and “-f machinefile” says to send them to the machines listed in ‘machinefile’ and the thing to run is in my present directory “./” and named “mpi_hello”.

Running it shows it works:

gcm@Devuan:~ $ ./Doit
Hello World! from Processor 0 of 10
Hello World! from Processor 1 of 10
Hello World! from Processor 6 of 10
Hello World! from Processor 7 of 10
Hello World! from Processor 8 of 10
Hello World! from Processor 9 of 10
Hello World! from Processor 2 of 10
Hello World! from Processor 3 of 10
Hello World! from Processor 4 of 10
Hello World! from Processor 5 of 10

But I’ve still got some ssh / permissions things to work out from headless2, which can accept jobs, but doesn’t want to send them out:

gcm@Headless2:~ $ ./Doit
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).

Not exactly a big deal, since all jobs ought to originate from the Master node in my setup, but I would like to know “why”…

“Why? Don’t ask why. Down that path lies insanity and ruin. -E.M.Smith”

So of course I’m off to explore “why”… Did I miss a step on headless2? Is it asymmetrical somehow?

Thus are days lost…


About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits.

4 Responses to MPICH2 Installed on Pi Cluster

  1. LG says:

    Step-by-Step it is …
    No one can say “EMSmith is not a generous soul” for he’s indeed a “GIVER”.

  2. E.M.Smith says:

    Ah, the ‘headless2’ permissions issue was that I’d not installed keychain on it (and then executed .bashrc again…) So that’s now fixed and “any node to any node” process launching is now working, even though I intend to primarily launch things from the headend and not the backend boards.

    @L.G.:

    Well, it’s just something in my nature… FWIW I’ve got a flock of about a dozen doves, most born under my patio awning, that I’m feeding this winter, along with almost that many young squirrels. I’m going through a 50 lb bag of sunflower seeds in about 2 weeks.

    Then again, at $2 / day it’s great entertainment ;-) and less than 1/2 of a Starbucks Mocha…

    If I see someone who needs help, I help. If I see a problem that needs fixing, I fix it. Rinse and repeat…

    Oh, and you made the ‘mistake’ of expressing even a remote interest in something I like doing (computer stuff) so no way I’m letting that go by without pouncing on it ;-)

    @All:

    Just recovered from my 2nd power failure in about as many weeks. When the system came back up, everything was fine, and with ‘zero configuration’ it all just worked on launching an mpi based program. Nice, very nice.

    Still, I think I’m going to build that UPS sooner rather than later. At this rate, California electric power will be at 3rd world reliability by the end of winter…

    For a better test case, I wrapped the print command in a variable length sleep:

        sleep(myrank);
        printf("Hello World! from Processor %d of %d\n", myrank, nprocs);
        sleep(myrank);
    

    By doing it that way, and launching about 40 of them, I can watch each one finish in rank order and watch them in “top” on the different CPUs. Nice easy diagnostic that it’s all working.

  3. pinroot says:

    When I was in school, our computer lab had 20 or so computers, all running Fedora. They were configured so that /home/xxxx/ was actually located on a remote server. The nice thing about this was that you could log in from any of the 20+ machines and your files were ‘just there’. No need for /home for every user on every machine. You could do something similar with your setup, but since there’s only one user, it might not be worth it.

    OT but pi related:

    Opto22 is a company that makes industrial grade optoisolation equipment. We’ve used them at work. They now have a board that interfaces with the pi’s GPIO interface. You get 8 I/O points. They have a starter kit for $99 here:
    http://info.opto22.com/raspberry-pi-io?utm_campaign=Raspberry%20Pi%20I/O&utm_content=AWCustomEblast010917-RPiIO&utm_medium=AWCustomEblast&utm_source=email

    (Sorry for the long url).

  4. E.M.Smith says:

    @pinroot:

    Notice that /Climate is shared around the cluster and has the (common) home directory of ‘gcm’ on it. Same idea.

    BTW, often the ‘stuff’ after a ? in a URL is ‘tracking junk’ or ‘my computer config’ stuff that isn’t needed in a typical URL. (Occasionally it is a number that is the actual page to display, followed by some junk.) This means you can try truncating it to get a ‘clean’ URL. For example: http://info.opto22.com/raspberry-pi-io

    Works just as well, and without the “crap”…

    @All:

    Well, I decided to install the old ppmake (PVM [parallel virtual machine] parallel make or some such) onto the Pi. Found out that since distcc came along, most folks seem to have abandoned ppmake (and PVM), so it’s all been lacking maintenance for a decade or so. Still there, but… I ran into things like: it wants libpvm3 and what gets installed is libpvm3.4.5 or some such; it can’t find pvm3.h; and it wants libpvm3.a and libpvm3.so and can’t find them (headless1 installed libpvm3.4.5.so but Headend didn’t install any…)

    All of it going to take hours to sort out, and I don’t care that much.

    It would be nice to be able to distribute FORTRAN compiles to the different nodes just like distcc distributes C compiles, but at this point it will likely take more time to make this ‘go’ than would be saved on Model E compiles… So I’m going to ‘let it be’ for a while and come back some {long time?} later to PVM / ppmake. It’s what I used about 15 years ago on my first Beowulf and it worked very well. But it seems nearly abandoned now.

    Oh Well…

    I have a load of “family events” this week, which is why I’m being a bit absent on commenting and new articles. That, plus the power failures and equipment trouble (my spouse’s computer died and needed recovery), has crimped my time. Just FYI in case anyone was wondering.
