BOINC-ing A Cluster

Got a little bit tired of having a stack of cores not doing anything. Now that it’s all configured as I like it, open a panel on each, and pop up an “htop” display. A nice load bar for each core. 16 of them just sitting there near or at zero…

Couldn’t stand the waste… so spent some time looking at how to set up a code running OpenCL on the GPUs. Found a recipe for doing it on the Odroid. It’s not easy, and filled with “issues”. Climate Models not ready to run yet (still digging through all the knobs to set…)

So I decided to go ahead and put “BOINC” on the cluster. At least it would be doing something worthwhile (sort of ;-) and I’d get my burn-in / acceptance test out of the way. Nothing like running a stack of cores at 100% for a few days to find out if it’s going to crash in the middle of a model run… or something else you care about.

https://boinc.berkeley.edu

Open-source software for volunteer computing

Use the idle time on your computer (Windows, Mac, Linux, or Android) to cure diseases, study global warming, discover pulsars, and do many other types of scientific research. It’s safe, secure, and easy:

For Android devices, get the BOINC app from the Google Play Store; for Kindle, get it from the Amazon App Store.

You can choose to support projects such as Einstein@Home, IBM World Community Grid, and SETI@home, among many others. If you run several projects, try an account manager such as GridRepublic or BAM!.

I’ve run SETI @ Home, on and off, since sometime in the late 90s. (Still no aliens, though… dang it.) It was the first one of these distributed things to make a splash. Then they generalized the process and you can now use the BOINC framework to run any of many different projects.

Not all of them run on an ARM chip, though. Interesting to note that now some of the PC oriented ones also will run in the GPUs on those boxes.

I just semi-randomly signed up for 3 projects. SETI @ Home, an Enigma crack (seems someone has 3 old Enigma encrypted messages from W.W.II that have never been decoded, so they are going for a crack of it), and an Asteroids program that is using known astro-data to fully describe all the asteroids they can (rotation etc.) At some point I ought to find out just what all works on the ARM chips and settle on those I think have the most benefit. For now it was just “what runs and looks at all fun?”.

Installing the application is trivial. It’s in the Debian build already. On the headless boards, do “apt-get install boinc-client”. On the master / headend station, do that and also do “apt-get install boinc-manager”.

https://boinc.berkeley.edu/wiki/Installing_BOINC_on_Debian

But then there’s some configuration bits to do… I’ve bolded the bit that was annoying for me:

If you do only the basic installation as described above, BOINC manager will not be able to automatically connect to the client. To connect the client you will be required to give the GUI RPC password every time you start BOINC manager. That is not a bug, it is a security feature to prevent other users from using the manager to manipulate the client, changing your projects, etc. Another inconvenience is that boinc (the user named boinc) owns /var/lib/boinc-client/ and all the files and directories in it so you will not be able to edit those files from your regular user account unless you add your username to the boinc group and adjust some permissions as follows, substituting your username for <username>:

Open /etc/group in a text editor.
Look for the line starting with boinc:x:<gid>:
Edit the line to look like boinc:x:<gid>:<username> (<gid> will be a number, do not change it)
Save the file and close the editor.
Open a terminal and enter the following commands, substitute your username for <username>:
sudo ln -s /etc/boinc-client/gui_rpc_auth.cfg /home/<username>/gui_rpc_auth.cfg
sudo ln -s /etc/boinc-client/gui_rpc_auth.cfg /var/lib/boinc-client/gui_rpc_auth.cfg
sudo chown boinc:boinc /home/<username>/gui_rpc_auth.cfg
sudo chown boinc:boinc /var/lib/boinc-client/gui_rpc_auth.cfg
sudo chmod g+rw /var/lib/boinc-client
sudo chmod g+rw /var/lib/boinc-client/*.*
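
As a rough illustration of the /etc/group step in that recipe, here is a minimal Python sketch of the edit as a pure string transformation. The function name, the group id 123, and the user names "alice" and "bob" are all made up for the example; it works on a string only and never touches the real /etc/group. On a live system a tool such as “usermod -a -G boinc” followed by the user name is the usual way to get the same result without hand editing.

```python
def add_user_to_group_line(line, username):
    """Return an /etc/group style line with `username` added to the member list.

    The line format is name:password:gid:member1,member2,...
    The name, password and gid fields are passed through unchanged.
    """
    name, password, gid, members = line.rstrip("\n").split(":", 3)
    current = [m for m in members.split(",") if m]
    if username not in current:          # do not add the same user twice
        current.append(username)
    return ":".join([name, password, gid, ",".join(current)])


# Hypothetical example values: gid 123 and user "alice" are illustrative only.
example = add_user_to_group_line("boinc:x:123:", "alice")
print(example)
```

The gid field is deliberately left alone, matching the “do not change it” warning in the instructions.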

So I read that, and proceeded to do the listed commands by rote, but missed the importance of it. There are files in that directory that must be changed.

The other “quirk” was that the BOINC manager lets you change what machine you are managing with a dropdown menu choice of changing computers, but then it prompts for “computer name” and “password”. Whose password? On what machine?

So I tried my login name password. Nothing. I tried the boinc account. I changed the boinc account password (as I’d never set it so what was it?) and still no go. Eventually I found out that it has its own ‘special’ password. It also has a magic file where you must put the IP number of any managing workstation on each of the client headless boards.

https://boinc.berkeley.edu/wiki/Controlling_BOINC_remotely

Access control for GUI RPC

GUI RPCs are divided into two categories:

Status operations which return information about tasks, project, etc.
Control operations which change the state of BOINC (suspend/resume, add project, etc.).

Some GUI RPCs are authenticated with a GUI RPC password. This is stored in the file gui_rpc_auth.cfg in the BOINC data directory. On a multiuser computer, this should be protected against access by other users. When BOINC client first runs, it generates a random password. You can change it if you like; max length is 255 characters.

Local access

A “local” RPC is one that comes from the computer where the BOINC client is running (but perhaps from a different logged-in user).

Local status RPCs are not authenticated. On a multiuser computer, a user can see the status of any other user’s BOINC client.

Local control RPCs are authenticated using the GUI RPC password.

Remote access

A “remote” RPC is one that comes from a different computer.

All remote RPCs (both status and control) are authenticated using the GUI RPC password.

By default, remote RPCs are not accepted from any host. To specify a set of hosts from which RPCs are allowed, create a file remote_hosts.cfg in your BOINC data directory containing a list of allowed DNS host names or IP addresses (one per line). Only these hosts will be able to connect. The remote_hosts.cfg file can also have comment lines that start with either a # or a ; character.
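
To make that file format concrete, here is a small Python sketch that reads remote_hosts.cfg style text the way the quoted description lays it out: one host name or IP address per line, with lines starting with # or ; treated as comments. The function name and the sample addresses (192.168.1.10, headend.local) are assumptions for illustration, and this is not the parser the BOINC client itself uses.

```python
def parse_remote_hosts(text):
    """Return the list of allowed hosts from remote_hosts.cfg style text.

    Blank lines and lines starting with '#' or ';' are skipped,
    following the format described in the BOINC wiki passage above.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        hosts.append(line)
    return hosts


# Hypothetical file contents: the addresses are examples, not real nodes.
sample = """\
# management station that runs BOINC manager
192.168.1.10
; a DNS name works as well
headend.local
"""

allowed = parse_remote_hosts(sample)
print(allowed)
```

A check like “is my management station in this list?” on each headless node is a quick way to see why a connection is being refused.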

Now despite what it said, there was no default password in that .cfg file. So in fact, I had to just not put ANY password into the prompt to connect to another node. Just the computer name. HOWEVER, until the IP number of the management station was put in the remote_hosts.cfg file on the headless nodes, it would not allow the connection…

Once It Works

Then you get to add “projects”. These want an email account and a new password for the project login on the web page. Then you get the software for that project loaded and it starts.

The Management Station lets you allocate how many cores, and what percentage of CPU, and what times of day, and… lots of other controls. The tasks for a given project run niced to very low priority, so generally get out of the way of other use; but on a Pi if it is also your desktop, you will want to limit the use to 50% or 75% of cores, just so you don’t have to wait for a swap when starting to do something.

So as of now, I’ve got 16 cores running full boogie on BOINC. I’ll be leaving the cluster running this way for a day or two, then assess things like stability and core temperatures. Also actual work done.

For now I’m just happy to see all the CPU load bars up there at near 100% showing something is being done ;-)

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits.

6 Responses to BOINC-ing A Cluster

  1. Rosetta is a fun and great project.

  2. larrygeiger says:

    Jerry Pournelle Lives!!

    (I like the title ChiefIO better than Chaos Manor but the feel is similar…)

  3. jim2 says:

    CIO – how do you make your cluster run programs in parallel? Does it require a certain language, some sort of manager, and do the programs have to be refactored to utilize the cluster?

  4. E.M.Smith says:

    @Dave:
    https://boinc.berkeley.edu/wiki/Rosetta@home

    Determine the 3-dimensional shapes of proteins in research that may ultimately lead to finding cures for some major human diseases. By running Rosetta@home you will help us speed up and extend our research in ways we couldn’t possibly attempt without your help. You will also be helping our efforts at designing new proteins to fight diseases such as HIV, Malaria, Cancer, and Alzheimer’s.

    Ah! THAT was the protein folding one! It was the one I wanted to devote cycles toward, but had forgotten the name and was in a hurry… so chose chasing space aliens instead ;-0

    I’ve set the “SETI” project to not accept new jobs and I’m adding this one as soon as cores come free.

    @Larry:

    BLUSH! Well, I try ;-)

    I used to read some of his stuff and was always interested in how he was both a Tech Guy and an SF Writer. Probably why I have this unscratched itch to write some SciFi…

    @All:

    Well, now I’ve gone and done it…

    I got tired of dealing with the HDMI / DVI adapter issues and went out to look at TVs (again). At Best Buy they were selling a house brand (Insignia) 22 inch (just right for desktop) TV with 1080p resolution for $80. Couldn’t pass it up.

    I’ve now got it on the desktop and the Pi M3 is hooked up to it.

    Issues: The only “issue” I’ve run into, that makes a “monitor” different from a TV, is that the TV does not tilt. I’d like to tilt it up just a bit so I was looking right at it, but it doesn’t. OK: Note to self – find something about 20 inches by 6 inches by 6 inches tall that I want on my desktop and put it under the monitor to raise it to eye height… Maybe a nice storage box thing I could put Pi Parts and stuff into…

    Other than that, it’s dandy.

    Yesterday was spent clearing the desk and re-doing the desktop with 2 monitors and 2 Pi / Odroid boards running on it at the same time (in addition to the stack).

    I’ve now got (almost…) the configuration I’d envisioned some months (year?) back.

    I’m presently listening to “Immortal” on the TV via the Pi M3, sound works!!!
    (Video frame rate is way too low to be usable, then again, I’ve got two cores doing BOINC right now… but eventually I’ll swap which computer is on the TV for one of the Odroids that has a 1080p frame rate).

    The Pi Stack is now fully configured and running (something at least) with the Alpine based DNS server / Pi B+ / squid server / router on the bottom; 2 x Pi M2 boards running Devuan and configured for “parallel”, distcc, and BOINC; the Odroid C1 (that also can drive the TV if I plug the cable into it – so the video issues I’d had with it were also due to the DVI – HDMI adapter) similarly configured but running Armbian with a Devuan uplift (“someday” a real Devuan would be nice); and then wire tied to the top is the Orange Pi (that doesn’t fit the mounting holes anyway) with the TBs of disk acting as file server / site scraper.

    All those boards, except the Pi B+ that’s near zero CPU use anyway, have a login open on the Pi M3 and have “htop” running in it, so I have an active performance monitor window to them showing cores in use at the moment.

    Headend with display, 3 boards of cluster stack, file server, DNS / proxy / etc. All integrated and running.

    On the other monitor I have the Odroid XU4 running as desktop for “outside stuff”. That station can do “experimental things” without disrupting the Cluster at all.

    Open Issues:

    I had been running the Pi M3 with most file systems served from a hard disk. At the time I went to bring up both it and the Odroid at the same time, I realized I’d forgotten a couple of things… like USB Hub… I have one for the file server to support the disk farm and one on the desktop. Things were configured to need it for each board. Oops.

    So I converted the Pi M3 back to running all from the chip. (Unmount disk, copy back file systems as I’d added BOINC at the top level but not to the chip and needed to preserve work units / status, reboot). I also had to steal the Logitech wireless keyboard / trackpad from the Chromebox as I was short a USB Mouse.

    The XU4 doesn’t have enough working USB ports to support what I needed on it. It has no built in WiFi, so needs a WiFi dongle. (The hardwire ports are full with the Cluster stack and even then the Pi M3 is on WiFi as there’s 4 ports and 5 boards) It also needs a keyboard and mouse. Since the USB 3.0 ports are still dodgy, that leaves only the one USB 2.0 port, ergo it must have the USB Hub. Sigh. Oh, and I’d been running with my home directory from a USB Stick on it, too, so we’re up to 4 USB 2.0 needed and that’s before I plug in any hard disks for disk maintenance things…

    OK, if the XU4 takes the hub, then the Pi M3 needs to run without it… So I had to move off of the hard disk back to the micro-SD card in it. As that is an 8 GB Sony, it is a tight fit, but works.

    That only leaves One Last Thing:

    I need to swap which monitor is on what board… The XU4 and C2 can both drive nice frame rates and would run sound with video nicely on the TV, but I set things up for now on the old DVI monitor… where the Pi M3 would be fine… OK, I’m one reboot of the two boards away from that. I can live with that.

    So, the “To Do List”:

    Buy another Logitech mini-keyboard with trackpad OR clean my junk pile enough to find my other USB Mouse (mice?… I think I have a couple ‘somewhere’…)

    Buy another USB Hub so I can use multiple “stuff” on any board. Nice to have “someday” but I’m working OK without it for now.

    Finish the ‘tidy up’ of disks and systems after all this moving about. The XU4 won’t mount nfs disk for some reason ( I’d never tested it for that until now as it was for “Dirty Driver” and not intended to be mounting the file server – I briefly tried using NFS to avoid using the USB Stick, and it failed) so I need to decide if I care about nfs on it. Find where all I stuck copies of my home directory while trying different combos and “toss the trash”. That kind of stuff.

    Clean up my office. A year or so accumulation of “stuff” on the desktop was “set aside” and then the place ransacked a bit looking for mice and keyboards that are USB. I have a bunch of mice and keyboards, but many are old PS2 types for the archival White Box PCs… I think maybe it’s time to collect all the ancient stuff and “move on”… One USB Keyboard was tested, a couple of keys didn’t work (one of them being in my login name and password) so it already hit the garbage pile.

    So much work triggered by one monitor buy ;-)

    FWIW, I’ve put a folded dipole on it hung from the window shade. Gets about a dozen channels, most of them Spanish language… The area around here is a bit over 1/2 Spanish / Mexican speakers… I’ve also put one of the ROKU sticks on it. That one will shuttle between the bedroom and here depending on what I want to do where. I *might* pop another $25 for the cheap “optical remote” model just to avoid the moving back and forth and given that the desktop is very much on the optical route for me ;-) We’ll see how it goes. This does raise the interesting issue that I can watch TV, or use the monitor on that station, but not both at the same time.

    Last night I watched “Addams Family Values” while using the Pi M3 to set up and test things (and find out the XU4 wasn’t going to cut it without the hub…). Being able to pause the ROKU while swapping to check on a file system copy status was nice. I spent more hours with “butt in seat” rather than getting tired of being in the office and heading out to watch the TV… TV is now a ‘multi-tasking participant’…

    It was during that process I realized I really wanted the Pi M3 on the DVI monitor and the XU4 on the TV. Not only does it put the media player board on the display with sound, but it also has the “always on doing things” cluster head end where it can be monitored and used WHILE a TV show is running on the TV and without pausing. When using the XU4, it tends to be highly interactive (like typing postings and such) so not suited to TV time slicing…

    So that’s where I’m at today.

    I’ll likely pause BOINC on the headend and do the shutdown, monitor swap, reboot; then some of the Office Clean Up. After a few hours not finding a USB mouse, I’ll give up and go shopping ;-)

    I really like the sense of “completion” after so many months. Sure, the Cluster isn’t doing climate stuff, but it is essentially “done”. I’ve got it working on something, and I can monitor it while doing other things on another station. I’ve also “fixed” the sound issue (and video failure from the C1 and OrangePi) with the real HDMI interface. Just plug the HDMI cable and mouse/kb into any of the boards now and I can log in directly (if needed, like recovering a failed boot). Easy 10 second thing instead of shutting down the stack and moving it to the bedroom TV to recover.

    Sometime next week I’ll start in on getting the stack to do things more directly related to personal R&D directions. For now I’m just going to enjoy having built the tool I wanted.

    Sidebar on performance:

    Along the way we’ve learned a lot about cheap SBC performance. IFF I need any more cores added, I can now make a better decision about what to add. On the “someday” list is another Dogbone Case with 4 x (some other board). But that’s way out in the future. Before that, I need to play with POCL and find out if I can get a GPU working with FORTRAN… and if so, which GPU… So I figure it will be about a year before I have that worked out. Figure “Christmas after next” ;-) Watching BOINC, the Pi M2 boards are clearly very slow compared to the Pi M3 and Odroids. I’m suspecting that the “someday Stack” will be Odroids with the GPU in play. But they don’t have a native Devuan port… So I’m going to look at specs on the Devuan directly ported boards and do some pondering. They have many Banana Pi and Orange Pi candidates that might be good with big added heat sinks. “We’ll see” as time permits. For now I’ve got more performance than I’m able to effectively use, but if/when that changes, I’ll be doing a “data dive” into the Devuan supported board list.

  5. E.M.Smith says:

    @Jim2:

    Aye, now there’s the rub!

    Making things run in parallel is still a bit of a ‘black art’. There are a dozen ways to do it (likely more) and all of them a bit of a PITA and needing special tools or care. Most of them taking about a week to understand.

    First off, I just recently found “parallel” that is a command that lets you distribute some shell based workloads between systems. I’ve tested it, but don’t yet have any “production” process using it. (Now that I’ve got the cluster basically “done”, I’m going to be putting together scripts that do specific tasks distributed in that way).

    https://linux.die.net/man/1/parallel

    Next up, there’s PVM and OpenCL. PVM is fading in popularity, so I’m only looking at OpenCL at the moment. One implementation of OpenCL is called “pocl” and looks like the best one to investigate / learn. To use it, you ‘refactor’ your program (yes, you have to write the program to be parallel…) and put in OpenCL structures. (After installing OpenCL and configuring it across your system…)

    Within OpenCL, some boards and some GPUs can also be used… IFF you install the right ‘kit’ to make the GPU available for general computations… “Some assembly required”… I’m still learning about this bit of work…

    Then there’s another easy one. C compiles. As most developers spend a lot of time waiting for compiles of large system codes to complete, it’s no surprise that parallel compilation was one of the earliest done and easiest to learn on Linux / Unix / ARM… So the ‘distcc’ command and set up is what gets that done. I already have had this running for a while.

    https://linux.die.net/man/1/distcc

    Finally, there are “embarrassingly parallel problems” where “work units” can be split up and sent even over slow networks to a COW (Collection Of Workstations) even over the internet. Work units are reassembled at the controlling site when done. BOINC is the biggest / best known of these, but you can in fact “roll your own”. Take, for example, password brute force cracking. Split up the dictionary of 100,000 words into 1000 dictionaries of 100 words, and send the cryptext + dictionary fragment to 1000 machines and wait for one to send a “Got It!” message. (I’ve wondered if some of the BOINC stuff might not in fact be clandestine TLA work like that…)
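
A minimal Python sketch of that “split the dictionary” idea, under some loud assumptions: the word list is synthetic, the “cryptext” is stood in for by a SHA-256 hash of the secret word, and the chunks are searched one after another on one machine instead of being shipped to 1000 different ones. The function names are invented for the example. What it is meant to show is only that each work unit needs nothing but its own fragment plus the target.

```python
import hashlib


def split_dictionary(words, n_chunks):
    """Split `words` into at most `n_chunks` consecutive, roughly equal work units."""
    size = max(1, -(-len(words) // n_chunks))   # ceiling division, at least 1
    return [words[i:i + size] for i in range(0, len(words), size)]


def search_chunk(target_hash, chunk):
    """Work unit: return the word whose SHA-256 matches `target_hash`, or None."""
    for word in chunk:
        if hashlib.sha256(word.encode()).hexdigest() == target_hash:
            return word
    return None


# Synthetic 100,000 word dictionary (an assumption; any word list would do).
words = [f"word{i:06d}" for i in range(100_000)]
chunks = split_dictionary(words, 1000)

# Stand-in for the "cryptext": the hash of one word from the list.
target = hashlib.sha256(b"word054321").hexdigest()

# Each call below is independent, so each could run on a different machine;
# here they simply run in sequence and the "Got It!" results are collected.
hits = [r for r in (search_chunk(target, c) for c in chunks) if r is not None]
print(len(chunks), len(chunks[0]), hits)
```

Since no chunk depends on any other, the only network traffic is sending out a fragment and getting back one short answer, which is why slow links are good enough for this class of problem.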

    There are many other methods, systems, etc. etc. Heck, most compilers now will do ‘threading’ automatically for you (so one program turns into many ‘threads’ that run on multiple cores in your system). Plan9 has built in parallel workflow (though few people use it as an OS… yet…).

    I’m working on a posting about the conversion of a climate model (FAMOUS) to run largely in parallel (there’s a paper published on it and I would just be reviewing that paper). That’s a reasonable example of what it takes (and what I’m staring at with the climate models…) so “watch this space”.

    In summary, you get to do a lot of detailed technical work for anything really interesting, but by learning two commands and doing some easy configuration you can have parallel C compiles and shell scripts. Then using GPUs is a whole ‘nother thing with learning the CUDA language for NVIDIA cores or getting POCL to go with vendor “kits” for other GPU cores like the ARM Mali. (That also brings with it the issue of IEEE math vs 24 bit with “rounding via truncation” in many GPUs and potentially changed results…)

    So “happy digging”, and take a big shovel…

  6. jim2 says:

    Visual Studio C# has a pretty easy to use class, BackgroundWorker, that lets you run processes in “parallel” meaning separate threads. Of course that’s on one computer, not several.
