I’ve gotten a stack of Raspberry Pi boards to act as a cluster computer for compiling C programs.
In comments, LG asks a rather deeper question than it might seem at first read:
I’m looking for a link for step-by-step creation of a headless unit. Have you written anything that I have missed ?
So h/t to LG for that.
Often, I’ve had the same experience in searching for something. My keyword list starts off with MY biggest interest, then tapers down and varies most at the end. So “HEADLESS Debian cluster build…”
Works great, right up until your idea of the paradigm diverges from that of most folks who are doing the work. It can take a while to realize you are on a paradigm raft, adrift at sea in the internet… I’ve done it many many times.
So this question first raises the issue of the paradigm. “Headless” implies a particular structure of computing. A Master node controlling a bunch of Slave nodes. (Well, now in a more P.C. world they are sometimes called “worker” nodes, as though being a ‘wage slave’ is better… but they don’t even get wages… just fed with electric power… but I digress…) There are other paradigms, and the reason a search for a “headless” node for distcc comes up short is that the fundamental paradigm of distcc is NOT Master / Slave, like a Beowulf Cluster. It is a COW – Cluster Of Workstations.
If you look at that Beowulf “HOW TO” (somewhat dated and Red Hat specific) you will note down near the bottom it lists “ssh” and “MPI” steps. That’s the magic sauce in a Beowulf. The basic hardware setup in a Beowulf is a Master (that I like to call a ‘headend’ sometimes) and it talks to your company wide ‘inside’ ethernet. It also has a second ethernet interface that talks to a Very Private to The Cluster and Very Fast ethernet switch. Then a collection of Slave nodes (“worker nodes” or “headless” nodes) also plug into that switch. They know nothing of the greater outside world, only their entirely isolated ethernet and Master node.
For a COW, all the stations are equal. They all connect directly to the general work network. They all know about each other and can even do things like reach out through a firewall to the outside world. Each of them can originate jobs, or work on jobs another hands out (‘farming’ jobs). Distcc is like that. Picture an Engineering department with 40 folks all working on a large software development project. For 2/3 of the day, each worker is not at their terminal and their station is “doing nothing”. For most of the 1/3 when they are at work, they are editing a file, in a meeting, at the coffee pot or bathroom, servicing the email queue, etc. etc. and their high performance workstation is doing nearly nothing. Using a COW structure, when any of them wants to compile their chunk of code, it gets farmed out to ALL the workstations. So even one guy working through lunch can effectively use all 40 workstations as needed. That’s the paradigm for ‘distcc’.
What I’m building toward is a Beowulf for Climate Models. I’m starting with ‘distcc’. So my paradigm looks somewhat like both. I’m talking about the nodes like they were Master and Slave, but they are built as peers.
Which leads to the ‘cheeky’ answers on how to build a headless unit:
“Unplug the monitor”…
What makes it a headless node?
Both very true, and not very helpful.
“For any question there is an answer that is absolutely correct, concise, and useless. -E.M.Smith”
(Someone else may have said that too, and maybe even before me, I’m not sure)
The simple answer needs a bit more elaboration…
What I Did
There are a few things I did to shift these nodes more toward “headless” and away from “COW”. First step was, as noted, I unplugged the monitor, keyboard, and mouse. Technically that’s all you really need to do to turn a COW node headless. Lop off the head. However…
You also need to be able to log in and do maintenance. I chose to use “ssh” for that as it is (maybe ‘was’ now that they made it ‘more secure’ and a PITA…) easier to set up and takes less resources than VNC. I could have installed ‘tightvnc’ (it is in my build script but commented out) and used it to ‘login’ instead. In fact, I’ve got a trivial example of a vnc driven ‘headless’ unit in the Dongle Pi setup.
Purpose: A remote graphical desktop interface to the target system. It lets you have a graphical desktop environment on the Puppet Pi via a screen / keyboard / mouse on your laptop.
For my initial build of the ‘headless’ nodes, I just plugged the keyboard, mouse, and monitor into the Pi directly. Once I had “ssh” working (or you could use VNC) then I didn’t need them plugged in and went back to using my workstation to further modify the headless nodes. (only a couple of times needing to move the KVM back as I blew it on something ;-)
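As a rough sketch of what “getting in” from the workstation looks like once ssh is up, a small loop can run one command on every headless node. The two node addresses are the ones from my hosts list; the `pi` login name, the `on_each_node` helper and the DRY_RUN switch are made up for illustration, so treat them as assumptions. With DRY_RUN=1 (the default here) it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical helper: run one command on each headless node over ssh.
# NODES and the "pi" user name are assumptions; adjust for your own boards.
NODES="10.168.168.41 10.168.168.42"
NODE_USER="pi"

on_each_node() {
    for node in $NODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of running it.
            echo "ssh $NODE_USER@$node $*"
        else
            ssh "$NODE_USER@$node" "$@"
        fi
    done
}

on_each_node uptime
```

Set DRY_RUN=0 once you trust it, and it will really log in to each node in turn.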
Once you can effectively “get in” with a session from your remote workstation, you don’t need a full GUI running on the Pi, so that’s when I shut off lightdm, the display manager. No desktop, no manager needed.
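For the record, here is a minimal sketch of one way to shut the display manager off on a sysvinit system like Devuan. It is not lifted from my build script; the `run` and `go_headless` helper names are invented for the example, and it defaults to a dry run that only prints the commands. Run for real (DRY_RUN=0) it needs root.

```shell
#!/bin/sh
# Sketch for a sysvinit system such as Devuan; needs root when DRY_RUN=0.
# The run() and go_headless() helper names are invented for this example.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"          # dry run: only show the command
    else
        "$@"
    fi
}

go_headless() {
    dm="${1:-lightdm}"
    run service "$dm" stop          # stop the running display manager
    run update-rc.d "$dm" disable   # keep it from starting at boot
}

go_headless lightdm
```

Do this only after ssh (or VNC) is known to work, or you will be moving the keyboard and monitor back.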
At that point, you have a headless node. It just isn’t doing anything…
The real ‘magic sauce’ comes in the distcc configuration file. You can set this up to treat your computers as a COW of peers, or as a Master and Slaves. I’ll include one here:
    root@Devuan:/Climate# cat /etc/default/distcc
    # Defaults for distcc initscript
    # sourced by /etc/init.d/distcc

    #
    # should distcc be started on boot?
    #
    STARTDISTCC="true"
    #STARTDISTCC="false"

    #
    # Which networks/hosts should be allowed to connect to the daemon?
    # You can list multiple hosts/networks separated by spaces.
    # Networks have to be in CIDR notation, f.e. 192.168.1.0/24
    # Hosts are represented by a single IP Adress
    #
    # ALLOWEDNETS="127.0.0.1"
    ALLOWEDNETS="10.168.168.0/24"

    #
    # Which interface should distccd listen on?
    # You can specify a single interface, identified by it's IP address, here.
    #
    # LISTENER="127.0.0.1"
    LISTENER="10.168.168.40"

    #
    # You can specify a (positive) nice level for the distcc process here
    #
    # NICE="10"
    NICE="10"

    #
    # You can specify a maximum number of jobs, the server will accept concurrently
    #
    # JOBS=""
    JOBS="3"

    #
    # Enable Zeroconf support?
    # If enabled, distccd will register via mDNS/DNS-SD.
    # It can then automatically be found by zeroconf enabled distcc clients
    # without the need of a manually configured host list.
    #
    # ZEROCONF="true"
    ZEROCONF="false"
First off, the STARTDISTCC line. For a headless node that is always true. You want it to come up and start, no touching needed. For your personal workstation, you might want control and prefer a manual starting of distcc only when you don’t want to have 100% of your workstation for you. I chose to set it to ‘true’ on my headend anyway since I’m the only one using any of this hardware.
Then we have: ALLOWEDNETS="10.168.168.0/24"
Here we tell the node’s distcc daemon which networks (or single hosts) are allowed to connect and hand it jobs. You could have a Master with two interfaces, one outbound to 192.168.1.x that was NOT to be part of your Beowulf and one inbound to 10.168.168.x that was in the cluster. In that case, putting only the 10. address here forces all your distcc traffic into your Beowulf. Putting in the 192.168.1.x would let you be used by the entire internal network…
In my case, I’ve got my entire internal work network on that inside address block, so I’m acting like it is a COW and “any node is the same”. That is, for now, the Master Node only has one ethernet interface. At some future time, I’ll use the hardwire interface on the Pi M3 as the Cluster interface, and the WiFi wireless interface to connect out to the Rest Of World and it will become a Beowulf Master Node in full. At that time, the dedicated switch and Slave Nodes don’t get any internet connection (unless I let them route through the Pi Model 3 by turning on routing). Right now, all 3 nodes plug directly into the hardwire interface of my Netgear WiFi router. In the future I’ll plug them into a dedicated Netgear switch and the WiFi router becomes the “outside the Beowulf” network. In this way the COW becomes a Beowulf…
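To make the CIDR notation concrete: 10.168.168.0/24 means “any address whose first 24 bits match 10.168.168”. The sketch below does that comparison in plain shell arithmetic. It is only an illustration of the idea, not how distccd is implemented, and `ip_to_int` and `in_cidr` are helper names I made up for it.

```shell
#!/bin/sh
# Illustration only: this is not how distccd itself is implemented.
# ip_to_int and in_cidr are helper names invented for this sketch.

# Pack a dotted quad such as 10.168.168.41 into one 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 falls inside network $2 (CIDR, or a bare host IP).
in_cidr() {
    case $2 in
        */*) net=${2%/*}; bits=${2#*/} ;;
        *)   net=$2;      bits=32 ;;
    esac
    # Keep the top $bits bits, clear the rest.
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

for ip in 10.168.168.41 192.168.1.5; do
    if in_cidr "$ip" "10.168.168.0/24"; then
        echo "$ip allowed"
    else
        echo "$ip refused"
    fi
done
```

With my ALLOWEDNETS value, the cluster node at 10.168.168.41 lands inside the block and a machine on 192.168.1.x does not.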
Next up, where do I pick up jobs?
LISTENER="10.168.168.40"
This tells the distcc daemon on my workstation which interface to listen on for job requests. For the worker nodes, it is their hardwire ethernet interface IP number. This necessarily implies you set a fixed IP address, so in /etc/network/interfaces you need to do that.
    root@Devuan:/Climate# cat /etc/network/interfaces
    # interfaces(5) file used by ifup(8) and ifdown(8)

    # Please note that this file is written to be used with dhcpcd
    # For static IP, consult /etc/dhcpcd.conf and 'man dhcpcd.conf'

    # Include files from /etc/network/interfaces.d:
    source-directory /etc/network/interfaces.d

    auto lo
    iface lo inet loopback

    #iface eth0 inet manual

    allow-hotplug wlan0
    iface wlan0 inet manual
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

    allow-hotplug wlan1
    iface wlan1 inet manual
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

    auto eth0
    allow-hotplug eth0
    iface eth0 inet static
    address 10.168.168.40
    netmask 255.255.255.0
    gateway 10.168.168.254
    dns-domain chiefio.home
    dns-nameservers 127.0.0.1 220.127.116.11 10.168.168.254 18.104.22.168
It was at this point, having set that hard coded IP number, that I found I was still getting a DHCP address too. Someone had ‘improved’ things by splitting the DHCP setup out to a new place (the comment at the top of the file points at dhcpcd), so I had to manually shut off the ‘new thing’ to get only one IP address. (What I did then, in chronological order, here)
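A quick way to see whether a node still carries a DHCP lease alongside its static address is to list the IPv4 addresses on the interface. This is a sketch of such a check, not a record of what I ran at the time: `list_ipv4` is a made-up helper, and eth0 is assumed as the interface name. Two lines of output means two addresses.

```shell
#!/bin/sh
# list_ipv4 is a made-up helper: it pulls the address column out of
# "ip -o -4 addr show" output read on stdin (one address per line).
list_ipv4() {
    awk '$3 == "inet" { print $4 }'
}

# On the node itself (eth0 assumed, as in the interfaces file above).
# Prints nothing if the "ip" tool or the interface is missing.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show dev eth0 2>/dev/null | list_ipv4
fi
```

On a correctly set up node this should show just the one address from the interfaces file, with its /24 suffix.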
Now this matters as it is where you determine what network is “inside” a Beowulf and what is outside. Your Master node has two networks, your slaves only one. Your COW, only one, but hard coded values.
Now I believe but have not tested, that you can set that ‘listener’ to loopback and still hand out jobs to Slave nodes, but not listen for any for yourself, enforcing the Master role. As there are other more flexible ways to set this, I haven’t tried them all. This may be part of why my workstation was getting a few more than I thought it ought to get…
Who Owns My Computes?
The ‘nice’ value sets how much a job assignment can interrupt what you are doing. A nice of 19 (the top of the scale on Linux) pretty much only gets the CPU when nothing else wants it. Nice of 1 is going to push you out of the way some times to get some decent work done. This config file has:
NICE="10"
so will get some work done, but not give you very much grief on what you are doing. It will be deferential to your desktop work, but still not make the guy at the other end wait forever.
On a Master node, you could set this to, say 15 or 16, while on a Slave node, maybe a 1, or leave it out altogether as the only thing it ought to be doing is distcc work.
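If you want to see what a nice level does to a process, the little sketch below starts a child 10 steps “nicer” than the current shell and has it report its own niceness. It assumes the GNU coreutils `nice`, which prints the current niceness when run with no arguments; it says nothing about how distccd applies its NICE setting internally.

```shell
#!/bin/sh
# Assumes GNU coreutils: "nice" with no arguments prints the niceness
# of the current process.
base=$(nice)
# Launch a child 10 steps nicer and have it report its own niceness.
child=$(nice -n 10 nice)
echo "this shell: $base   child started with 'nice -n 10': $child"
```

Started from an ordinary login shell, the child normally reports a value 10 higher than the shell (niceness tops out at 19).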
Then there is an absolute limit on the number of distcc jobs this node is to accept. A ‘reasonable’ value is 2 x cores (as sometimes a given job is waiting on I/O or finishes before the next one shows up). Here:
JOBS="3"
I set it to 3 on my headend node. It isn’t a pure Master, where you might make it none, nor is it a slave node where you want it slammed. On the Slave nodes, I set this to 8 as they have 4 cores each and need to be doing this and nothing but this. The headend also does the job setup, the linking, and any other non-distributed process. I likely need to set it to 2 or 1, especially as the cluster grows and the setup work dominates, for the Master node.
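The 2 x cores rule of thumb is easy to compute on the node itself. A minimal sketch, assuming `nproc` from GNU coreutils is available; `suggest_jobs` is a made-up helper name, and the result is only a starting point for a pure worker node.

```shell
#!/bin/sh
# suggest_jobs is a made-up helper applying the 2 x cores rule of thumb.
suggest_jobs() {
    echo $(( $1 * 2 ))
}

# nproc comes from GNU coreutils; fall back to 1 if it is missing.
cores=$(nproc 2>/dev/null || echo 1)
echo "cores=$cores suggested JOBS=\"$(suggest_jobs "$cores")\""
```

On a 4 core Slave node that works out to the JOBS="8" I used; on the headend you would deliberately go lower.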
There is a sort of a ‘self discovery’ mode for distcc that ZEROCONF turns on. I’m not familiar with it and I didn’t feel like taking on that bit of learning work at the moment, but for a very large cluster it looks like you can skip the hardcoded IP addresses and host list. Just put them in one subnet and let them find each other. That’s more a COW behaviour than a Beowulf, so wasn’t really calling my name…
Now I’d mentioned that there were a couple of places where you can configure the workload sharing. One is the “.bashrc” file in your home directory (on whatever machine is your workstation). Here’s an excerpt from the bottom of mine (dated 6 May 2016 as that was my first distcc test). So for all you folks being impressed by my distcc skill, remember I’m only 8 months ahead of you. Oh, and remember the Consultants Creed:
“An expert is the guy one page ahead of you in the manual.”
    #added by EMS 6May2016
    export PATH=/distcc:$PATH
    # The remote machines that will build things for you. Don't put the ip of the Pi unless
    # you want the Pi to take part to the build process.
    # The syntax is : "IP_ADDRESS/NUMBER_OF_JOBS IP_ADDRESS/NUMBER_OF_JOBS" etc...
    # The documentation states that your should set the number of jobs per machine to
    # its number of processors. I advise you to set it to twice as much. See why in the test paragraph.
    # For example:
    #export DISTCC_HOSTS="localhost/3 10.168.168.41/8 10.168.168.42/8"
    export DISTCC_HOSTS="localhost/2 10.168.168.41/8 10.168.168.42/8"
    # When a job fails, distcc backs off the machine that failed for some time.
    # We want distcc to retry immediately
    export DISTCC_BACKOFF_PERIOD=0
    # Time, in seconds, before distcc throws a DISTCC_IO_TIMEOUT error and tries to build the file
    # locally ( default hardcoded to 300 in version prior to 3.2 )
    export DISTCC_IO_TIMEOUT=3000
    #
    # To prevent local try of compile, uncomment this line:
    #
    #export DISTCC_SKIP_LOCAL_RETRY=1
    # Don't try to build the file locally when a remote job failed
    export DISTCC_SKIP_LOCAL_RETRY=10
So you have some more tuning opportunities, especially on your headend machine.
Now I think my setting JOBS=3 for my headend and maybe having RETRY set a bit odd might be why I get the workload on the headend that I get, so I may need to tune this a bit and / or read up on just what the settings are for DISTCC_SKIP_LOCAL_RETRY and what they really mean in detail. Or perhaps JOBS=3 is having me ‘listen to myself’ on the network for 3 and listen to the loopback interface for 2 more here with “localhost/2”. Yes, that kind of thing can happen depending on how the program was written. “Tuning, here we come”.
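One piece of that tuning is knowing how many job slots the hosts list offers in total, since that is a sensible ceiling for the `-j` value handed to make. A small sketch that adds up the /N limits; `total_slots` is a made-up helper, and it deliberately skips entries without an explicit /N rather than guess at distcc’s default. It also does not handle entries with extra options after the number.

```shell
#!/bin/sh
# total_slots is a made-up helper: it adds up the /N limits in a
# DISTCC_HOSTS style list. Entries without an explicit /N are skipped
# here rather than guessing at distcc's default.
total_slots() {
    total=0
    for entry in $1; do
        case $entry in
            */*) total=$(( $total + ${entry##*/} )) ;;
        esac
    done
    echo "$total"
}

hosts="localhost/2 10.168.168.41/8 10.168.168.42/8"
echo "make -j$(total_slots "$hosts")"   # prints: make -j18
```

Whether -j18 is actually the best number for this little cluster is exactly the sort of thing the tuning will have to show.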
So that’s the high level look (really, it is high level…) at what makes a headless node headless and how to do it.
I’m going to write up a ‘step by step’ through the process, but I think this kind of perspective posting is important before you just get down grinding through the weeds…
Oh, and FWIW, there are other kinds of clusters too. Giant Supercomputers that are a very tight cluster of 2000+ CPU/system boards on a high speed backplane (yet under the skin the concepts are the same for much of it). SETI and similar BOINC things spread over the internet with compute nodes of all sorts of different desktops at the other far end. Called Grid Computing, it differs from a COW mostly in that the machines are different architectures and spread over a slow internet instead of a fast corporate ethernet or a dedicated switched network (or an internal backplane fabric).
Don’t let all the fancy and divergent names fool you. Under it all, they are basically the same structure. Nodes that farm out work (or sometimes limited to only one node that can farm out work). Nodes that do the work. A network to connect them. As the network speed drops, you move from Parallel Processor Supercomputer to Beowulf to COW to GRID… and as the tightness of control of the headend drops from “complete” to “little” you follow the same order.
So don’t let the terminology and the “paradigms” intimidate. It’s just artificial complexity on top of “a program on this computer hands some work to that computer over there”. Unplug the monitor, it’s a headless slave. Type on the keyboard with a monitor, it is the Master… or just a COW.