It took a couple of days but I’ve now completely rebuilt the distributed compiler cluster (“build monster”) on the Devuan 1.0 release.
Why this matters:
1) It is now running on a real official Devuan release post Q.A. not just a slapped together development version or an “uplifted” ‘whatever’ and hope all the moving parts align right.
2) It is all on a 4.x kernel now. So no issues from the incompatible “enhancement” of ext4 that caused older journals to be “improved” into a format that no longer worked on older releases. That wasn’t a Devuan issue. It first showed up in my shop on Armbian when I needed to run a 4.x kernel to get the OdroidXU4 to work right. Support for those newer cores and things… Plug in a disk, then it would not work back on the original machine. So, OK, I’ve done an out of cycle sudden upgrade of almost everything… Sigh. (Oddly, my big NFS Network File System server with 12 TB of disk on it is the one system still back on a 3.x kernel. Then again, its disks never move between systems as they are NFS shared. That’s the Orange Pi One. “Someday” I need to both do the “uplift” to Devuan on it and advance the kernel; but for now it’s fine and I’m not touching it.)
3) It seems faster and more professional. I’ve only been using it a couple of days, but things just seem faster. When I went to change settings in FireFox, more of them were already “as I like them”. (Only had to shut off auto-update of search engines and change the search engine to DuckDuckGo. Spell checker already installed. Mostly security oriented settings by default.) The htop (performance monitor) CPU bars are more often moving in sync. Where before there was a tendency to “peg one core” it looks like now the work is multithreaded and spread over more cores. I’m typing this on the PiM3 and it is quite acceptable. The slightly annoying lag from before is not showing up nearly as much… I think that’s the 1 core vs 4 cores effect.
4) I was able to more precisely document the build steps. The prior distcc build was when I was trying to sort out what release of OS to use (as SystemD was trying to take over the world) and things were more muddled. Now I’ve got a nicely written log of steps and it worked first time out the gate. This posting will have my log of actions in it. (I didn’t write a build script as once I had the first chip made, I just cloned it and changed the hostname / IP and didn’t need to do more builds).
Sidebar on SD Cards
I went out to buy 3 x 8 GB mini-SD cards for the cluster. Turns out I could not find any. Not at Best Buy nor Walmart nor Walgreens nor Office Max. Everyone has moved up to 16 GB minimum on the rack. When doing whole SD card backups, each image takes up the full card size regardless of how many bits are actually used.
I’d planned to make the thing on smaller chips (as there isn’t much space used, really. About 3 GB). Instead I’ve got 16 GB mini-SD cards in them. Oh Well. At some point I’m going to see if I can still order 4 or 8 GB Ultra speed mini-SD cards from Amazon. Looks like I need to lay in an inventory before the manufacture is totally halted.
In testing the cluster, I had an “issue” with it seeming to not work. I had made a test case of 8 copies of “Hello world” written in C and a Makefile to launch them all in one batch. I’d launch it, and in about a second I’d have all the compiles done and the executables present. I had monitor windows up on all three nodes ( 12 cores ) and nothing showed up. What the?…
So I turned off “localhost” as a destination in the .bashrc settings. Still nothing.
Eventually I turned on logging and saw it was completing per the logs, and it looked like things were in fact being distributed out to the cluster nodes.
Finally I changed it so ALL 8 compiles were sent to just ONE node. Did the make, thought I saw a flicker but it was gone. I had been looking where I was typing. Stared directly at the monitor bars, and launched it again touch typing ( I had a scriptlet that removed the executables and .o files then did the make). There was about 1 second of the 4 usage bars of the 4 cores going to about the 1/2 way used point, then back to near zero. That was it. 8 compiles in about one second on just one board. No wonder I didn’t see anything using all 12 cores! We’re talking about 1/4 to 1/3 of a second of load then! And shorter than the refresh time of the monitor.
I think this thing is going to really rip on bigger jobs.
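For anyone wanting to repeat that kind of smoke test, here is a minimal sketch of generating such a test case. It is not the original test files: the function name make_hello_case, the file names, and the Makefile layout are all illustrative assumptions. The compile and link steps are kept separate because distcc only farms out the compile (-c) step.

```shell
#!/bin/sh
# Sketch: generate N copies of "Hello world" in C plus a Makefile that builds
# them all. make_hello_case and the file layout are illustrative only.
make_hello_case() {
    dir=$1
    count=${2:-8}
    mkdir -p "$dir" || return 1
    targets=""
    i=1
    while [ "$i" -le "$count" ]; do
        # Each source file is an independent program, so make can run the
        # compiles in parallel.
        cat > "$dir/hello$i.c" <<EOF
#include <stdio.h>

int main(void)
{
    printf("Hello world $i\\n");
    return 0;
}
EOF
        targets="$targets hello$i"
        i=$((i + 1))
    done
    # Makefile recipe lines must start with a tab, hence printf with \t.
    {
        printf 'CC = gcc\n'
        printf 'PROGS =%s\n\n' "$targets"
        printf 'all: $(PROGS)\n\n'
        printf '%%: %%.c\n'
        printf '\t$(CC) -c $< -o $@.o\n'
        printf '\t$(CC) $@.o -o $@\n\n'
        printf 'clean:\n'
        printf '\trm -f $(PROGS) *.o\n'
    } > "$dir/Makefile"
}
```

A run would then look something like "make_hello_case /tmp/hello 8; cd /tmp/hello; make clean; make -j8", where the -j8 is my assumption about how to launch all eight in one batch.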
The Build Steps
Unlike my usual “build script” postings, this one is just a narrative. I’ll eventually put this into a script, but not right now.
I used the Raspberry Pi M2 image on both the M2 boards and the M3 board. All are now running identical code. The image is downloaded from here:
Then unpacked into an .img file with unxz. That image file is shoved onto a mini-SD card (which I mounted into a USB adapter) using the Pi via a dd command. Check VERY CAREFULLY that you get the right device letter when it is mounted, then unmount it and proceed:
dd bs=10M conv=fsync of=/dev/sdx if=devuan_jessie_1.0.0_armhf_raspi2.img
where x in /dev/sdx is the correct letter for your SD card device and if= is whatever image you are using.
Then you get to wait a few minutes for the card to write.
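Since a wrong device letter can wipe the wrong disk, a small guard before the dd may be worth having. The sketch below only checks the kernel's "removable" flag in sysfs. The helper name is_removable_disk and its optional second argument are illustrative, and a flag of 1 does not prove it is the right card (nor does every USB adapter necessarily report 1), so treat it as one extra sanity check rather than a guarantee.

```shell
#!/bin/sh
# Sketch: refuse to treat a disk as the SD card unless the kernel flags it
# as removable. is_removable_disk is an illustrative helper.
is_removable_disk() {
    dev=$1                      # bare device name, e.g. "sdx" (no /dev/ prefix)
    sysroot=${2:-/sys/block}    # overridable so the check can be tried without hardware
    flag="$sysroot/$dev/removable"
    # True only when the flag file is readable and contains exactly 1.
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}
```

In use, something like "lsblk -o NAME,SIZE,RM,MODEL,MOUNTPOINT" shows the candidates, and "is_removable_disk sdx && dd ..." only writes when the flag is set.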
The Devuan image does NOT auto-size to use the whole card. So at this point you have a 1.8 GB or so system on your 16 GB card and attempts to add more software to it will run out of space…
So I used ‘gparted’ to just resize the ext4 partition to use the rest of the card. It’s the easiest way to do it, fully graphical, and found under “preferences” on Debian / Devuan. (IF it isn’t installed, just do an “apt-get install gparted” on your build system). Now you can proceed to booting and configuring.
I swapped the card into the Pi M3 (taking out my “build builder” card) and booted. First up, change the root password. Second, I did an “adduser” to add my usual account for me. Then the usual bring it up to date via:
apt-get update
apt-get upgrade
and wait a while… When done, I did a reboot even though it likely isn’t needed.
In one trial, I tried adding xorg and lxde first (for a graphical interface) and for unknown reasons that just gave me a blinking cursor in the upper left corner. Doing it after the other stuff was installed worked. I’m not sure why. In any case, doing “apt-get install” is line oriented anyway…
So, log in as root. Do a series of “apt-get install”, then reboot, then do “apt-get install lxde” and all is good. The install of lxde looks to do xorg bits as a dependency so no need to explicitly do xorg.
What I installed, more or less in order of installation:
apt-get install scrot
apt-get install dnsutils
apt-get install build-essential
apt-get install ntfs-3g
apt-get install parallel
apt-get install distcc
apt-get install gfortran
apt-get install distcc-pump
apt-get install ccache
apt-get install dmucs
apt-get install htop
apt-get install gparted
apt-get install unionfs-fuse
apt-get install squashfs-tools
apt-get install hfsplus hfsutils
Now distcc-pump isn’t being used by me yet, and may not be needed for what I’m doing anyway. I mostly just want to play with it to get familiar. It might help when doing full Linux system builds, but isn’t required. I’m using gfortran for various climate models, but if you are not doing FORTRAN, it’s not for you. Similarly, ccache speeds up long large C system compilations by caching parts of the build for reuse. Likely overkill for my use but we’ll see.
Then “htop” is a great little system monitor with a process table listing and CPU load bars for each core. (It shows up under the ‘System Tools’ menu). I can’t live without ‘gparted’ to adjust disk partitions, set file system labels, etc.
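Since apt-get accepts several package names at once, the same list could be collapsed into one call when this eventually becomes a script. A minimal sketch follows. The variable and function names are illustrative, and the -y flag (answer yes to prompts) is my addition, not something used in the log above.

```shell
#!/bin/sh
# Package list copied from the individual apt-get lines above.
DISTCC_NODE_PACKAGES="scrot dnsutils build-essential ntfs-3g parallel distcc gfortran distcc-pump ccache dmucs htop gparted unionfs-fuse squashfs-tools hfsplus hfsutils"

# install_node_packages is an illustrative helper. Pass "echo" as the first
# argument to print the command instead of running it.
install_node_packages() {
    runner=${1:-}
    # Unquoted on purpose: the list must split into separate package names.
    $runner apt-get install -y $DISTCC_NODE_PACKAGES
}
```

Running "install_node_packages echo" shows what would be installed; running it with no argument as root does the install.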
Then we get to file systems. I’m intending to use squashfs file systems to protect things like /usr from being changed. (We saw that in an earlier posting) So it, and unionfs, let you do interesting things with compressed and local file systems in a file. The HFS file system is what’s on the Mac. If you don’t intend to mount Mac disks on your system, it’s not needed. I’ve got a few Macs in the house (for the family members who don’t ‘do’ Linux…) so I sometimes have a Mac formatted disk to use.
At this point, I did another reboot. Again, not likely needed, but having been bit once with the blinky cursor after installing X, I didn’t want to be reset to zero again. I also used dd to make a backup of the chip into my “build builder” system so I could resume at this point without redoing it all.
dd bs=10M conv=fsync if=/dev/sdx of=Distcc_Devuan_1.0_NoX_12Nov2017
The “conv=fsync” likely is overkill, but forcing a disk sync is a good thing anyway ;-) Again, be sure you get the right drive letter for sdX and the “if” and “of” are pointing the right way! Output File (of) can be whatever you like. I name things with the use, the OS, release and date.
Ok, backup made, reboot with that chip.
At this point I did some more apt-get install steps, starting with LXDE which is my desktop of choice:
apt-get install lxde
apt-get install firefox-esr
apt-get install gimp
apt-get install libreoffice
Not one of those is really needed on the distcc headless nodes. I tend to install them everywhere so that IFF I need to put a monitor and keyboard on one of them for some debugging, I've got all the usual tools and bits installed. It burns a bit of storage on the card and running a graphical UI uses a bit more memory, but there seems to be plenty anyway.
There are some things that need configuring. First up, edit the /etc/hostname. I used headless1 headless2 and headend for my three. Use whatever names you like. Then add them all in /etc/hosts. Here’s an example from the chip I’m using at the moment. (I’m trying out a Sony mini-SD Class 10 to see if it is noticeably slower than the Sandisk Ultra ones. It is a bit, but still fine.)
127.0.0.1 sonydevuan SonyDevuan devuan localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe02::1 ip6-allnodes
fe02::1 ip6-allrouters
10.168.168.139 Opi opi orangepi
10.168.168.140 H0 h0 Headend headend
10.168.168.141 H1 h1 Headless1 headless1
10.168.168.142 H2 h2 Headless2 headless2
10.168.168.110 C1 c1 OdroidC1 odroidc1
I’m also going to install distcc et al. onto the Odroid C1 and the Orange Pi as similar v7 core cluster nodes. As they are Armbian uplifted to Devuan, there might be some compatibility issues, but it ought to work. “We’ll see”… someday when I have enough distcc load to care ;-) Also note that I often trip over capitalization so I’ve put in names in all the likely capitalization combinations ;-)
Links to distcc are put in /usr/local/bin on this build, so:
ln -s /usr/bin/distcc /usr/local/bin/gcc
ln -s /usr/bin/distcc /usr/local/bin/g++
ln -s /usr/bin/distcc /usr/local/bin/cc
ln -s /usr/bin/distcc /usr/local/bin/c++
ln -s /usr/bin/distcc /usr/local/bin/cpp
Since /usr/local/bin already is ahead of /usr/bin in the default search path, no changes to the PATH variable were needed.
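A quick way to confirm the links are in place is sketched below. The helper name check_masquerade is illustrative and not part of distcc; it just walks the five compiler names and checks each one is a symlink pointing at the distcc binary.

```shell
#!/bin/sh
# check_masquerade is an illustrative helper, not part of distcc.
check_masquerade() {
    bindir=${1:-/usr/local/bin}
    distcc_path=${2:-/usr/bin/distcc}
    status=0
    for name in gcc g++ cc c++ cpp; do
        link="$bindir/$name"
        # -L is true for any symlink; readlink prints where it points.
        if [ -L "$link" ] && [ "$(readlink "$link")" = "$distcc_path" ]; then
            echo "ok      $link -> $distcc_path"
        else
            echo "MISSING $link"
            status=1
        fi
    done
    return "$status"
}
```

If the search path is ordered as described, "command -v gcc" ought to print /usr/local/bin/gcc as well.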
Then edit the .bashrc file in your home directory and add:
export DISTCC_HOSTS="localhost/4 10.168.168.41/8 10.168.168.42/8"
export DISTCC_IO_TIMEOUT=3000
But use your IP numbers and the number after the / is how many jobs to send to that particular board.
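To pick a sensible number for make -j, it can help to add up the job limits in the host list. This is a minimal sketch with an illustrative function name. It treats any entry without a /N limit as 4 jobs, which is my reading of the distcc defaults and should be checked against your man page, and it ignores anything after a comma in an entry.

```shell
#!/bin/sh
# distcc_total_slots is an illustrative helper: it sums the /N job limits
# in a DISTCC_HOSTS style list.
distcc_total_slots() {
    total=0
    for spec in $1; do
        case $spec in
            */*)
                limit=${spec#*/}      # text after the first slash
                limit=${limit%%,*}    # drop any ",option" suffix
                ;;
            *)
                limit=4               # assumed default when no limit is given
                ;;
        esac
        total=$((total + limit))
    done
    echo "$total"
}
```

With the host list above, "distcc_total_slots "$DISTCC_HOSTS"" prints 20, so something like "make -j20" would be the upper end of what the cluster is set to accept.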
On each of the headless nodes, I added “distccd” to the /etc/rc.local file.
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.

## regen ssh keys on first boot
[ -f /etc/ssh/ssh_host_rsa_key.pub ] || dpkg-reconfigure openssh-server

distccd -a 10.168.168.0/24 --daemon

exit 0
This launches a set of distccd daemons waiting for work from the given network addresses.
Oh, and as root I did a “service distcc start” to make it go on the headend machine. ( I did it on all of them, but it likely isn’t needed on the headless nodes as they have daemons running).
When I first ran it with the test case, I got an error message that “//.distcc” could not be created due to a permissions problem. I thought .distcc was supposed to be created in your home directory (so ought not to be permissions issues), but on the off chance it was /.distcc I went ahead and did a mkdir /.distcc and chowned it to distccd. I also rebooted so all changes would be in effect. One or the other thing fixed the error. “Someday” I’ll remove the /.distcc directory and see if that was it or not… As it now contains a directory named “state” I think it was needed ;-) There is likely a configuration setting to put this directory somewhere more reasonable (thus the two // that look like it wanted an ENV variable with a path name in the middle) so likely some configuration step I skipped.
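On the configuration setting: the distcc man page documents a DISTCC_DIR environment variable as the per-user directory for lock and state files, defaulting to ~/.distcc. One plausible reading of the doubled slash is that HOME was "/" for whatever process hit the error, but that is a guess and untested here. A minimal sketch of pointing distcc at a known writable directory follows. The helper name use_distcc_dir is illustrative.

```shell
#!/bin/sh
# use_distcc_dir is an illustrative helper; DISTCC_DIR itself is a
# documented distcc environment variable.
use_distcc_dir() {
    dir=${1:-$HOME/.distcc}
    # Create the directory if needed and make sure we can write to it.
    mkdir -p "$dir" || return 1
    [ -w "$dir" ] || return 1
    DISTCC_DIR=$dir
    export DISTCC_DIR
}
```

Calling it from .bashrc next to the DISTCC_HOSTS line would cover interactive builds. A daemon started from rc.local has its own environment and would need the variable set there too.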
It is all working. I’m really liking the Devuan 1.0 official release. It just has that quality feel of kindred spirits at the controls of the build process. It is compiled with various settings that make it work well, and with due attention to things like security settings even in FireFox (no default notification of crashes, for example). I suspect they took the time to assure more things are multi-core ready too.
I’ve got both a 64 bit arm64 and a 32 bit armhf build on chips for the Pi M3. The 32 bit runs great and I’ve not found any issues yet. The 64 bit seems a bit more sluggish. Not a surprise, really. The Pi has poor I/O structure and speed and now you are loading words that are 2 x as long for each instruction. It looks like they used hybrid (both arm64 and armhf instructions available) so it may just be the FireFox that was sluggish on 64 bit. More testing needed ;-) thus my build on this Sony chip to see how 32 bit does and on slower chips.
At this point, with the build cluster AND my major desktops all on Devuan 1.0, I’m happy with it and there’s no going back. The Odroid XU4 is still on Armbian with an “uplift” to Devuan (and has a couple of minor issues like the cursor not showing up until you open your first window… which is a challenge without a cursor to show you where to click…) so I’m waiting for a real XU4 build. They have a Devuan build for “xu” but I think it is only the 3, despite claims to the contrary. They use different chip sets and the octo core is ‘odd’.
IF I get really ambitious and try to build a Devuan from scratch, I may try it targeted at the XU4. But maybe after I’ve been successful with a Pi build ;-)
So, with that, hopefully I’ve not left out any steps nor gotten anything backwards in the remembering. I’ve not ‘retested’ this process via a scripted build so it could have errors. That is, it was a build once and remember, not a QA’d scripted rebuild, so YMMV…