More SystemD Follies

Here’s an interesting article by an experienced Systems Admin saying he is done with SystemD. I find the “honesty” in it refreshing ;-)

http://www.cathalferris.com/archives/1154

Systemd stupidity – or why Poettering is an idiot

Yes I work as a Unix admin, and I’m getting so sick of “systemd”, that complete ball of crap that Poettering has foisted upon the Linux community. His poor quality of work has singlehandedly caused the most problems I’ve had to face in my day-to-day work in the past year, through his shoddy code, his abysmal system design and his complete lack of knowledge on how to do things the Linux way.

Things that systemd breaks, in no particular order, and I’ll update this as I see them:

The use of binary log files breaks the simplicity of administration with text files.

systemd returns silent or incorrect return status from script starts. Nothing quite like a return status of “success” when the script actually failed.

Attempting to deprecate /etc/fstab by using its own mounting system, breaking a known-working system.

Because systemd is so large and broken, it no longer has a small target area for any attacks on PID1. This is a major security issue.

systemd breaks the philosophy of “do one job, and do it well”, as it subscribes to the known-broken monolithic philosophy as seen in Microsoft products, as it attempts to be a jack-of-all-trades and a master of none.

Poettering is also a poor responder to either bug reports or any warranted criticism of his project. There are many real bugs marked as wont-fix on the systemd bug tracker, and his comments on bugs are unprofessional and suggest that he has some issues to deal with, that should have no place in someone that manages a core server project.

I’ve seen systemd machines fail to reboot because systemd silently crashes on shutdown, with no console errors and no errors logged. That’s been fun to troubleshoot.

In short, systemd is utterly crap and is breaking Linux. There’s no surprise that there’s a groundswell of Linux admins that are working to remove systemd from their Linux distribution of choice. I choose to use Devuan, which is Debian that has been repaired by the removal of systemd.

See here for a great analysis of the situation.
Cathal
Site owner.

The “see here” points to:

http://ewontfix.com/14/

It is a bit long, but the guy includes some ideas on how to build a better init:

Broken by design: systemd
09 Feb 2014 19:56:09 GMT

Recently the topic of systemd has come up quite a bit in various communities in which I’m involved, including the musl IRC channel and on the Busybox mailing list.

While the attitude towards systemd in these communities is largely negative, much of what I’ve seen has been either dismissable by folks in different circles as mere conservatism, or tempered by an idea that despite its flaws, “the design is sound”. This latter view comes with the notion that systemd’s flaws are fixable without scrapping it or otherwise incurring major costs, and therefore not a major obstacle to adopting systemd.

My view is that this idea is wrong: systemd is broken by design, and despite offering highly enticing improvements over legacy init systems, it also brings major regressions in terms of many of the areas Linux is expected to excel: security, stability, and not having to reboot to upgrade your system.

The first big problem: PID 1

On unix systems, PID 1 is special. Orphaned processes (including a special case: daemons which orphan themselves) get reparented to PID 1. There are also some special signal semantics with respect to PID 1, and perhaps most importantly, if PID 1 crashes or exits, the whole system goes down (kernel panic).

Among the reasons systemd wants/needs to run as PID 1 is getting parenthood of badly-behaved daemons that orphan themselves, preventing their immediate parent from knowing their PID to signal or wait on them.
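The reparenting behaviour described above can be observed with a short experiment. The following is only a sketch: the helper name adopted_parent_of_orphan is hypothetical, and it assumes a POSIX system. Note that on current Linux kernels the adopter is not necessarily PID 1; a process that has marked itself as a "child subreaper" (via prctl with PR_SET_CHILD_SUBREAPER) can adopt orphans instead, so the sketch reports whichever PID it finds rather than assuming 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Orphan a grandchild (the classic daemon "double fork") and report which
 * PID adopted it. Returns that PID, or -1 on error.
 * Hypothetical helper for illustration; not taken from any init system.
 */
pid_t adopted_parent_of_orphan(void)
{
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t child = fork();
    if (child == -1) { close(fds[0]); close(fds[1]); return -1; }

    if (child == 0) {
        /* Intermediate process: fork the future orphan, then exit at once. */
        pid_t original_parent = getpid();
        pid_t grandchild = fork();
        if (grandchild == 0) {
            struct timespec tick = { 0, 1000000 };   /* 1 ms */
            close(fds[0]);
            /* Poll until the kernel has handed us to a new parent. */
            while (getppid() == original_parent) nanosleep(&tick, NULL);
            pid_t new_parent = getppid();
            if (write(fds[1], &new_parent, sizeof new_parent) == -1) _exit(1);
            _exit(0);
        }
        _exit(grandchild == -1 ? 1 : 0);
    }

    /* Original process: reap the intermediate child, then read the report. */
    assert(child > 0);
    close(fds[1]);
    int status;
    waitpid(child, &status, 0);
    pid_t new_parent = -1;
    if (read(fds[0], &new_parent, sizeof new_parent) != (ssize_t)sizeof new_parent)
        new_parent = -1;
    close(fds[0]);
    return new_parent;
}
```

The original process only ever learns the PID of the intermediate child, which has already exited. It never sees the grandchild's PID directly, which is the "badly-behaved daemon" problem in miniature.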

Unfortunately, it also gets the other properties, including bringing down the whole system when it crashes. This matters because systemd is complex. A lot more complex than traditional init systems. When I say complex, I don’t mean in a lines-of-code sense. I mean in terms of the possible inputs and code paths that may be activated at runtime. While legacy init systems basically deal with no inputs except SIGCHLD from orphaned processes exiting and manual runlevel changes performed by the administrator, systemd deals with all sorts of inputs, including device insertion and removal, changes to mount points and watched points in the filesystem, and even a public DBus-based API. These in turn entail resource allocation, file parsing, message parsing, string handling, and so on. This brings us to:

The second big problem: Attack Surface

On a hardened system without systemd, you have at most one root-privileged process with any exposed surface: sshd. Everything else is either running as unprivileged users or does not have any channel for providing it input except local input from root. Using systemd then more than doubles the attack surface.

This increased and unreasonable risk is not inherent to systemd’s goal of fixing legacy init. However it is inherent to the systemd design philosophy of putting everything into the init process.

The third big problem: Reboot to Upgrade

[Image: Windows Update rebooting]

Fundamentally, upgrading should never require rebooting unless the component being upgraded is the kernel. Even then, for security updates, it’s ideal to have a “hot-patch” that can be applied as a loadable kernel module to mitigate the security issue until rebooting with the new kernel is appropriate.

Unfortunately, by moving large amounts of functionality that’s likely to need to be upgraded into PID 1, systemd makes it impossible to upgrade without rebooting. This leads to “Linux” becoming the laughing stock of Windows fans, as happened with Ubuntu a long time ago.

Possible counter-arguments

With regards to security, one could ask why can’t desktop systems use systemd, and leave server systems to find something else. But I think this line of reasoning is flawed in at least three ways:

Many of the selling-point features of systemd are server-oriented. State-of-the-art transaction-style handling of daemon starting and stopping is not a feature that’s useful on desktop systems. The intended audience for that sort of thing is clearly servers.

The desktop is quickly becoming irrelevant. The future platform is going to be mobile and is going to be dealing with the reality of running untrusted applications. While the desktop made the unix distinction of local user accounts largely irrelevant, the coming of mobile app ecosystems full of potentially-malicious apps makes “local security” more important than ever.

The crowd pushing systemd, possibly including its author, is not content to have systemd be one choice among many. By providing public APIs intended to be used by other applications, systemd has set itself up to be difficult not to use once it achieves a certain adoption threshold.

With regards to upgrades, systemd’s systemctl has a daemon-reexec command to make systemd serialize its state, re-exec itself, and continue uninterrupted. This could perhaps be used to switch to a new version without rebooting. Various programs already use this technique, such as the IRC client irssi which lets you /upgrade without dropping any connections. Unfortunately, this brings us back to the issue of PID 1 being special. For normal applications, if re-execing fails, the worst that happens is the process dies and gets restarted (either manually or by some monitoring process) if necessary. However for PID 1, if re-execing itself fails, the whole system goes down (kernel panic).

For common reasons it might fail, the execve syscall returns failure in the original process image, allowing the program to handle the error. However, failure of execve is not entirely atomic:

The kernel may fail setting up the VM for the new process image after the original VM has already been destroyed; the main situation under which this would happen is resource exhaustion.

Even after the kernel successfully sets up the new VM and transfers execution to the new process image, it’s possible to have failures prior to the transfer of control to the actual application program. This could happen in the dynamic linker (resource exhaustion or other transient failures mapping required libraries or loading configuration files) or libc startup code. Using musl libc with static linking or even dynamic linking with no additional libraries eliminates these failure cases, but systemd is intended to be used with glibc.

In addition, systemd might fail to restore its serialized state due to resource allocation failures, or if the old and new versions have diverged sufficiently that the old state is not usable by the new version.
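The "common" failure case mentioned above, where execve fails cleanly and control comes back to the original process image, can be sketched as follows. This is an illustration only: try_reexec is a hypothetical helper, not anything from systemd, and it covers just the recoverable failures (missing file, permission denied, bad format). The non-atomic failures listed above, which strike after the old image is gone, cannot be caught this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Try to replace the current process image with `path`.
 * On success this function never returns. On the common failure modes,
 * execve returns -1 while the ORIGINAL image is still intact, so the
 * caller can log the error and keep running. Returns the errno value.
 * Hypothetical helper for illustration only.
 */
int try_reexec(const char *path, char *const argv[], char *const envp[])
{
    assert(path != NULL);
    execve(path, argv, envp);
    /* Only reached if execve failed before destroying the old image. */
    return errno;
}
```

A caller would treat any return from try_reexec as "still running the old version" and carry on. For an ordinary daemon that is a minor annoyance; for PID 1 it is the only outcome short of success that does not end in a kernel panic.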

So if not systemd, what? Debian’s discussion of whether to adopt systemd or not basically devolved into a false dichotomy between systemd and upstart. And except among grumpy old luddites, keeping legacy sysvinit is not an attractive option. So despite all its flaws, is systemd still the best option?

No.

None of the things systemd “does right” are at all revolutionary. They’ve been done many times before. DJB’s daemontools, runit, and Supervisor, among others, have solved the “legacy init is broken” problem over and over again (though each with some of their own flaws). Their failure to displace legacy sysvinit in major distributions had nothing to do with whether they solved the problem, and everything to do with marketing. Said differently, there’s nothing great and revolutionary about systemd. Its popularity is purely the result of an aggressive, dictatorial marketing strategy including elements such as:

Engulfing other “essential” system components like udev and making them difficult or impossible to use without systemd (but see eudev).

Setting up for API lock-in (having the DBus interfaces provided by systemd become a necessary API that user-level programs depend on).

Dictating policy rather than being scoped such that the user, administrator, or systems integrator (distribution) has to provide glue. This eliminates bikesheds and thereby fast-tracks adoption at the expense of flexibility and diversity.

So how should init be done right?

The Unix way: with simple self-contained programs that do one thing and do it well.

First, get everything out of PID 1:

The systemd way: Take advantage of special properties of pid 1 to the maximum extent possible. This leads to ever-expanding scope creep and exacerbates all of the problems described above (and probably many more yet to be discovered).

The right way: Do away with everything special about pid 1 by making pid 1 do nothing but start the real init script and then just reap zombies:

#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    sigset_t set;
    int status;

    if (getpid() != 1) return 1;

    sigfillset(&set);
    sigprocmask(SIG_BLOCK, &set, 0);

    if (fork()) for (;;) wait(&status);

    sigprocmask(SIG_UNBLOCK, &set, 0);

    setsid();
    setpgid(0, 0);
    return execve("/etc/rc", (char *[]){ "rc", 0 }, (char *[]){ 0 });
}

Yes, that’s really all that belongs in PID 1. Then there’s no way it can fail at runtime, and no need to upgrade it once it’s successfully running.

Next, from the init script, run a process supervision system outside of PID 1 to manage daemons as immediate child processes (no backgrounding). As mentioned above, there are several existing choices here. It’s not clear to me that any of them are sufficiently polished or robust to satisfy major distributions at this time. But neither is systemd; its backers are just better at sweeping that under the rug.

What the existing choices do have, though, is better design, mainly in the way of having clean, well-defined scope rather than Katamari Damacy.

If none of them are ready for prime time, then the folks eager to replace legacy init in their favorite distributions need to step up and either polish one of the existing solutions or write a better implementation based on the same principles. Either of these options would be a lot less work than fixing what’s wrong with systemd.
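As a rough illustration of what supervising a daemon "as an immediate child process (no backgrounding)" means, here is a minimal sketch. It is not how daemontools, runit, or any other named tool is implemented; the function names run_once and supervise, and the policy of stopping on a clean exit or after a fixed number of restarts, are assumptions made for the example. A real supervisor would also add a restart delay and logging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] in the foreground as an immediate child and wait for it.
 * Returns the child's exit code, 128 + signal number if a signal killed
 * it, or -1 if fork/waitpid failed. Hypothetical helper.
 */
int run_once(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                 /* exec failed; report it like a shell */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return -1;
}

/*
 * Restart the service until it exits cleanly or max_restarts is used up.
 * Stores the number of starts in *starts and returns the last exit code.
 * The stop-on-success policy is an assumption of this sketch.
 */
int supervise(char *const argv[], int max_restarts, int *starts)
{
    int code = -1;
    *starts = 0;
    for (int i = 0; i <= max_restarts; i++) {
        code = run_once(argv);
        (*starts)++;
        if (code == 0) break;       /* clean exit: stop restarting */
    }
    return code;
}
```

Because the service is a direct child, the supervisor always knows its PID and gets its true exit status from waitpid, with no PID files and no need to run as PID 1.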

Whatever system is chosen, the most important criterion is that it be transparent to applications. For 30+ years, the choice of init system used has been completely irrelevant to everybody but system integrators and administrators. User applications have had no reason to know or care whether you use sysvinit with runlevels, upstart, my minimal init with a hard-coded rc script or a more elaborate process-supervision system, or even /bin/sh. Ironically, this sort of modularity and interchangeability is what made systemd possible; if we were starting from the kind of monolithic, API-lock-in-oriented product systemd aims to be, swapping out the init system for something new and innovative would not even be an option.

Update: license on code

Added December 21, 2014.

There has been some interest in having a proper free software license on the trivial init code included above. I originally considered it too trivial to even care about copyright or need a license on it, but I don’t want this to keep anyone from using or reusing it, so I’m explicitly licensing it under the following terms (standard MIT license):

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Other than that, no problem ;-)

There are times you look at something and just have that feeling that it is best avoided. I had that feeling the first time I tried to deal with a SystemD based release. I’m very glad that I, early on, just walked away from it.

While it was a bit of a PITA early on, as there were not many choices and it required abandoning some “standards” that I’d already settled on in order to use “new” code from a new release, I’ve become happier with the decision every time I look back.

Now, with Red Hat sold to Big Blue, the major proponent of SystemD will be undergoing assimilation trauma. (One hopes Poettering gets his just reward…) I’m pretty sure that, at a minimum, there will be a long pause in “development” at Red Hat as it gets IBMed.

It is very gratifying to be on the sidelines of this mess, but watching the Big Blue Fireworks at Red Hat will be fun ;-)

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present “hot buttons” are the mythology of Climate Change and ancient metrology; but things change...

8 Responses to More SystemD Follies

  1. Larry Ledwick says:

    I’ve seen systemd machines fail to reboot because systemd silently crashes on shutdown, with no console errors and no errors logged. That’s been fun to troubleshoot.

    Hmmm, that is an interesting comment – I will have to pass that on to our sysadmins. We have been having an issue on all of our HPE Gen 10 servers running Centos 7, which are hanging on reboot and have to be crashed by the sysadmin from the LILO interface to recover and bring the system back up after it goes comatose following a reboot command.

    Sounds suspiciously like that may be the same issue, if true that at least gives a clue on what to troubleshoot. Right now our solution is to have a sysadmin stand by to use a big stick on the system when the reboot hangs.

  2. E.M.Smith says:

    Glad that might help you… Your guys might try shutdown / restart and bypass ‘reboot’ as a command. Won’t work for remote maintenance, but then again, a crowbar to LILO doesn’t either…

    FWIW, the number of Devuan based releases is growing dramatically. There’s a lot more to choose from now (and I’m quite happy with the basic Devuan). “Your Guys” might want to pick one machine to convert over and assess.

    Most of these are directed at lightweight or older hardware, or specialized uses like systems recovery. Depending on what folks need some of the security hardened ones (heads) can be a good solution. For general purpose server use, I’d use generic Devuan.

    The major reason to show the list, below, is just how big it has gotten. It used to be 2 or 3 and that was it.

    https://devuan.org/os/partners/devuan-distros

    Derivatives Based on Devuan

    Devuan is conceived as a base OS on which to roll your own distro.

    This page documents Devuan-based projects.
    Available distributions

    EterTICs

    EterTICs GNU/Linux-Libre is a 100% libre distribution designed for Latin American radio environments. Previously based on Debian, it is the first libre distribution to be designed for this purpose.

    Exe GNU/Linux

    Exe GNU/Linux is a live CD or USB desktop orientated distribution pre-installed with the Trinity Desktop Environment. It can optionally be installed to a hard disk or USB thumb drive. The latest version of Exe GNU/Linux is based on Devuan rather than Debian.

    Gnuinos

    Gnuinos is a libre spin of Devuan GNU/Linux using the Openbox window manager. It is very lightweight, ships with the linux-libre kernel and is focused on general purpose computing. The project aims to produce a fully FSF approved distribution.

    MIYO

    Based on Devuan and using the Refracta tools, MIYO (“make it your own”) is a minimalistic distribution that allows users to add software they want without the need to remove software they don’t want. It uses Openbox as the default window manager and is ideal for older or resource limited computers.

    Nelum-Dev1

    Nelum-dev1 is a fast and lightweight live CD distribution created to showcase Devuan GNU/Linux. It comes in flavours for Openbox, MATE and XFCE.

    Refracta

    Refracta is a GNU/Linux distribution built with home users in mind. It provides a simple layout that will be comfortable for the majority of users. It also comes with special tools that allow the user to customize their system and create a live CD or live USB from their installation.

    Star

    Star is a do it yourself project for people that want to learn how to make their own live distribution. It is a live-build development environment aimed at helping the user create their own live distribution. There are versions available for both Debian and Devuan.

    heads

    Since “the amnesic incognito live system” (Tails) now ships with systemd, @parazyd thought it would be nice to make one without systemd.

    heads offers some improvements over tails. It is distributed with a fully de-blobbed kernel hardened with grsecurity and uses the awesome window manager.

    Like Tails, heads routes your traffic through the Tor network to anonymize your internet activity.

    good-life-linux

    Good Life Linux was made so that users with older hardware could install a minimal and base LXDE, Openbox, or Xfce system in order to make their system into what they need without any additional bloat to remove.

    Crowz

    Crowz (formerly Zephyr Linux) is a 64 bit live-hybrid distribution built on Debian and Devuan stable. It is a lightweight distribution with a small collection of applications included by default. There are flavours available for Fluxbox, JWM and Openbox each of which include a fully featured desktop experience.

    Dowse

    Dowse is a smart digital network appliance for home based local area networks (LAN) and also small and medium business offices, that makes it possible to connect objects and people in a friendly, conscious and responsible manner.

    DecodeOS

    DecodeOS is an ASCII-based derivative targeting micro-service usage on anonymous network clusters. It includes original software developed to automatically build p2p networks as Tor hidden service families.

    Maemo Leste

    Maemo Leste continues the legacy of Maemo by providing a free Maemo experience on mobile phones and tablets like the Nokia N900, Motorola Droid 4, Allwinner Tablets and more.

    crunkbong

    Crunkbong is a switchblade that does one thing really well. You won’t always need to use one. However, when the right time comes, you’ll be glad you had it. It’s fairly small and meant to be kept on hand for whenever a need arises.

    FluXuan

    FluXuan Linux is trying to bring your old computer laptop back to life. It is based on Devuan ASCII and follows Miyo Linux’ idea to leave the end user the power to install their favorite programs.

    Upcoming distributions

    Dyne:bolic

    Dyne:bolic is a fully libre live CD for media activists that can run on low-end machines.

    The “nesting” concept, which allows all modifications to the live cd to be saved to a USB drive and later reloaded on boot was adopted in the devuan-sdk.

    The first Dyne:bolic was created from LFS, and the last version was based on Debian Wheezy. The upcoming version will be based on Devuan to demonstrate its versatility.

  3. Larry Ledwick says:

    Been doing a bit of link chasing on that systemd hang on reboot. Found some mentions on Debian from mid 2016 of similar behavior that was due to the order in which reboot handled some tasks. It seems that shutting down swap while the /tmp filesystem was mounted caused problems (apparently there was not enough memory available).

    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788303
    Message #32 received at 788303@bugs.debian.org (full text, mbox, reply):

    From: Sascha Jung
    To: 788303@bugs.debian.org
    Subject: Re: systemd: Hangs indefinitely on >90% of reboot attempts
    Date: Wed, 6 Jan 2016 09:35:18 +0100

    I can confirm this behaviour on my system exactly as described.

    Besides the message

    swapoff: /dev/sdxx: swapoff failed: Cannot allocate memory

    there are also these lines:

    dev-sdxx.swap swap process exited, code=exited status=255
    Unit dev-sdxx.swap entered failed state.

    I also have set overcommit_memory=2 and in my case Committed_AS is
    above MemTotal when the system is running. But when the system got
    stuck during shutdown, I can see that this is not true anymore, i.e.
    the processes which committed memory were stopped and Committed_AS is
    well below MemTotal.
    Manually doing a ‘swapoff -a’ on the debug shell allows the system to
    continue with a proper shutdown or reboot.

    I “fixed” it somehow by removing the swap partition from /etc/fstab,
    creating an own systemd service for handling swap and adding a
    dependency for the process which needs to commit some memory:

    [Unit]
    Description=swap on /dev/sdxx
    Before=some.service

    [Service]
    User=root
    Group=root
    Type=oneshot
    RemainAfterExit=yes

    ExecStart=/sbin/swapon /dev/sdxx
    ExecStop=/sbin/swapoff /dev/sdxx

    [Install]
    WantedBy=multi-user.target

    I know that this is not a good solution, but it seems to work at least
    for my configuration.

    ============

    Message #126 received at 788303-close@bugs.debian.org (full text, mbox, reply):

    From: Martin Pitt
    To: 788303-close@bugs.debian.org
    Subject: Bug#788303: fixed in systemd 229-5
    Date: Mon, 25 Apr 2016 10:28:15 +0000

    Source: systemd
    Source-Version: 229-5

    We believe that the bug you reported is fixed in the latest version of
    systemd, which is due to be installed in the Debian FTP archive.

    A summary of the changes between this version and the previous one is
    attached.

    Thank you for reporting the bug, which will now be closed. If you
    have further comments please address them to 788303@bugs.debian.org,
    and the maintainer will reopen the bug report if appropriate.

    Debian distribution maintenance software
    pp.
    Martin Pitt (supplier of updated systemd package)

    [fix] On shutdown, unmount /tmp before disabling swap. (Closes: #788303)

    This is a little beyond my normal skill level, as I don’t ever deal with the systemd or debug commands; all that falls in the scope of the sysadmins. I just execute a simple:
    sudo reboot
    and expect the system to come back up eventually. If not, I ping them to beat on it until it does, but this might be a useful hint to someone.

    I am also not sure how much crossover in behavior you would expect between Centos 7 and the Jessie version of Debian – might be the exact same bug or totally different.

  4. E.M.Smith says:

    @Larry L:

    It is exactly the kind of stuff I’ve had to deal with for decades. It’s the stuff that causes the “Experienced Systems Admin” Spidey Sense to go all tingly at the way SystemD does things and to complain about “not the Unix/ Linux way!”

    First, SystemD is just Too Damn Big. This shows up in complexity issues that can’t be unraveled to find the problem AND in memory issues at shutdown (as shown) and potentially at other times (like making sure the part that deals with swap isn’t swapped out…) Then it hides a lot of the most important information in binary files where you need a program to read them – not exactly useful if you have a crashed system and can’t run the program… where simple text files can be read with a dump to a device.

    Unix Way: “Do one small task and do it well” gives code you can just look at and know it is right.

    Then it tries to do so many things, many of them with critical dependencies, but in such a way that the particular path through those actions is not always known (or even knowable). So how do you test all possible commands and all possible command orderings sent to this monster?

    You can’t. So every release all the users become the Q.A. department – Just Like Microsoft.

    So, OK, maybe this time moving unmount /tmp ahead of disable swap fixes it. What about the next time something uses a bit more memory and isn’t shut down before swap is disabled? Hmmm?

    Basically SystemD brings with it a whole huge basket of new untestable failure modes and possibilities. The “old way” had more deliberate and specific ordering and lots of ability to dump diagnostics (in human readable form…) along the way. Yes, start up and shutdown took a few seconds, maybe even a minute, longer. I’ll take that over endless months standing around trying to figure out why reboot fails and why my printer doesn’t always show up and why some file systems are a bit odd in how / when they mount, or fail to mount, or hours spent decoding binary logs… etc. etc.

  5. Jay says:

    MX Linux is still systemd free, as is the related AntiX Linux.
    Both are based on Debian stable.
    They still have lineage from the old Mepis, like the installer.

  6. Kneel says:

    ” Yes, start up and shutdown took a few seconds, maybe even a minute, longer.”
    Great.
    It needs to because it needs to reboot much more often…

  7. E.M.Smith says:

    @Kneel:

    Tee Hee! 8-)

    FWIW, the other day I noticed my R. Pi desktop had been up a few months. I just turn the monitor off when I’m not using it…

  8. Pingback: More SystemD Follies | Confessions of a Technophobe
