Any User, One Line SystemD Crash

With a h/t to Dave Halliday here:

Interesting development in systemd-land:

We have this lovely gem from that link:

How to Crash Systemd in One Tweet

The following command, when run as any user, will crash systemd:

NOTIFY_SOCKET=/run/systemd/notify systemd-notify ""

After running this command, PID 1 is hung in the pause system call. You can no longer start and stop daemons. inetd-style services no longer accept connections. You cannot cleanly reboot the system.
The system feels generally unstable (e.g. ssh and su hang for 30 seconds since systemd is now integrated with the login system). All of this can be caused by a command that’s short enough to fit in a Tweet.

Edit (2016-09-28 21:34): Some people can only reproduce if they wrap the command in a while true loop. Yay non-determinism!

The bug is remarkably banal. The above systemd-notify command sends a zero-length message to the world-accessible UNIX domain socket located at /run/systemd/notify. PID 1 receives the message and fails an assertion that the message length is greater than zero. Despite the banality, the bug is serious, as it allows any local user to trivially perform a denial-of-service attack against a critical system component.
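
The receiving side of that exchange can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not systemd's C code: a connected pair of UNIX datagram sockets stands in for /run/systemd/notify, and `handle_notify` is a hypothetical name. It shows the point the bug turns on. A zero-length datagram is legal on such a socket and (on Linux, at least) is actually delivered, so the receiver has to treat it as input to discard, not as an impossible state to assert against.

```python
import socket
from typing import List, Optional


def handle_notify(datagram: bytes) -> Optional[List[str]]:
    """Hypothetical receiver-side handler for notify-style datagrams."""
    if len(datagram) == 0:
        # Legal on the wire, so drop it quietly instead of aborting the process.
        return None
    text = datagram.decode("utf-8", errors="replace")
    # Notify messages are newline-separated KEY=VALUE assignments.
    return [line for line in text.split("\n") if line]


def demo() -> List[Optional[List[str]]]:
    # A connected pair of UNIX datagram sockets stands in for /run/systemd/notify.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    receiver.settimeout(2.0)  # never block forever if a datagram goes missing
    try:
        sender.send(b"")                    # the empty message from the one-liner
        sender.send(b"READY=1\nSTATUS=ok")  # an ordinary, well-formed message
        return [handle_notify(receiver.recv(4096)) for _ in range(2)]
    finally:
        sender.close()
        receiver.close()
```

On Linux, `demo()` should give `None` for the empty datagram and the two assignments for the well-formed one; either way the receiver keeps running.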

The immediate question raised by this bug is what kind of quality assurance process would allow such a simple bug to exist for over two years (it was introduced in systemd 209). Isn’t the empty string an obvious test case? One would hope that PID 1, the most important userspace process, would have better quality assurance than this. Unfortunately, it seems that crashes of PID 1 are not unusual, as a quick glance through the systemd commit log reveals commit messages such as:

* coredump: turn off coredump collection only when PID 1 crashes, not when journald crashes
* coredump: make sure to handle crashes of PID 1 and journald special
* coredump: turn off coredump collection entirely after journald or PID 1 crashed
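
The quoted question, whether the empty string is an obvious test case, generalizes: any code that parses untrusted messages should be swept with a handful of hostile edge cases before it ships. A minimal Python sketch follows. `naive_parse`, `tolerant_parse`, and the case list are hypothetical illustrations, and `naive_parse` deliberately imitates the treat-empty-as-impossible behavior described above.

```python
from typing import Callable, Dict, List


def naive_parse(msg: bytes) -> Dict[str, str]:
    # Mimics the failure mode: an empty message is treated as "can't happen".
    if len(msg) == 0:
        raise AssertionError("message length must be > 0")
    return dict(line.split("=", 1) for line in msg.decode().split("\n") if line)


def tolerant_parse(msg: bytes) -> Dict[str, str]:
    # Same job, but every malformed piece is skipped instead of crashing.
    fields: Dict[str, str] = {}
    for line in msg.decode("utf-8", errors="replace").split("\n"):
        key, sep, value = line.partition("=")
        if sep and key:
            fields[key] = value
    return fields


EDGE_CASES: List[bytes] = [
    b"",                    # the empty message from the one-liner
    b"\n",                  # only a separator
    b"READY",               # missing '='
    b"=1",                  # empty key
    b"\xff\xfe",            # invalid UTF-8
    b"READY=1\x00junk",     # embedded NUL byte
    b"X=" + b"A" * 65536,   # oversized value
]


def crashes(parser: Callable[[bytes], Dict[str, str]]) -> List[bytes]:
    """Return every edge case that makes the parser raise."""
    failed = []
    for case in EDGE_CASES:
        try:
            parser(case)
        except Exception:
            failed.append(case)
    return failed
```

Here `crashes(naive_parse)` flags the empty message, the line without `=`, and the invalid UTF-8, while `crashes(tolerant_parse)` returns an empty list.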

Systemd’s problems run far deeper than this one bug. Systemd is defective by design. Writing bug-free software is extremely difficult. Even good programmers would inevitably introduce bugs into a project of the scale and complexity of systemd. However, good programmers recognize the difficulty of writing bug-free software and understand the importance of designing software in a way that minimizes the likelihood of bugs or at least reduces their impact. The systemd developers understand none of this, opting to cram an enormous amount of unnecessary complexity into PID 1, which runs as root and is written in a memory-unsafe language.

Some degree of complexity is to be expected, as systemd provides a number of useful and compelling features (although they did not invent them; they were just the first to aggressively market them). Whether or not systemd has made the right trade-off between features and complexity is a matter of debate. What is not debatable is that systemd’s complexity does not belong in PID 1. As Rich Felker explained, the only job of PID 1 is to execute the real init system and reap zombies. Furthermore, the real init system, even when running as a non-PID 1 process, should be structured in a modular way such that a failure in one of the riskier components does not bring down the more critical components. For instance, a failure in the daemon management code should not prevent the system from being cleanly rebooted.
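
The second of those two jobs, reaping zombies, is small enough to show in full. The sketch below is Python for illustration only, not Rich Felker's actual C implementation, and `reap_zombies` and `demo` are hypothetical names. A real minimal PID 1 would also fork and exec the real init system and then block waiting for signals such as SIGCHLD between passes; only the non-blocking reaping pass is shown here.

```python
import os
import time
from typing import List, Set


def reap_zombies() -> List[int]:
    """Collect every child that has already exited, without blocking."""
    reaped: List[int] = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # ECHILD: no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def demo(n: int = 3, timeout: float = 5.0) -> bool:
    """Fork n children that exit at once, then reap them the way PID 1 would."""
    spawned: Set[int] = set()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, stays a zombie until reaped
        spawned.add(pid)
    collected: Set[int] = set()
    deadline = time.monotonic() + timeout
    while not spawned <= collected and time.monotonic() < deadline:
        collected.update(reap_zombies())
        time.sleep(0.01)
    return spawned <= collected
```

The loop body is all the "init" logic there is; nothing in it parses input from other users, which is exactly why it is hard to crash.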

In particular, any code that accepts messages from untrustworthy sources like systemd-notify should run in a dedicated process as an unprivileged user. The unprivileged process parses and validates messages before passing them along to the privileged process. This is called privilege separation and has been a best practice in security-aware software for over a decade. Systemd, by contrast, does text parsing on messages from untrusted sources, in C, running as root in PID 1. If you think systemd doesn’t need privilege separation because it only parses messages from local users, keep in mind that in the Internet era, local attacks tend to acquire remote vectors. Consider Shellshock, or the presentation at this year’s systemd conference which is titled “Talking to systemd from a Web Browser.”
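
The validating half of that split can be sketched in Python. Everything in it is an assumption for illustration: the size limit, the allowlist (a subset of the variable names documented for sd_notify), and the `validate` function itself. In a real design this function would run in its own process under an unprivileged user ID and hand only the cleaned-up dictionary to the privileged side, which never touches raw bytes.

```python
from typing import Dict, Optional

# Hypothetical policy for the sketch; not systemd's actual rules.
MAX_DATAGRAM = 4096
ALLOWED_KEYS = {"READY", "RELOADING", "STOPPING", "STATUS", "ERRNO", "MAINPID", "WATCHDOG"}


def validate(datagram: bytes) -> Optional[Dict[str, str]]:
    """Runs in the unprivileged helper: return clean fields, or None to drop."""
    if not datagram or len(datagram) > MAX_DATAGRAM:
        return None  # empty or oversized: drop, never crash
    try:
        text = datagram.decode("utf-8")
    except UnicodeDecodeError:
        return None
    if "\x00" in text:
        return None  # embedded NUL bytes are rejected outright
    fields: Dict[str, str] = {}
    for line in text.split("\n"):
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or key not in ALLOWED_KEYS:
            return None  # one bad line poisons the whole message
        fields[key] = value
    return fields or None
```

Rejecting the whole message on the first bad line is a deliberately strict choice; the privileged process then only ever receives a small dictionary of known keys.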

The article then goes on to cover several other interesting tech bits. I’m leaving most of them for you to read in the link. This DNS one caught my eye. Why? Because DNS is absolutely critical to security and performance and is best run in a dedicated, very locked-down, secure way. If someone can pollute your DNS, they can at minimum break system-to-system communications; at worst, they can direct you to their spoof system, get you to enter your credentials, and steal all your permissions (which, if you are a privileged sysadmin type, means stealing the citadel…)

Consider systemd’s DNS resolver. DNS is a complicated, security-sensitive protocol. In August 2014, Lennart Poettering declared that “systemd-resolved is now a pretty complete caching DNS and LLMNR stub resolver.” In reality, systemd-resolved failed to implement any of the documented best practices to protect against DNS cache poisoning. It was vulnerable to Dan Kaminsky’s cache poisoning attack which was fixed in every other DNS server during a massive coordinated response in 2008 (and which had been fixed in djbdns in 1999). Although systemd doesn’t force you to use systemd-resolved, it exposes a non-standard interface over DBUS which they encourage applications to use instead of the standard DNS protocol over port 53. If applications follow this recommendation, it will become impossible to replace systemd-resolved with a more secure DNS resolver, unless that DNS resolver opts to emulate systemd’s non-standard DBUS API.
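
The usual defense against Kaminsky-style poisoning is to make a forged answer hard to guess. Random 16-bit transaction IDs were already standard; the 2008 coordinated response added random UDP source ports, and a resolver accepts a response only if everything matches the outstanding query. A minimal Python sketch of that bookkeeping follows. `PendingQuery`, `new_query`, `accepts`, and `guess_space` are hypothetical names, the port range is an assumption, it does no networking, and it is not how systemd-resolved or any particular resolver is written.

```python
import secrets
from typing import NamedTuple

PORT_LOW, PORT_HIGH = 1024, 65535  # assumed source-port range for the sketch


class PendingQuery(NamedTuple):
    txid: int      # 16-bit DNS transaction ID
    src_port: int  # UDP source port the query was sent from
    qname: str     # question name, compared case-insensitively
    server: str    # address the query was sent to


def new_query(qname: str, server: str) -> PendingQuery:
    # Both values come from a CSPRNG so an off-path attacker cannot predict them.
    txid = secrets.randbelow(1 << 16)
    src_port = PORT_LOW + secrets.randbelow(PORT_HIGH - PORT_LOW + 1)
    return PendingQuery(txid, src_port, qname.lower(), server)


def accepts(q: PendingQuery, txid: int, dst_port: int, qname: str, from_server: str) -> bool:
    # A response is considered only if every field matches the outstanding query.
    return (
        txid == q.txid
        and dst_port == q.src_port
        and qname.lower() == q.qname
        and from_server == q.server
    )


def guess_space(randomize_port: bool) -> int:
    # Number of (txid, port) combinations a blind forger has to cover.
    ports = PORT_HIGH - PORT_LOW + 1 if randomize_port else 1
    return (1 << 16) * ports
```

With the ID alone, `guess_space(False)` is 65,536 combinations; adding port randomization over this range gives `guess_space(True)`, 65,536 × 64,512 = 4,227,858,432.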

And folks wonder why I hate SystemD with a passion… Designed wrong, implemented badly, doing things it ought not do, in ways that are broken. And that’s just at first glance… now we know that anyone can hang your system in a non-recoverable state and the DNS can be poisoned. Oh Joy. /sarc;

Guess it’s time for me to put a bit more effort into that non-systemd R.Pi source build project…

I can just see college kids worldwide enjoying the fun of hanging the campus servers, locking up their roomies’ machines, and generally creating mayhem. Heck, just put that line of text in a command named, oh, ‘dir’ to catch the Windows Rejects, or ‘lsl’ for those who are used to having that aliased… put that in a normal command directory, then walk away… Over the course of weeks, at various times, it will trigger, the system will go unstable, and systemd will lock up. Happy debugging…

Then there are the opportunities for causing grief in ANY critical system running on a systemd box. One trivial command insertion that does not need root privilege and your hospital email or medical records servers go down… or patient monitoring gear if it is based on this.

For the tech types, more in the link.

About E.M.Smith

A technical managerial sort interested in things from Stonehenge to computer science. My present "hot buttons" are the mythology of Climate Change and ancient metrology; but things change...
This entry was posted in Tech Bits.

15 Responses to Any User, One Line SystemD Crash

  1. Larry Ledwick says:

    Oh that is just dandy!
    Could you block that using an alias or something to redirect that command to /dev/null or alias to an echo command “command not allowed!”?

  2. E.M.Smith says:


    Depends on what “notify_socket” is used for. Aliasing it away means you can’t do that thing.

    Personally, I’d put a wrapper on it (assuming that is possible) where you have a new “notify_socket” that checks for null string and quits, or forwards the result to the real notify_socket if format and length are good enough.

    I’m sure eventually it will even be fixed in the production version…

    But the problem is that this ought to never have happened in the first place and is only possible because the design is broken at a fundamental level (“Big Ball Of Opaque Wax” instead of “modular each part doing one thing well – and clearly”.) Now everyone has to scramble to keep their systems up, hope they stay secure, and do unplanned updates to core OS functions (massive quantities… as it is ONE BIG BALL OF WAX) without proper regression testing of the update. Because the design is bogus… so can’t just exhaustively test one little module…

    It just screams “Crappy Process and Crappy Design on Crappy Implementation”…

    I want my BSD back… Robust, secure, highly modular, efficient, clear and transparent operation… extraordinarily well designed and well tested… Frankly, if Linux stays with major parts committed to systemd land, and no good alternatives replace Red Hat and Debian / Ubuntu as the big whales, I’m likely to just pack it in on Linux, say “Nice Try, but you broke it” and go back to BSD. It was always better anyway. (Especially the network stack and security, with a refusal to stuff neat new stuff in until it was very well proven secure and stable.)

  3. Larry Ledwick says:

    Looking around I found this:
    This is mostly just a wrapper around sd_notify() and makes this functionality available to shell scripts.

    That implies you could rename the real sd_notify() and create a bug-fix wrapper called sd_notify() that, after checking for that null value, calls the real sd_notify().
    At least that sounds like it might work. Sort of a kludge fix for something that shouldn’t happen in the first place, but better than leaving your system wide open.

    Much like many places alias rm='rm -i' to prevent folks from being too stupid.

  4. Another Ian says:


    The South Australian power grid of operating systems?

    There has just been a big update of W10 which also has the look of a monumental muck-up.

    I have just discovered that OO won’t print directly and I can’t see a heap of directories that are there.

  5. E.M.Smith says:

    I also can’t shake the suspicion that systemd puts a bunch of security-critical stuff in one very hard to inspect place, in a horridly complex way, that could be exploited by TLAs… especially were it done at their request and to their direction. We know they squeeze, leverage, and pervert major vendors, and Red Hat is the big dog in this market. Flip them and the others follow; you get penetration of all those non-compliant Linux providers… It is the place and the method I’d hit… So your security all comes down to trusting maybe a half dozen guys at Red Hat…

    @Another Ian:

    My last Windoz release was W7. I’ll not be using newer unless paid to do it, and the W7 is only in case I have old files or apps to open. You have my condolences on Windows 10…


    Yeah, ought to work. It will be fixed. But this particular bug is just a sentinel case. It is what it means that distresses. PID 1 privs to cock up a system, exposed to everyone. No effective QA (hell, in my very first programming class we learned to expect dirty data and test for it, including null input. I ran SW QA for a while. Testing with null input is the FIRST thing you think of… at an application test, I walked up and started mashing keys, much to the programmer’s horror… as it crashed… “Never sat on a keyboard?” I asked… This is just rookie stuff, not even smart defence against smart attacks.)

    This kind of thing is the reason unix is modular with small parts, so it is easy to have “secure by inspection”, replaceable modules easily swapped to fix bugs, limited scope for potential bugs, privs local to only where needed, isolation of the scope of interactions or side effects, AND restart of hung modules by hand… and more.

    In classical Unix, a little hung notifier could just be killed and respawned. In systemd, it fouls up the whole system, requires a reboot to fix (shades of The Microsoft Way!) , AND prevents that orderly respawn or even a reboot. Just horrible design with worse consequences.

  6. anonymous says:

    I’ve heard that there has been a massive influx of m$ programmers into the Linux arena; the more I read about this fiasco called systemd, the more I believe it.

    Seems that the m$ way is being worked into a system that wasn’t broken in the first place. Now it is, and in how many other aspects yet to be discovered?

    Yeah, BSD is looking better by the minute…

  7. Gail Combs says:

    I am tossing this at Hubby.

    I lost the fight over the “Pay your bills ONLINE!” So the phone bill, the electric bill and the mortgage are now all paid on line from the account that gets automatic Social Security and pension payments…

  8. tom0mason says:

    Oh dear that means I’m done for then. Oh, for the old days of clean, straightforward, maintainable programming that was not buried in M$ style obfuscated coding methods.

    Was systemd just change for changes sake? What exactly was wrong with the init method?

    Like you say I’ll have to change again, hello BSD…

  9. Pain in the butt to embed links on an iPad while waiting for a build to finish…

    Linux 4.8 adds Pi, Surface support but Linus Torvalds fumes over ‘kernel-killing’ bug

  10. E.M.Smith says:


    The major real benefit is VERY rapid spin-up of virtual machines. So a major commercial service farm, like cloud providers or giant multinationals, can have rapid service deployment during demand surges or system faults. (I managed disaster recovery tests in an environment like that… big real server farm “fails over” to a data center full of HP racks of VM boxes.) When your booking engine cluster books $4 Million an hour, being able to restart or fail over in 10 seconds instead of 3 minutes matters. To the rest of us, not so much.

    IMHO, the real driver is a couple of people with ego needs and hubris thinking they have a better way (thus all the dbus communications stuff) and desiring to do “creative destruction” to show it…

    Side reason might be the great opportunity for TLAs to insert new backdoors and data leakage. (If ALL your critical interprocess comms go through one place, dbus, then only that one thing, designed for broad communications, needs to be compromised to capture it all.)

    I see no reason to choose between mutually reinforcing motivations.

    For the rest of us, it does nothing of merit.

    Oh, and the dbus framework makes some application-writing tasks a bit easier, so getting apps folks roped in worked, if slowly at first. There were other frameworks that also had similar benefits, so I find this a weak motivation.

  11. E.M.Smith says:

    Well, tested it on the recent ARCH Linux I’m running and it didn’t crash or hang. Guess the Arch guys are good at fixing things ;-) (Rolling release tends to be current…)

    I’ll need to try it on the first systemd Debian (wheezy?) and see what it does. Wonder if it might be hardware-tied, so Pi not an issue? Hmm… I have a SystemD CentOS for the Asus-64…

    “How to have fun crashing systems and calling it research” ;-)

  12. p.g.sharrow says:

    I noticed that the Raspbian that I loaded mentioned SystemD as being part of it…pg

  13. E.M.Smith says:


    Raspbian swapped as of Jessie, IIRC.


    Well, I spoke too soon on Arch being immune… At “shutdown” it didn’t want to put up the logout selector. In another terminal window where I was root, typed “halt” and return. Nothing.

    After a few more halt and click-shutdown attempts, it finally popped up the “logout” panel, so I clicked “logout”… After several minutes of nothing happening, I pulled power…

    So “bug confirmed” under Arch, but it doesn’t immediately hang the system. I would guess only things that need the “notifier” hang initially, then, as they cramp up, things dependent on them start to hang…

    At any rate, as “Joe User” I can cause grief for me as “Systems Admin”… Not good.

    Very much double plus ungood…
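
The wrapper idea from comments 2 and 3 can be sketched as follows. This is a minimal Python illustration, not a patch to systemd-notify or sd_notify(): `safe_notify` is a hypothetical name, and it assumes the notify convention of sending a datagram to the path in `$NOTIFY_SOCKET`, where a leading `@` means a Linux abstract-namespace socket. One caveat: it only protects callers that go through it. Any local user can still write an empty datagram to the socket directly, so the real fix has to live in the receiver.

```python
import os
import socket
from typing import Optional


def safe_notify(message: str, socket_path: Optional[str] = None) -> bool:
    """Hypothetical guard: refuse blank messages, forward everything else."""
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path or not message.strip():
        return False  # nothing to send, or nowhere to send it
    if path.startswith("@"):
        path = "\0" + path[1:]  # abstract-namespace socket address (Linux)
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), path)
    return True
```

A script would call `safe_notify("READY=1")` where it previously ran `systemd-notify READY=1`; blank input simply returns `False`.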
