In my last post, Broken by design: systemd, I covered technical aspects of systemd outside its domain of specialization that make it a poor choice for the future of the Linux userspace's init system. Since then, it's come to my attention as a result of a thread on the glibc development list that systemd can't even get things right in its own problem domain: service supervision.
Per the manual, systemd has the following 6 "types" that can be used in a service file to control how systemd will supervise the service (daemon):
simple - manages the lifetime of the daemon via the pid, and depends on the daemon not forking so that it's a direct child process. This roughly corresponds to the correct supervision practices of other systems like runit, s6, etc. The service is considered activated immediately.

forking - assumes the original process invoked to start the daemon will exit once the daemon is successfully initialized, and not earlier. Requires a pid file and is subject to all the traditional flaws of pid files (but systemd can mitigate them somewhat by being the process that inherits orphans).

oneshot - used for non-daemon "services" that run without forking and exit when finished; systemd waits for them to exit before considering them activated.

dbus - like simple, but systemd does not consider the service activated until it acquires a name on D-Bus.

notify - like simple, but systemd does not consider the service activated until it makes a call to the C function sd_notify, part of the systemd library.

idle - like simple but defers running the service until other jobs have finished; a hack to avoid interleaved spam on the console.
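For reference, the type is selected with the Type= directive in the [Service] section of a unit file. A minimal example using the notify type might look something like the following (the daemon path here is made up for illustration):

    [Unit]
    Description=Example daemon

    [Service]
    Type=notify
    ExecStart=/usr/sbin/exampled

    [Install]
    WantedBy=multi-user.target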
The whole idea of systemd's service supervision and activation system is built on being able to start services asynchronously as soon as their dependencies are met (and no sooner). However, none of the above choices actually make it possible to do this with a daemon that was not written specifically to interact with systemd!
In the case of simple, there is no way for systemd to determine when
the daemon is actually active and providing the service that
subsequent services may depend on. If using "socket activation" (a
feature by which systemd allocates the sockets a daemon will listen on
and passes them to the daemon to use), this may not matter. However,
most daemons not written for systemd are not able to accept
preexisting sockets, and even if they can, this might preclude some of
their functionality.
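To make concrete what "accepting preexisting sockets" involves, here is a rough sketch of the daemon-side change, based on the documented fd-passing convention (the $LISTEN_PID and $LISTEN_FDS environment variables, with passed descriptors starting at 3); the fallback port is just a placeholder:

    /* Rough sketch: take a listening socket passed in under systemd's
     * fd-passing convention instead of always binding one ourselves. */
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define LISTEN_FDS_START 3

    int get_listen_socket(void)
    {
        const char *pid = getenv("LISTEN_PID");
        const char *nfds = getenv("LISTEN_FDS");

        /* Only trust the environment if it is addressed to this process. */
        if (pid && nfds && (pid_t)atol(pid) == getpid() && atoi(nfds) >= 1)
            return LISTEN_FDS_START;

        /* Otherwise create and bind our own socket as usual. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        struct sockaddr_in sa = {
            .sin_family = AF_INET,
            .sin_port = htons(2222),          /* placeholder port */
            .sin_addr.s_addr = htonl(INADDR_ANY),
        };
        if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 128) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }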
In the case of forking, systemd assumes that, after the original
process exits, the forked daemon is already initialized and ready to
provide its service. Not only is this unlikely to be true; attempting
to make it true is likely to lead to buggy daemon code. If you're
going to fork in a daemon, doing so needs to be one of the first
things your program does; otherwise, if anything you do (e.g. calling
third-party library code) creates additional threads, a subsequent
fork puts the child in an async-signal context and the child basically cannot do anything but execve or _exit without invoking
undefined behavior. So it's almost certainly wrong to write a daemon
that forks at the last step after setting itself up successfully. You
could instead fork right away but use a synchronization primitive to
prevent the parent from exiting before the child signals it to do so;
however, I have not seen this done in practice. And no matter what you
do, if your daemon forks, you're subject to all the race issues of
using pid files.
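For what it's worth, here is a minimal sketch of that fork-then-synchronize pattern, using a pipe as the synchronization primitive; it is illustrative only and glosses over the rest of daemonization:

    /* Sketch: fork early, but have the parent exit only after the child
     * reports that initialization finished, so a forking-style supervisor
     * observes the exit at the right moment. */
    #include <unistd.h>

    int main(void)
    {
        int pfd[2];
        if (pipe(pfd))
            return 1;

        pid_t child = fork();
        if (child < 0)
            return 1;

        if (child > 0) {
            /* Parent: wait for one byte from the child, then exit. */
            close(pfd[1]);
            char c;
            ssize_t n = read(pfd[0], &c, 1);
            return n == 1 ? 0 : 1;   /* EOF with no byte means init failed */
        }

        /* Child: detach and perform all daemon initialization here. */
        close(pfd[0]);
        setsid();
        /* ... open sockets, load config, drop privileges, etc. ... */

        /* Only now tell the parent it may exit. */
        if (write(pfd[1], "x", 1) != 1)
            return 1;
        close(pfd[1]);

        /* Main service loop would go here. */
        pause();
        return 0;
    }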
The remaining nontrivial options are dbus and notify; both of
these depend on daemons being written as part of the
Freedesktop.org/systemd library framework. There is no documented,
stable way for a daemon to use either of these options without linking
to D-Bus's library and/or systemd's library (and thereby, for binary
packages, pulling in a dependency on these packages even if the user
is not using them). Furthermore, there are issues of accessing the
notification channel. If the daemon has to sandbox itself (e.g.
chroot, namespace/container, dropping root, etc.) before it finishes
initializing, it may not even have a means to access the notification
channel to inform systemd of its success, or any means to prove its
identity even if it could access the channel.
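For comparison, this is roughly what the notify type asks of a daemon author: link against systemd's library and call sd_notify once initialization has actually finished. A minimal sketch, where init_service is a hypothetical stand-in for the daemon's real setup (build with -lsystemd):

    /* Sketch of the daemon-side requirement for Type=notify: report
     * readiness via sd_notify once setup is really complete. */
    #include <unistd.h>
    #include <systemd/sd-daemon.h>

    static int init_service(void)
    {
        /* hypothetical: open sockets, load config, etc. */
        return 0;
    }

    int main(void)
    {
        if (init_service() < 0)
            return 1;

        /* Tell systemd the service may now be depended on. */
        sd_notify(0, "READY=1");

        /* ... main service loop ... */
        for (;;)
            pause();
    }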
So in short, the only way to make systemd's asynchronous service activation reliable is to add systemd-specific (or D-Bus-specific) code into the daemon, and even these mechanisms may not work reliably in all use cases.
There are at least two ways this could have been avoided:
Rather than requiring library code to notify systemd that the daemon is ready, use some existing trivial method. The simplest would be asking daemons to add an option to write (anything; the contents don't matter) to and close a particular file descriptor once they're ready. Then systemd could detect success as a non-empty pipe, and the default case (closing the pipe or exiting without writing anything) would be interpreted as failure.
Despite it being against the "spirit" of systemd, this is perhaps the cleanest and most reliable approach: have systemd poll whatever service the daemon is supposed to provide. For example, if the service is starting sshd on port 22, systemd could repeatedly try connecting to port 22, with exponential backoff, until it succeeds. This approach requires no modification to existing daemons and, if implemented correctly, would have minimal cost (only at daemon start time) in CPU load and startup latency.
Thankfully, this approach is already possible, albeit in a very convoluted way, without modifying systemd: you can wrap daemons with a wrapper utility that performs the polling and reports back to systemd using the sd_notify API.
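A rough sketch of such a wrapper follows, assuming a TCP service, a hardcoded example port, and minimal error handling; a real wrapper would take the address to probe and a timeout as arguments (build with -lsystemd):

    /* Sketch of a readiness wrapper: start the real daemon as a child,
     * poll its TCP port with exponential backoff, then report READY=1
     * to systemd via sd_notify. */
    #include <systemd/sd-daemon.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int port_is_up(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return 0;
        struct sockaddr_in sa = {
            .sin_family = AF_INET,
            .sin_port = htons(port),
            .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
        };
        int up = connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0;
        close(fd);
        return up;
    }

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;

        pid_t child = fork();
        if (child < 0)
            return 1;
        if (child == 0) {
            /* Child: run the real daemon, e.g. "wrapper /usr/sbin/sshd -D". */
            execvp(argv[1], argv + 1);
            _exit(127);
        }

        /* Parent: poll until the service answers, then report readiness. */
        unsigned delay_ms = 50;
        while (!port_is_up(22)) {              /* example port */
            usleep(delay_ms * 1000);
            if (delay_ms < 4000)
                delay_ms *= 2;
        }
        sd_notify(0, "READY=1");

        /* Remain as the supervised process and pass on the exit status. */
        int status = 0;
        waitpid(child, &status, 0);
        return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
    }

Keeping the wrapper alive as the unit's main process means the readiness notification comes from the process systemd is already tracking, rather than from a helper it knows nothing about.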
As it stands, my view is that systemd has failed to solve the problem
everybody thinks it's solved: making dependency-based service startup
work robustly without the traditional hacks (like sleep 1) all over the place in ugly init scripts. What it has instead done is set up a
situation where major daemons are going to come under pressure to link
to systemd's library and/or integrate themselves with D-Bus in order
to make systemd's promises into a reality. And this of course leads to
more entangled cross-dependency and more platform-specific behavior
working its way into cross-platform software.