Traditional unix systems had a
vfork function, which works like
fork, but without creating a new virtual address space; the parent
and child run in the same address space. Unlike with pthread_create,
where the new thread runs on its own stack,
vfork behaves like
fork and “returns twice”, once in the child and once in the parent.
This seems impossible, since the parent and child would clobber one
another’s stacks, but a clever trick saves the day: the parent process
is suspended until the child performs
exec or _exit, breaking the
shared-memory-space relation between the two processes.
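To make the traditional pattern concrete, here is a minimal sketch of a vfork call followed immediately by exec or _exit in the child. It assumes a Linux system with glibc or musl; the helper name run_with_vfork is made up for illustration and is not part of any library.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc or musl; makes vfork visible */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (searched on PATH) in a vfork child.
 * Returns the child's exit status, or -1 on failure. */
int run_with_vfork(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;          /* vfork failed; no child was created */

    if (pid == 0) {
        /* Child: still running in the parent's address space and on the
         * parent's stack. Do nothing except exec or _exit. */
        execvp(argv[0], argv);
        _exit(127);         /* exec failed; never call exit() or return here */
    }

    /* Parent: only resumes once the child has called exec or _exit. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_with_vfork with an argument vector such as {"sh", "-c", "exit 3", NULL} should hand back the shell's exit status; the 127 for a failed exec is just the usual shell convention, not something vfork itself defines.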
vfork was omitted from POSIX and modern standards because it’s
difficult to use; the original specification for the function left it
undefined to do basically anything except call exec or _exit after
vfork in the child. However, many systems (including Linux) still
provide a similar or identical interface at the kernel level, and new
interest in its use has arisen for two reasons: huge processes can't
fork on systems with strict commit charge
accounting due to lack of memory, and copying the
virtual memory layout of a process can be expensive if the process has
a huge number of maps created by
mmap. As such, both musl and glibc use
vfork to implement
posix_spawn, the modern interface for
executing external programs as a new process.
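For readers who haven't used it, here is a rough sketch of what calling posix_spawn looks like from the application side. It uses posix_spawnp (the PATH-searching variant) with no file actions and no attributes; the helper name run_with_posix_spawn is hypothetical.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn argv[0] (searched on PATH) and wait for it.
 * Returns the child's exit status, or -1 on failure. */
int run_with_posix_spawn(char *const argv[])
{
    pid_t pid;

    /* NULL file actions and NULL attributes: the child inherits the
     * caller's file descriptors, signal mask, and so on. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0)
        return -1;   /* posix_spawnp returns an error number directly */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the caller never sees a fork or vfork; whether the library uses one, the other, or something else entirely is an implementation detail.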
While working on improving
vfork usage in musl’s posix_spawn implementation,
I realized that using it is a lot trickier (and more dangerous!) than
I’d thought before. Here are some of the issues.
Since the vfork child runs in the same address space as the parent,
care needs to be taken to ensure that it does not modify the parent’s
memory in unwanted/unsafe ways. This seems easy enough, until you
realize that the calling program might have installed signal handlers,
and these signal handlers could get invoked in the child. The most
likely way this could happen is when signals are sent to entire
process groups, for example, as a result of events like pressing the
interrupt/quit key on the controlling terminal, or resizing the
terminal. However, signals can also be sent explicitly to a process
group.
If a signal handler runs in the child after
vfork, there are several
different ways it could corrupt the parent’s state:
As such, if
vfork is to be used in code where the caller might have
signal handlers which could be broken by the above issues, it’s
necessary to ensure that the parent’s signal handlers don’t get
invoked in the child. This amounts to:
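1. Before calling vfork, block all signals.
2. After vfork returns in the child, reset every signal that has a handler installed to SIG_DFL, and then restore the old signal mask.
3. After vfork returns in the parent, restore the signal mask.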
Unfortunately, step 2 was illegal (undefined behavior) in the original
specification of vfork, which basically meant
vfork was impossible
to use. Fortunately, on systems where
vfork is supported, more
specific semantics are provided/guaranteed, and step 2 works.
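The block, reset, and restore sequence described above can be sketched roughly as follows. This is an illustration under assumptions, not musl's actual code: it assumes Linux-like vfork semantics (the child gets its own copy of the signal dispositions and its own signal mask), the function name spawn_with_signals_masked is hypothetical, and a multi-threaded program would use pthread_sigmask instead of sigprocmask.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc or musl; exposes vfork, NSIG */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] without letting the parent's signal
 * handlers run in the vfork child. Returns the exit status or -1. */
int spawn_with_signals_masked(char *const argv[])
{
    sigset_t all, old;

    /* Before vfork: block all signals, remembering the old mask. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);

    pid_t pid = vfork();
    if (pid == 0) {
        /* In the child: reset every caught signal to SIG_DFL (ignored
         * signals stay ignored), then restore the old mask. */
        struct sigaction sa;
        for (int sig = 1; sig < NSIG; sig++) {
            if (sigaction(sig, NULL, &sa) != 0)
                continue;   /* not a signal we can query */
            if (sa.sa_handler == SIG_DFL || sa.sa_handler == SIG_IGN)
                continue;
            sa.sa_handler = SIG_DFL;
            sa.sa_flags = 0;
            sigaction(sig, &sa, NULL);
        }
        sigprocmask(SIG_SETMASK, &old, NULL);
        execvp(argv[0], argv);
        _exit(127);
    }

    /* In the parent (also reached if vfork failed): restore the mask. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The ordering is the whole point: between vfork and the reset loop no handler can run in the child because everything is blocked, and by the time the mask is restored there are no handlers left to run.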
vfork was pretty much history by the time people started
caring about threads. But in real-world implementations (Linux), we
can observe that
vfork doesn’t suspend the whole parent process
(which would be really difficult to do, anyway), but instead just
suspends the calling thread until the child calls exec or _exit.
This means that concurrency issues exist, and the
vfork child is
actually sharing memory with other running code, not just a
suspended parent. This leads us to...
Now we get to the worst of it. Threads and
vfork allow you to get in
a situation where two processes are both sharing memory space and
running at the same time. Now, what happens if another thread in the
parent calls setuid (or any other privilege-affecting function)? You
end up with two processes with different privilege levels running in a
shared address space. And this is A Bad Thing.
Consider for example a multi-threaded server daemon, running initially
as root, that’s using
posix_spawn, implemented naively with vfork,
to run an external command. It doesn’t care if this command runs as
root or with low privileges, since it’s a fixed command line with
fixed environment and can’t do anything harmful. (As a stupid example,
let’s say it’s running
date as an external command because the
programmer couldn’t figure out how to use strftime.)
Since it doesn’t care, it calls
setuid in another thread without any
synchronization against running the external program, with the intent
to drop down to a normal user and execute user-provided code (perhaps
a script or
dlopen-obtained module) as that user. Unfortunately, it
just gave that user permission to
mmap new code over top of the running
posix_spawn code, or to change the strings posix_spawn is passing to
exec in the child. Whoops.
The easy out would be just giving up on
vfork. But in musl, a major
target is systems where robustness is required (no overcommit) and
memory might be constrained; therefore,
fork is not a good option.
Moreover, using vfork to implement
posix_spawn might eventually allow
us to support no-MMU targets.
In musl, there’s already a global lock that controls calls to the
setuid family of functions; it was needed because Linux requires a
userspace process to synchronize all its threads to make the
setuid, etc. syscalls rather than doing the synchronization in kernelspace.
Thus, it was easy to just reuse this lock to prevent uid/gid changes
while posix_spawn is running. I believe glibc could do the same,
since it has an equivalent locking mechanism in NPTL. musl may have a
slight advantage here at present, in that the lock we’re using is a
reader-writer lock, and callers of
posix_spawn only count as
readers, not writers.
I’ve filed and reopened several glibc bug reports related to the above issues:
These are security-relevant, but multi-threaded programs
that change their uid/gid after going threaded are rare, and so are
real-world programs using
posix_spawn, so the impact is extremely
low at present.
Linux provides a
CLONE_VFORK flag to
clone which provides similar
semantics to the traditional behavior of
vfork, but allowing the new
process to run on a separate stack with its own entry point, instead
of utilizing the returns-twice idiom of
fork. However, this does not
solve any of the above problems; the signal handler and setuid issues
still exist, and code in the child still has to tiptoe around anything
that might upset the parent’s state. As such, I don’t see it as being
any more useful than “traditional”
vfork for implementing things