This post is the first that’s about one of my own bugs, in musl. For a
long time, I’ve had certain stdio functions, such as feof and ferror,
forgo locking entirely and simply rely on the fact that, under the
assumed memory model, reading the associated flags is safe without
any locks. The problem is that, while this is safe, it’s not correct;
it leads to observably incorrect behavior in some cases.
Per POSIX,
All functions that reference (FILE *) objects shall behave as if they use flockfile() and funlockfile() internally to obtain ownership of these (FILE *) objects.
(This requirement is tucked away in the specification of flockfile,
so it’s not immediately apparent if you start out just reading the
General Information in XSH Chapter 2.)
What this means is that, even if locking is not required to make a
particular operation safe, there’s a requirement on the observable
behavior of the program that it be as if the locking happened.
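
As a minimal sketch of what “as if” means here, the wrapper below takes the stream lock around the flag read. The name locked_feof is hypothetical and is not musl’s actual code; it only shows the behavior that the POSIX text requires feof itself to exhibit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: waits for ownership of the stream before
 * reading the end-of-file flag, as POSIX requires of feof itself. */
int locked_feof(FILE *f)
{
    assert(f != NULL);      /* a null stream is a caller bug */
    flockfile(f);           /* blocks while another thread owns f */
    int r = feof(f);        /* the stream lock is recursive, so this is fine */
    funlockfile(f);
    return r;
}
```

On an implementation whose feof already locks, the wrapper is redundant but harmless.
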
Obviously, the existence of flockfile and funlockfile makes it so
that an observably-wrong behavior can happen if feof does not wait
for the lock; for example, if thread A never unlocks a file F except
when F is at end-of-file, then an observably-wrong behavior occurs if
thread B ever observes feof returning zero for F.
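
The scenario above can be sketched with two threads. The function and variable names are made up for illustration, and the sketch assumes a POSIX threads environment. Thread A (the caller) takes the lock before thread B exists and only releases it once the stream is at end-of-file, so on a conforming implementation B should never see zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

static int observed;                 /* what thread B saw from feof */

static void *observer(void *arg)     /* thread B */
{
    observed = feof(arg);            /* must wait for A's lock */
    return NULL;
}

/* Returns 1 if B saw end-of-file, 0 if it saw zero, -1 on setup failure. */
int feof_seen_by_other_thread(void)
{
    FILE *f = tmpfile();
    pthread_t b;
    if (!f) return -1;
    fputs("data", f);
    rewind(f);

    flockfile(f);                    /* thread A owns F from here on */
    if (pthread_create(&b, NULL, observer, f)) {
        funlockfile(f);
        fclose(f);
        return -1;
    }
    while (getc_unlocked(f) != EOF)  /* consume input until end-of-file */
        ;
    funlockfile(f);                  /* F is only unlocked at end-of-file */

    pthread_join(b, NULL);
    fclose(f);
    return observed != 0;
}
```

An feof that skips the lock can return zero here whenever thread B happens to run before thread A reaches end-of-file.
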
This is the boring case. What’s interesting is when we ask, if
flockfile and funlockfile were omitted (for example, in a
statically linked program that does not use them), would it be valid
to skip locking in functions like feof? (This could easily be
achieved with weak symbols.) It turns out the answer is still no, and
the reason is rather surprising: it’s actually possible to deduce that
a thread is in the middle of a stdio operation. This is contrary to
the general principle that it’s usually impossible to deduce, without
race conditions, whether a thread is executing a function or just
about to execute or just finished executing it.
As one example, consider a program with two threads, A and B, where
thread A has filled up the writing end of a pipe and observed (via
select or poll or EWOULDBLOCK) that further writes would block.
Now, thread B calls a stdio function to read from a stream attached to
the reading end of the pipe. If thread A then observes that the pipe
has become writable again, and if there are no other readers, then
it’s possible to deduce that the operations thread B is performing on
the stream have started (in which case the lock must have been
obtained). If thread A has not written a sufficient amount of data to
cause thread B’s operation on the stream to finish (for example, if
the function is fread and thread A wrote fewer than the requested
number of bytes), it’s also possible to deduce that the operation
could not have completed. Thus, the lock must be held by thread B, and
any operation performed on the stream by thread A must deadlock. In
particular, it would be wrong for feof to return zero, since by the
time thread B’s operation completes, the stream may in fact have
reached end-of-file status.
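
Here is a rough sketch of that deduction, under some assumptions: a POSIX system, a pipe smaller than 1 MiB (checked at run time), and hypothetical names throughout. To avoid actually deadlocking, thread A probes the lock with ftrylockfile instead of calling feof while thread B is still inside fread. Error paths simply return without cleaning up, to keep the sketch short.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static char big[1 << 20];        /* more than thread A ever writes */

static void *reader(void *arg)   /* thread B */
{
    /* Cannot return until the buffer fills or the stream hits EOF. */
    fread(big, 1, sizeof big, arg);
    return NULL;
}

/* Returns 1 if the lock was provably held and feof was nonzero afterwards. */
int pipe_deduction_demo(void)
{
    int fds[2];
    char chunk[4096] = {0};
    size_t total = 0;
    ssize_t n;
    pthread_t b;

    if (pipe(fds)) return -1;
    FILE *f = fdopen(fds[0], "r");
    if (!f) return -1;
    if (fcntl(fds[1], F_SETFL, O_NONBLOCK) == -1) return -1;

    /* Thread A: fill the pipe until a further write would block. */
    while ((n = write(fds[1], chunk, sizeof chunk)) > 0)
        total += (size_t)n;
    if (n != -1 || (errno != EAGAIN && errno != EWOULDBLOCK)) return -1;
    if (total >= sizeof big) return -1;   /* B's fread must not be satisfiable */

    if (pthread_create(&b, NULL, reader, f)) return -1;

    /* The pipe becoming writable again shows B's fread has started. */
    struct pollfd p = { .fd = fds[1], .events = POLLOUT };
    if (poll(&p, 1, -1) != 1) return -1;

    /* B cannot have finished, so it must still own the stream lock. */
    int held = ftrylockfile(f) != 0;
    if (!held) funlockfile(f);

    close(fds[1]);               /* let B's fread finish by reaching EOF */
    pthread_join(b, NULL);
    int eof = feof(f) != 0;      /* the answer once B's operation is done */
    fclose(f);
    return held && eof;
}
```

The point of the probe is that a lock-free feof called at that moment would report zero, even though the only answer consistent with waiting for thread B is the one obtained after the join.
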
Another interesting issue that arises is the need for even fclose to
perform locking. I originally reasoned that locking should not be
needed in fclose, since any use of the stream pointer after fclose
is called results in undefined behavior. However, if another thread
provably holds the lock (either by having called flockfile, or by an
argument such as the above example), the standard assigns well-defined
behavior to the fclose call; fclose must wait to obtain the lock
before closing the stream and freeing the associated resources. As
long as the thread that held the lock does not make any further
attempt to use the stream after the lock is released, the program has
well-defined behavior, and implementations must support this usage.
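
The fclose case can be sketched the same way, again with hypothetical names and assuming POSIX threads plus C11 atomics. Thread A sets a flag just before its final funlockfile, and thread B checks the flag after fclose returns. If fclose waits for the lock, B should always find the flag set.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int released;      /* set by A just before it unlocks */
static int saw_release;          /* what B observed after fclose returned */

static void *closer(void *arg)   /* thread B */
{
    fclose(arg);                 /* must wait until A gives up the lock */
    saw_release = atomic_load(&released);
    return NULL;
}

/* Returns 1 if fclose returned only after A released the lock. */
int fclose_waits_for_lock(void)
{
    FILE *f = tmpfile();
    pthread_t b;
    if (!f) return -1;
    atomic_store(&released, 0);

    flockfile(f);                /* thread A provably holds the lock */
    if (pthread_create(&b, NULL, closer, f)) {
        funlockfile(f);
        fclose(f);
        return -1;
    }
    fputs("still mine\n", f);    /* A may keep using f while it owns it */
    atomic_store(&released, 1);
    funlockfile(f);              /* A's last use of f */

    pthread_join(b, NULL);       /* also makes saw_release visible to A */
    return saw_release;
}
```

Thread A never touches the stream after funlockfile, which is the condition under which the program stays well-defined.
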
As a consequence of this analysis, I’m fixing all the affected interfaces in musl to use locking.