This post is going to be the first that’s about one of my own bugs, in
musl. For a long time, I’ve had certain
stdio functions such as
ferror forgoing any locking, and
simply relying on the fact that, per the memory model that’s assumed,
reading the associated flags is safe without any locks. The issue with
doing this is that, while it’s safe, it’s not correct; it leads to
observably incorrect behavior in some cases.
The relevant requirement in POSIX reads: "All functions that reference (FILE *) objects shall behave as if they use flockfile() and funlockfile() internally to obtain ownership of these (FILE *) objects."
What this means is that, even if locking is not required to make a
particular operation safe, there’s a requirement on the observable
behavior of the program that it be as if the locking happened.
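To make the "as if" wording concrete, here is a minimal sketch of a status query that really does take ownership of the stream before looking at it. The name feof_with_lock is hypothetical and this is not musl's actual code; it only illustrates the behavior a conforming feof has to be indistinguishable from.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper (not a libc function): report end-of-file
 * status only while this thread owns the stream. */
int feof_with_lock(FILE *f)
{
    flockfile(f);           /* waits while another thread owns f */
    int at_eof = feof(f);   /* nested locking is fine: the lock is recursive */
    funlockfile(f);
    return at_eof != 0;
}
```

A caller in a second thread would block inside feof_with_lock for as long as some other thread holds the stream via flockfile, which is exactly the observable difference discussed here.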
Obviously, the existence of
flockfile and funlockfile makes it so
that an observably-wrong behavior can happen if
feof does not wait
for the lock; for example, if thread A never unlocks a file F except
when F is at end-of-file, then an observably-wrong behavior occurs if
thread B ever observes
feof returning zero for F.
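The scenario can be sketched roughly as follows, assuming POSIX threads and fmemopen for a self-contained stream. The function and struct names are hypothetical. Thread A (the caller) owns the stream before thread B exists and gives it up only at end-of-file, so on a conforming implementation the value B sees from feof should be nonzero.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

struct observer { FILE *f; int at_eof; };

/* Thread B: a single feof call on a stream that thread A may own. */
static void *thread_b(void *arg)
{
    struct observer *o = arg;
    o->at_eof = feof(o->f) != 0;    /* has to wait for A's lock */
    return NULL;
}

/* Thread A is the caller. Returns what B observed, or -1 on setup failure. */
int feof_seen_by_other_thread(void)
{
    char data[] = "some data";
    struct observer o = { fmemopen(data, sizeof data - 1, "r"), -1 };
    if (!o.f)
        return -1;

    flockfile(o.f);                 /* A takes ownership before B starts */

    pthread_t b;
    if (pthread_create(&b, NULL, thread_b, &o) != 0) {
        funlockfile(o.f);
        fclose(o.f);
        return -1;
    }

    while (getc_unlocked(o.f) != EOF)
        ;                           /* consume everything while holding the lock */
    funlockfile(o.f);               /* released only at end-of-file */

    pthread_join(b, NULL);
    fclose(o.f);
    return o.at_eof;
}
```

With a feof that skips the lock, thread B could run before thread A has read anything and report zero, which is the observably wrong result.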
This is the boring case. What’s interesting is when we ask, if
flockfile and funlockfile were omitted (for example, in a
statically linked program that does not use them), would it be valid
to skip locking in functions like
feof? (This could easily be
achieved with weak symbols.) It turns out the answer is still no, and
the reason is rather surprising: it’s actually possible to deduce that
a thread is in the middle of a stdio operation. This is contrary to
the general principle that it’s usually impossible to deduce, without
race conditions, whether a thread is executing a function or just
about to execute or just finished executing it.
As one example, consider a program with two threads, A and B, where
thread A has filled up the writing end of a pipe and observed (via
EWOULDBLOCK) that further writes would block.
Now, thread B calls a stdio function to read from a stream attached to
the reading end of the pipe. If thread A then observes that the pipe
has become writable again, and if there are no other readers, then
it’s possible to deduce that the operations thread B is performing on
the stream have started (in which case the lock must have been
obtained). If thread A has not written a sufficient amount of data to
cause thread B’s operation on the stream to finish (for example, if
the function is
fread and thread A wrote fewer than the requested
number of bytes), it’s also possible to deduce that the operation
could not have completed. Thus, the lock must be held by thread B, and
any operation performed on the stream by thread A must deadlock. In
particular, it would be wrong for
feof to return zero, since by the
time thread B’s operation completes, the stream may in fact have
reached end-of-file status.
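Here is a rough sketch of that deduction, assuming POSIX pipes, poll, and threads. All names are hypothetical, and the 5-second poll timeout is only a safety net for the demonstration. Thread A fills the pipe, starts thread B on an fread that asks for one byte more than the pipe holds, and then waits for the pipe to become writable. At the marked point A knows that B is inside fread, so a feof call there would have to wait for B. The sketch deliberately does not make that call, since A is the only writer.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct reader { FILE *f; size_t want, got; };

/* Thread B: one fread asking for more than the pipe currently holds. */
static void *thread_b(void *arg)
{
    struct reader *r = arg;
    char *buf = malloc(r->want);
    r->got = buf ? fread(buf, 1, r->want, r->f) : 0;
    free(buf);
    return NULL;
}

/* Thread A is the caller. Returns 1 if the deduction went through. */
int deduce_reader_in_progress(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL) | O_NONBLOCK);

    /* Fill the writing end until a further write would block. */
    char chunk[512] = {0};
    size_t filled = 0;
    for (;;) {
        ssize_t n = write(p[1], chunk, sizeof chunk);
        if (n > 0)
            filled += (size_t)n;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;
        else
            return -1;
    }

    struct reader r = { fdopen(p[0], "r"), filled + 1, 0 };
    if (!r.f)
        return -1;
    pthread_t b;
    if (pthread_create(&b, NULL, thread_b, &r) != 0)
        return -1;

    /* Writable again with no other readers: B's fread has started. */
    struct pollfd pfd = { .fd = p[1], .events = POLLOUT };
    int ok = poll(&pfd, 1, 5000) == 1 && (pfd.revents & POLLOUT);

    /* B still needs one byte A has not written, so its fread cannot
     * have returned. B must own the stream's lock at this point. */

    ok = ok && write(p[1], "x", 1) == 1;    /* let B finish */
    close(p[1]);                            /* EOF guarantees B terminates */
    pthread_join(b, NULL);
    ok = ok && r.got == filled + 1;
    fclose(r.f);
    return ok;
}
```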
Another interesting issue that arises is the need for even
fclose to
perform locking. I originally reasoned that locking should not be
necessary in
fclose, since any use of the stream pointer after
fclose
is called results in undefined behavior. However, if another thread
provably holds the lock (either by having called
flockfile, or by an
argument such as the above example), the standard assigns well-defined
behavior to the program:
fclose must wait to obtain the lock
before closing the stream and freeing the associated resources. As
long as the thread that held the lock does not make any further
attempt to use the stream after the lock is released, the program has
well-defined behavior, and implementations must support this usage.
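A minimal sketch of this usage, assuming POSIX threads and open_memstream so that the result can be inspected after the stream is gone. The names are hypothetical. The calling thread owns the stream while a second thread calls fclose. Once the owner calls funlockfile it never touches the stream again, so the program stays within defined behavior and all five bytes should be present.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct closer { FILE *f; int rc; };

/* Second thread: closes a stream that the first thread currently owns. */
static void *closing_thread(void *arg)
{
    struct closer *c = arg;
    c->rc = fclose(c->f);       /* must wait until the owner unlocks */
    return NULL;
}

/* Returns 1 if everything written under the lock survived the close. */
int close_while_locked(void)
{
    char *buf = NULL;
    size_t len = 0;
    struct closer c = { open_memstream(&buf, &len), -1 };
    if (!c.f)
        return -1;

    flockfile(c.f);             /* this thread provably holds the lock */

    pthread_t t;
    if (pthread_create(&t, NULL, closing_thread, &c) != 0) {
        funlockfile(c.f);
        fclose(c.f);
        free(buf);
        return -1;
    }

    for (const char *s = "hello"; *s; s++)
        putc_unlocked(*s, c.f);
    funlockfile(c.f);           /* last use of the stream by this thread */

    pthread_join(t, NULL);
    int ok = c.rc == 0 && len == 5 && memcmp(buf, "hello", 5) == 0;
    free(buf);
    return ok;
}
```

If fclose did not wait for the lock, the writes could land on a stream that is being torn down, and the buffer contents would no longer be predictable.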
As a consequence of this analysis, I’m fixing all the affected interfaces in musl to use locking.