In preparing to release musl 1.2.0, I worked with distro maintainers from Adélie Linux and Yoe to find serious application compatibility problems users would hit when upgrading, so that we could have patches ready and reduce user frustration with the upgrade. Here are some of the findings.
By far the most dangerous type of app compatibility issue we found was
in Berkeley DB 5.x, which defines its own wrong version of the
timespec
struct to pass to clock_gettime
:
typedef struct {
time_t tv_sec; /* seconds */
#ifdef HAVE_MIXED_SIZE_ADDRESSING
int32_t tv_nsec;
#else
long tv_nsec; /* nanoseconds */
#endif
} db_timespec;
The type of tv_sec
in their redefinition was right (time_t
), using
the libc type for time_t
. However, the layout of the rest of the
struct failed to match. The libc timespec
struct is (prettified and
removing conditionals):
struct timespec {
time_t tv_sec;
long tv_nsec;
long :32;
};
with the padding and tv_nsec
swapped for big-endian archs, so as to
line up with the low bits of a 64-bit field. However the padding
matters even on little-endian archs, since otherwise the structure may
be only 12 bytes long instead of 16 bytes (on archs that don't impose
8-byte alignment). This includes i386, sh, or1k, m68k, and others. So,
when Berkeley DB declared its own version without the padding, it
passed a 12-byte object to a function expecting a 16-byte one,
resulting in stack smashing. Thankfully, stack-protector caught it;
otherwise it might have yielded branch to a location specified by
current time in nanoseconds mod 1 second.
Note that even without the size/layout mismatch, defining a custom
type to pass to clock_gettime
was undefined behavior and should
never have been done.
(Reportedly there are later versions of Berkeley DB, which may or may not have already fixed the bug, but after taking over maintenance of it Oracle went and relicensed the code to be license-incompatible with basically every piece of software ever to use it, so in practice everyone uses 5.x.)
A similarly dangerous variant of this kind of issue is bypassing libc
declarations to declare time-related functions manually. This is
forbidden by C/POSIX except for functions that can be declared without
the use of any types defined by the standard library, and in most
cases should produce compile-time errors if the declarations mismatch,
since making the types visible usually also makes the libc
declarations visible. However, some Qt 2.x code copy-and-pasted into
Doxygen
managed to do this without an error by using the preprocessor to
suppress the declaration of gettimeofday
, then subsequently
declaring the function itself:
#define gettimeofday __hide_gettimeofday
#include "qdatetime.h"
#include "qdatastream.h"
#include <stdio.h>
#include <time.h>
#if defined(_OS_WIN32_)
#if defined(_CC_BOOL_DEF_)
#undef bool
#include <windows.h>
#define bool int
#else
#include <windows.h>
#endif
#elif defined(_OS_MSDOS_)
#include <dos.h>
#elif defined(_OS_OS2_)
#include <os2.h>
#elif defined(_OS_UNIX_) || defined(_OS_MAC_)
#include <sys/time.h>
#include <unistd.h>
#undef gettimeofday
extern "C" int gettimeofday( struct timeval *, struct timezone * );
#endif
Purportedly the reason for this hack was to deal with ambiguity over
whether the secont argument is struct timespec *
or void *
, since
C++ can't deal with implicit conversions from void *
. However that's
irrelevant since a null pointer implicitly converts to either and they
pass a null pointer (because a null pointer is the only valid
argument value).
Fortunately, this error manifests in the opposite direction: causing
the old time32 function to get called, resulting in a smaller write to
the correct-sized timeval
type rather than an oversized write to an
incorrect timeval
type. But it still produced a crash, since the Qt
code passed the resulting junk time_t
value to localtime
, then
dereferenced the reult without checking for error (and localtime
returned null because the bogus time_t
value overflowed tm_year
when converting to struct tm
).
Other issues we found were a lot less spectacular, but may merit a second (or third) post later.