A little-known part of GCC's build process is a script called
"fixincludes", or fixinc.sh. Purportedly, the purpose of this script
is to fix "non-ANSI system header files" which GCC "cannot compile".
This description seems to correspond roughly to the original intended
purpose of fixincludes, but the scope of what it does has since
ballooned into all sorts of unrelated changes. Let's look at the first
few rules in fixincludes' inclhack.def:
Changing AIX's _LARGE_FILES redirection of open to open64, etc. to use GCC's __asm__ keyword rather than #define, as the latter breaks C++.
Exposing the long double math functions in math.h on Mac OS X 10.3.9, which inexplicably omitted declarations for them.
Adding a workaround to the FD_ZERO macros for a direction flag bug in Linux 2.2 and earlier kernels.
Doing something inexplicable with Solaris's nonstandard header sys/varargs.h.
Removing incorrect char *-based (rather than void *-based) prototypes for memcpy, etc. on SunOS 4.x, and replacing them with the correct prototypes.
Replacing VxWorks' assert.h with the GCC developers' version, with no reason given.
Modifying some nonstandard VxWorks regs.h header to include another header whose definitions it depends on.
Replacing VxWorks' stdint.h, which is claimed to be broken, with one full of incorrect definitions, for example #define UINT8_MAX (~(uint8_t)0) (which, contrary to the requirements of C, is not valid for use in preprocessor conditionals, since a #if expression cannot contain a cast).
etc.
(Source: fixincludes/inclhack.def)
Of the first 8 hacks (using GCC's terminology here) cited above, only one deals with fixing pre-ANSI-C headers. One more is fixing serious C++ breakage that would probably make it impossible to use C++ at all with the system headers, but the rest seem to be fixing, or attempting to fix, unrelated bugs that have nothing to do with making the compiler or compilation environment usable. And at least one has introduced major header breakage that might or might not be worse than what was in the vendor's original header.
In other words, what fixincludes evolved into is the GCC developers
forcibly applying their own, often arguably incorrect or buggy, bug
fix patches to system headers (and in some cases, non-system headers
that happen to be in /usr/include) with no outside oversight or
input from the maintainers of the software they're patching.
So how does fixincludes work? Basically, it iterates over each header
file it finds under /usr/include
(or whatever the configured include
directory is), applies a set of heuristics based on the filename,
machine tuple, and a regular expression matched against the file
contents, and if these rules match, it applies a sequence of sed
commands to the file. As of this writing, there are 228 such "hacks"
that may be applied. The output is then stored in GCC's private
include-fixed directory (roughly
/usr/lib/gcc/$MACH/$VER/include-fixed), which GCC searches before
the system include directory, thus "replacing" the original header
when the new GCC is used.
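
To make the mechanism concrete, here is a minimal Python sketch of that loop. It is not the real fixincludes code: the Hack class, its field names, the fix_headers function, and the sample rule are all made up for illustration, and a plain regex substitution stands in for the sed commands.

```python
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Hack:
    """One hypothetical rule: when it fires and what it rewrites."""
    name: str
    files: str        # regex matched against the header's relative path
    machine: str      # regex matched against the machine tuple
    select: str       # regex that must match somewhere in the file contents
    pattern: str      # text to replace (standing in for a sed expression)
    replacement: str


def fix_headers(include_dir, fixed_dir, machine, hacks):
    """Write a patched copy of every header some hack changes; return their paths."""
    include_dir, fixed_dir = Path(include_dir), Path(fixed_dir)
    written = []
    for header in sorted(include_dir.rglob("*.h")):
        rel = header.relative_to(include_dir).as_posix()
        original = header.read_text(errors="replace")
        text = original
        for hack in hacks:
            if (re.search(hack.files, rel)
                    and re.search(hack.machine, machine)
                    and re.search(hack.select, text)):
                text = re.sub(hack.pattern, hack.replacement, text)
        if text != original:
            # Only headers that actually changed are copied in this sketch.
            out = fixed_dir / rel
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(text)
            written.append(rel)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inc, fixed = Path(tmp, "include"), Path(tmp, "include-fixed")
        inc.mkdir()
        (inc / "string.h").write_text("char *memcpy(char *, char *, int);\n")
        (inc / "foo.h").write_text("int foo(void);\n")
        demo = Hack("demo_memcpy", r"^string\.h$", r"sunos4", r"char \*memcpy",
                    r"char \*memcpy\(char \*, char \*, int\)",
                    "void *memcpy(void *, const void *, size_t)")
        print(fix_headers(inc, fixed, "sparc-sun-sunos4.1", [demo]))
```

Note that nothing in the loop checks whose header it is looking at: any file that happens to satisfy the three regexes gets rewritten and copied.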
In case it's not already obvious what a bad idea this whole concept is, here are a few serious flaws:
Fixincludes prevents library upgrades from working. Suppose for
example you have libfoo version 1.0 installed at the time GCC is
built and installed. The fixincludes script decides to patch foo.h,
and puts its patched version in GCC's include-fixed directory. Now
suppose you install libfoo version 2.0, which comes with a new foo.h
and which is incompatible with the definitions in the old version of
foo.h. Due to GCC's include path order, the new version of the header
will be silently ignored and GCC will keep using the old header from
the version of libfoo that was present when GCC was installed.
Moreover, since fixincludes does not take any precautions to avoid
applying its changes to files other than the original broken file
they were intended to fix, library authors who want to avoid the
danger of having their users get stuck with old headers must take on
the burden of ensuring that their header files don't match any of the
patterns in fixincludes.
Fixincludes can lead to unintended copyright infringement or leakage of private data. Unless you are fully aware of fixincludes, you would not expect building GCC to copy an unbounded number of local header files, some of which may be part of proprietary programs or site-local private headers, into the GCC directory. If you then package up the GCC directory (think of people building cross compiler binaries and such), you could inadvertently ship copies of these headers in a public release.
Many of the fixes are actually incorrect or fail to achieve what
they're trying to achieve. For example, the VxWorks stdint.h "fix"
creates a badly broken stdint.h. Another example, which came up in our
development of musl, is the fix for va_list exposure in the prototypes
in stdio.h and wchar.h. Per ANSI/ISO C, va_list is not defined in these
headers (POSIX, on the other hand, requires it to be defined), so GCC
uses bad heuristic regex matches to find such exposure and change it to
__gnuc_va_list. Somehow (we never determined the reason), the resulting
headers were interfering with the definition of mbstate_t and
preventing libstdc++ from compiling successfully. In addition, we found
that, while attempting to remedy an extremely minor "namespace
pollution" issue in these headers, fixincludes was making a new
namespace violation: for its double-inclusion guard macro, it used
FIXINC_WRAP_STDIO_H_STDIO_STDARG_H, a name that happens to be in the
namespace reserved for the application, not the implementation.
The rules for whether and how to apply the "hacks" are poor heuristics, and no effort is made to avoid false positives. The README for fixincludes even states (line 118) their policy of applying hacks even when they might be a false positive, with no consideration for how incorrectly applying them (after all, they are hackish sed replacements, not anything robust or sophisticated) might break proper, working headers.
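
The first flaw in this list, a stale copy hiding an upgraded header, comes down to nothing more than search order. The following Python sketch imitates it with two temporary directories; resolve_include is a hypothetical helper that mimics a first-match lookup, not anything GCC ships, and the directory names are stand-ins.

```python
import tempfile
from pathlib import Path


def resolve_include(name, search_dirs):
    """Return the first existing file called name, scanning search_dirs in order."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        system = Path(tmp, "usr-include")     # stands in for /usr/include
        fixed = Path(tmp, "include-fixed")    # stands in for GCC's private directory
        system.mkdir()
        fixed.mkdir()
        search_order = [fixed, system]        # include-fixed is consulted first

        # libfoo 1.0 is present when GCC is built, and its header gets "fixed".
        (system / "foo.h").write_text("#define FOO_VERSION 1\n")
        (fixed / "foo.h").write_text("#define FOO_VERSION 1 /* patched */\n")

        # libfoo 2.0 later overwrites the system header only.
        (system / "foo.h").write_text("#define FOO_VERSION 2\n")

        # The lookup still lands on the old, patched copy.
        print(resolve_include("foo.h", search_order).read_text(), end="")
```

The upgrade itself succeeds; the compiler simply never sees the new file, which is what makes the failure so hard to diagnose.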
How could this situation be fixed? The GCC developers claim fixincludes is still needed, and while I'm fairly skeptical of this claim, I don't think they'll be convinced otherwise in the near future, so I'd like to look for other, more constructive approaches. Here are the steps I think would be needed to fix fixincludes:
Remove all outdated hacks, i.e. hacks for systems which GCC no longer supports. While not strictly necessary, cleaning up the list of hacks in this manner should make the next steps more practical.
Remove all hacks for files that are none-of-GCC's-business. That means anything that doesn't absolutely need to be fixed to successfully compile GCC or provide a working (not necessarily fully conforming, if the underlying system was non-conforming, but "working") build environment after installation.
Eliminate false positives and buggy sed replacements by adding to
the hack definitions in inclhack.def a list of hashes for
known-bad files the hack is meant to be applied to. If necessary,
include a configure option, off by default, that would ignore the
hashes.
Where some of the "fixes" made by fixincludes themselves have bugs like namespace violations or macros that do not meet the requirements for being usable in the preprocessor, they should be changed to output something more correct. There is no justification for replacing one broken header with another, potentially worse, broken header.
Add a --disable-fixincludes option to configure so that
fixincludes can be completely turned off. This would be ideal for
system integrators, packagers, and basically anyone installing GCC
from source on a modern system. It's especially important for the
case where the user is installing GCC on a system that already has
many third-party library headers in /usr/include, some of which may
be "broken" in the eyes of fixincludes, where "fixing" them would
have the dangerous consequence of preventing future library
upgrades from working properly.
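
The hash check proposed in the third step could look roughly like the following Python sketch. This is only an illustration of the idea: apply_hack, sha256_hex, the choice of SHA-256, and the ignore_hashes parameter (standing in for the proposed off-by-default configure option) are my assumptions, not anything that exists in fixincludes.

```python
import hashlib
import re


def sha256_hex(data: bytes) -> str:
    """Hex digest used to identify one exact version of a header."""
    return hashlib.sha256(data).hexdigest()


def apply_hack(contents: bytes, known_bad_hashes, pattern: bytes,
               replacement: bytes, ignore_hashes: bool = False) -> bytes:
    """Rewrite contents only if it is byte-for-byte one of the known-bad files."""
    if not ignore_hashes and sha256_hex(contents) not in known_bad_hashes:
        return contents                      # unknown file: leave it alone
    return re.sub(pattern, replacement, contents)


if __name__ == "__main__":
    broken = b"char *memcpy(char *, char *, int);\n"
    # A different file that merely happens to contain the same text.
    lookalike = b"/* libfoo 2.0 */\nchar *memcpy(char *, char *, int);\n"

    known_bad = {sha256_hex(broken)}
    pattern = rb"char \*memcpy\(char \*, char \*, int\)"
    replacement = b"void *memcpy(void *, const void *, size_t)"

    print(apply_hack(broken, known_bad, pattern, replacement))
    print(apply_hack(lookalike, known_bad, pattern, replacement))
```

With this gate in place, a regex that is too greedy can no longer touch a header nobody has vetted; the cost is that every newly discovered broken header needs its hash added to the list.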
Finally, I suppose one might wonder why something that seems so
broken, as I've described fixincludes, might go undetected for so
long. The explanation is simple: distros. Most users of GCC use binary
packages prepared for a particular OS distribution, where the packager
has already cleaned up most of the mess, either by building GCC in a
sterile environment where it can't find any headers to pick up and
hack up, or by pruning the resulting include-fixed directory. Thus,
the only people who have to deal with fixincludes are people who build
GCC from the source packages, or who are setting up build scripts for
their own deployment/distribution.
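
If you are in that second group, a first step before pruning anything is to see what your build actually picked up. The Python sketch below lists every file under an include-fixed directory that hides a file of the same relative path in the system include directory, and whether the two differ. The function name shadowed_headers is hypothetical and the demo uses temporary directories; on a real system you would pass your own paths.

```python
import filecmp
import tempfile
from pathlib import Path


def shadowed_headers(fixed_dir, system_dir):
    """Return (relative path, differs) for each include-fixed file hiding a system header."""
    fixed_dir, system_dir = Path(fixed_dir), Path(system_dir)
    report = []
    for fixed in sorted(p for p in fixed_dir.rglob("*") if p.is_file()):
        rel = fixed.relative_to(fixed_dir)
        original = system_dir / rel
        if original.is_file():
            # shallow=False forces a byte-by-byte comparison.
            differs = not filecmp.cmp(fixed, original, shallow=False)
            report.append((rel.as_posix(), differs))
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        system = Path(tmp, "usr-include")
        fixed = Path(tmp, "include-fixed")
        system.mkdir()
        fixed.mkdir()
        (system / "foo.h").write_text("#define FOO_VERSION 2\n")
        (fixed / "foo.h").write_text("#define FOO_VERSION 1\n")
        (fixed / "only-in-fixed.h").write_text("/* no system counterpart */\n")
        for path, differs in shadowed_headers(fixed, system):
            print(path, "differs" if differs else "identical")
```

Entries that differ are the ones worth investigating, since those are the headers where the compiler and the installed library may disagree.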
For the curious, here are some links to the tricks distros do to overcome fixincludes:
OpenSDE gcc package, whose comments claim there is another problem in fixincludes related to cross compiling, of which I am not aware.
It's unclear to me exactly what Debian does, but as their installed
include-fixed directory is very minimal, they must also be doing
something. I have not inspected the other major binary distributions
with complex build and package systems, but casual experience suggests
they are taking some measures to contain the breakage of fixincludes.