The C and C++ languages define the macro NULL
, widely taught as the
correct way to write a literal null pointer. The motivations for using
NULL
are well-meaning; largely they come down to fact that it
documents the intent, much like how a macro named FALSE
might better
document boolean intent than a literal 0
would do. Unfortunately,
use of the NULL
macro without fully understanding it can lead to
subtle bugs and portability issues, some of which are difficult for
compilers and static analysis tools to diagnose.
Despite it being superceded by the 2011 standard, I'm going to quote
C99 because it's what I'm most familar with, and I suspect most
readers are in the same situation. 7.17 specifies the NULL
macro as:
NULL
which expands to an implementation-defined null pointer constant;
6.3.2.3 Pointers in turn defines null pointer constant as:
An integer constant expression with the value 0, or such an expression cast to type void *
And finally, for completeness, 6.6 defines integer constant expression as:
Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
What's important to realize is that this allows a great deal of
freedom with regard to how an implementation defines NULL
. Some
examples:
#define NULL 0
#define NULL ((void *)0)
#define NULL 0L
typedef void *__voidp;
#define NULL ((__voidp)0)
#define NULL (1ULL-1ULL)
#define NULL ('9'-'0'-9)
#define NULL (sizeof(void *)/(sizeof(char)-2))
In particular, the set of possible types NULL
could have includes
all integer types and void *
. Fortunately, since these are all null
pointer constants, in most contexts using NULL
will get you The
Right Thing no matter which type NULL
might have. Pointer
comparisons, assignments to pointers, the ?:
operator, and so forth
all treat null pointer constants in special ways such that the type of
the null pointer constant mostly doesn't matter.
The main place the type of NULL
does become an issue is with
variadic (or non-prototyped) functions. Consider these examples:
printf("%p", NULL); // WRONG!
execl("/bin/true", "true", NULL); // WRONG!
The first is wrong, in general, because the %p
specifier for
printf
requires an argument of type void *
. The second example is
always wrong because the execl
function requires arguments of type
char *
, and NULL
can never have type char *
.
One might object that, despite being formally incorrect, the above
should work, especially if NULL
is defined with the typical C
definition of ((void *)0)
. Assuming also that pointers of different
types are treated identically by the calling convention, having type
void *
instead of char *
would not matter when calling execl
.
(Arguably it may not matter anyway, since C specified that char *
and void *
have the same representation, but it's not entirely clear
how this applies to arguments to variadic functions.)
Things get a lot uglier with C++. In C++, NULL
was traditionally
defined as plain 0
, on account of the fact that C++ has no implicit
conversion from void *
to other pointer types. (Note that the
current C++ standard also allows NULL
to be defined as nullptr
,
and some non-conforming implmentations define it in other ways, such
as __null
.) So in many historical C++ implementations, if you pass
NULL
to a variadic function, bad things might happen.
And bad things did happen. We were encountering several
hard-to-track-down crash bugs in certain applications (including
evince and audacity) built against musl,
which at the time defined NULL
as 0
for C++ programs, and it
turned out the source of the problem was that the C++ code was making
calls to variadic C functions using NULL
as a sentinel. The
original report and
further details are available in the mailing list archives for musl.
The crashes that were happening were on 64-bit archs (at the time, for
us that meant just x86_64
) and were, if I recall, "heisenbugs":
inserting additional code to inspect what was going on made some or
all of them go away. As it turns out, while x86_64
(and most 64-bit
archs) uses entire registers or 64-bit stack slots for all arguments,
even ones smaller than 64-bit, there's no guarantee in the ABI that
the unused upper bits will be cleared by the caller. So, when the
caller passed 0
as an int
to a variadic function, the upper 32
bits of the stack slot contained whatever junk happened to be there
before, and the callee's va_arg
, attempting to read an argument of
type void *
, pulled the junk in as the upper bits of the pointer.
Thus, manifestation of the bug was highly dependent both on the code
executed just prior to the variadic call, and the compiler's choice of
how to setup the arguments (e.g. push
versus mov
).
As far as what we would do about the issue, I've been on the fence for
a long time about the relative merits of trying to work around
incorrect, non-portable applications versus actively breaking them and
pressuring their maintainers to fix them. However, the widespread
incorrect use of NULL
seemed overwhelming, and developers seemed
very reluctant to fix the issue. Fortunately, we came up with what I
believe is a really good solution on musl's side: replacing this:
#ifdef __cplusplus
#define NULL 0
#else
#define NULL ((void *)0)
#endif
with this:
#define NULL 0L
(Note that musl assumes a type model where long
and pointers have
the same size; this is also assumed by Linux and is fundamental to the
way system calls work, so it's not an unreasonable assumption for us.)
By making this change, all programs that assume they can pass NULL
as a pointer argument to variadic functions (sentinel role or
otherwise) continue to work as if NULL
were defined as ((void
*)0)
. However, GCC does not consider 0L
a valid "sentinel" value
(since it has the wrong type) for functions declared with the
sentinel
attribute.
So what we have achieved is making incorrect programs work, but
produce warnings where the code is incorrect.
One initial concern I had with this approach is that the matter of
whether NULL
has pointer or integer type might be observable to
programs in such a way that the change from ((void *)0)
to 0L
could break some incorrect C programs. However, after some of the
smart folks over at Stack Overflow tried
to find ways a program might detect whether NULL is defined with
integer or pointer
type
and failed, I deemed the change safe, and our developers and users
have not run into any problems since.
It's been some time now (about 6 months) since the issue was raised
and addressed in musl, but I just recently reported the sentinel
issue
in Busybox and submitted a patch to fix it.
While getting the patch accepted was painless, I mention this anecdote
as a possible explanation for why the NULL
sentinel bug is so
prevalent: after the patch was accepted, I received several follow-up
emails
chipping in with misconceptions about NULL
. The impression I was
left with is that everybody seems to think NULL
is a much simpler
topic than it really is.
So what would I recommend programmers do with NULL
? After all this
trouble with variadic function arguments, I've just basically stopped
using NULL
completely. A simple 0 is shorter and usually clear when
initializing or returning a pointer, and for null pointer checks, I
usually use the !
operator rather than ==NULL
anyway. However, I
really don't have any opinion on what others should do; from a
correctness standpoint, what matters is simply not making
assumptions about the type of NULL
. Anywhere the type of the
expression matters, you really need to use a cast, which you could
apply to NULL
, or simply to 0
.
As a final remark, I don't think this article would be complete without mentioning a similar article, albeit much shorter and less detailed, written by Jens Gustedt over two years ago.