I’ve written in the past on the topic of overcommit, which, depending on your perspective, is either a feature of Linux and some other kernels, or a bug left over from a time when folks didn’t know how to do virtual memory accounting properly. I’m a serious proponent of strict commit accounting (the opposite of overcommit), but for this article, I want to look at the state of the software ecosystem and how it often leaves overcommit-enabled Linux systems more failproof than their strict-accounting brothers and sisters.
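For reference, Linux chooses between these policies through /proc/sys/vm/overcommit_memory: 0 is the default heuristic, 1 always overcommits, and 2 enables strict commit accounting, with the commit limit derived from swap plus a vm.overcommit_ratio fraction of RAM. Here is a minimal sketch of a program that checks which mode a system is running in:

    #include <stdio.h>

    /* Print the kernel's overcommit policy:
     * 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting. */
    int main(void)
    {
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
        int mode;
        if (!f) {
            perror("fopen");
            return 1;
        }
        if (fscanf(f, "%d", &mode) != 1) {
            fclose(f);
            return 1;
        }
        fclose(f);
        printf("vm.overcommit_memory = %d\n", mode);
        return 0;
    }

Switching to strict accounting is then just a matter of writing 2 to the same file as root (e.g. sysctl vm.overcommit_memory=2).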
The idea of strict commit accounting is that malloc never reports success only to let your program crash when you actually try to use the memory. If the kernel cannot ensure that there’s no possible sequence of paging events that would cause it to run out of physical storage for all to-be-mapped pages, then allocating new pages fails, and malloc returns a null pointer. This gives your program the power, but also the responsibility, to check for out-of-memory conditions and handle them, which is, in principle, a very good thing. Where problems begin to arise is when programs don’t check the return value of malloc, or check it only for the sake of calling abort when allocation fails.
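As a concrete illustration of the well-behaved case, here is a sketch of a hypothetical daemon routine, queue_message, that treats allocation failure as a recoverable error for that one operation rather than a reason to die:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical example: a daemon queues a log message. On allocation
     * failure it degrades gracefully (drops the message and reports the
     * condition) instead of aborting or dereferencing a null pointer. */
    int queue_message(const char *msg)
    {
        char *copy = malloc(strlen(msg) + 1);
        if (!copy) {
            /* Out of memory: fail this one operation, keep the daemon alive. */
            fprintf(stderr, "dropping message: out of memory\n");
            return -1;
        }
        strcpy(copy, msg);
        /* ... append copy to the queue; it would be freed once written out ... */
        free(copy);  /* placeholder so the sketch doesn't leak */
        return 0;
    }

    int main(void)
    {
        return queue_message("hello") ? 1 : 0;
    }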
Let’s consider a possible (rather likely, these days) scenario: you have several core system components (things like init/upstart/systemd, inetd, sshd, etc.) that would leave the system in a crippled, unusable, or even kernel-panic state if they die, and these programs are making use of dynamic allocation. What happens when your machine runs out of memory?
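Before answering, note that you can observe the difference directly with a throwaway program that allocates and touches memory until something gives: under strict accounting malloc eventually returns a null pointer, while under overcommit the allocations keep “succeeding” and it’s the touching of pages that eventually triggers the OOM killer (possibly against some other process). A sketch, best run inside a disposable VM or container:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Allocate 16 MiB at a time and touch every byte, so the kernel must
     * actually back each allocation with physical storage or swap. */
    int main(void)
    {
        size_t chunk = 16 * 1024 * 1024, total = 0;
        for (;;) {
            char *p = malloc(chunk);
            if (!p) {
                /* Strict accounting: malloc itself reports the failure. */
                printf("malloc failed after %zu MiB\n", total / (1024 * 1024));
                return 0;
            }
            memset(p, 1, chunk);  /* under overcommit, the OOM killer strikes here */
            total += chunk;
        }
    }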
If strict commit accounting is in effect, one of their calls to malloc fails, resulting in one of the following:

- An immediate abort on the first allocation failure. A core system component is not likely to do this itself, but it may inadvertently do so by relying on a library (such as glib) that unconditionally aborts the calling program on allocation failure.
- A crash from never considering that malloc could fail, and dereferencing relative to a null pointer. This is the worst possible behavior, but there’s plenty of software that is this buggy.

If, on the other hand, overcommit is enabled, along with Linux’s heuristic OOM killer, there’s only one likely result: this critical system component was either manually marked as not a candidate for OOM killing, or was naturally not a candidate since it never engaged in allocation behaviors that the OOM killer judged as abusive. Some bloated desktop app like Firefox or OpenOffice (if this is a desktop system), or some runaway PHP program (if it’s a webserver), gets OOM-killed instead, and the system is back to “normal” (minus the user perhaps being angry about losing his or her session).
Does this mean I’ve changed my mind about overcommit and it’s actually a good thing? No, not really. What it means, at least in my mind, is that there’s a great deal of work that needs to be done auditing core system components for robustness and fail-safe behavior. In particular:
This is hard work, but I still believe it’s better than the current situation where stability of the essential core components of the system depends on sacrificing (OOM-killing) user applications which might have valuable unsaved data. It just means we still have a long way to go towards a rock-solid, crash-free FOSS platform...