EWONTFIX

Asynchronous cancellation pitfalls

07 Oct 2012 04:24 GMT

In the past few posts, I’ve introduced thread cancellation and some of the implementation and application usage difficulties in making cancellation robust. One topic I haven’t yet touched on is asynchronous cancellation. POSIX threads support two cancellation types: asynchronous and deferred. The latter, deferred cancellation, is the default, whereby a cancellation request is only acted upon immediately if the thread to be cancelled is suspended at a cancellation point, and otherwise remains pending until the next call to a cancellation point.
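For contrast with asynchronous mode, here is a minimal sketch of deferred cancellation around a long computation, assuming a POSIX threads implementation. The names deferred_worker and run_deferred_demo are hypothetical. The worker does its arithmetic in chunks and calls pthread_testcancel between them. A request sent with pthread_cancel therefore stays pending until the worker reaches that explicit cancellation point.

```c
#include <pthread.h>
#include <stddef.h>

/* Runs in the default (deferred) cancellation mode. */
static void *deferred_worker(void *arg)
{
    volatile unsigned long acc = 0;
    (void)arg;
    for (;;) {
        /* One chunk of pure computation. It contains no cancellation
           points, so a request arriving during it stays pending. */
        for (unsigned long i = 0; i < 1000000UL; i++)
            acc += i;
        /* Explicit cancellation point: a pending request is acted upon here. */
        pthread_testcancel();
    }
    return NULL; /* not reached */
}

/* Returns 1 if the worker ended by cancellation, 0 if not, -1 on error. */
int run_deferred_demo(void)
{
    pthread_t t;
    void *res;
    if (pthread_create(&t, NULL, deferred_worker, NULL) != 0)
        return -1;
    /* The request is recorded now; the worker acts on it at its next
       cancellation point, which is the pthread_testcancel call. */
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)
        return -1;
    return res == PTHREAD_CANCELED;
}
```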

The other option, asynchronous cancellation, allows (but does not require) the implementation to act on cancellation requests at any time. This obviously has the potential to leave data in a horribly inconsistent state, so rules are imposed: while asynchronous cancellation is in effect, the application cannot call any function from the standard library except those designated “async-cancel-safe”. Essentially, one can think of asynchronous cancellation as a feature to be used only when a thread is performing a long-running pure computation.
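A minimal sketch of that intended usage follows. It assumes the implementation acts on asynchronous requests promptly, which POSIX permits but does not strictly require. The names async_worker, in_async_mode, and run_async_demo are hypothetical. While asynchronous cancellation is on, the worker calls no library functions at all. It uses only pthread_setcanceltype, which is async-cancel-safe, to enter and leave the mode.

```c
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical handshake flag: set once asynchronous mode is in effect. */
static atomic_int in_async_mode;

static void *async_worker(void *arg)
{
    volatile unsigned long acc = 0;
    int oldtype;
    (void)arg;

    /* pthread_setcanceltype is async-cancel-safe, so it may be called on
       both sides of the asynchronous region. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
    in_async_mode = 1; /* plain store to an _Atomic object, not a library call */

    /* Long-running pure computation: no library calls in here. */
    for (unsigned long i = 0; i < ULONG_MAX; i++)
        acc += i;

    /* Restore the previous type before calling anything else. */
    pthread_setcanceltype(oldtype, NULL);
    return NULL;
}

/* Returns 1 if the worker ended by cancellation, 0 if not, -1 on error. */
int run_async_demo(void)
{
    pthread_t t;
    void *res;
    atomic_store(&in_async_mode, 0);
    if (pthread_create(&t, NULL, async_worker, NULL) != 0)
        return -1;
    while (!atomic_load(&in_async_mode))
        ; /* wait until the worker is computing in asynchronous mode */
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)
        return -1;
    return res == PTHREAD_CANCELED;
}
```

The handshake makes the caller send the request only after the worker has switched modes, so this sketch does not depend on how a request that is already pending at the switch gets handled.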

One obvious problem with asynchronous cancellation that limits its utility is that it does not obligate an implementation to act on cancellation requests immediately; the language of the standard only requires the request to be acted upon at some point between when it’s made and the next call to a cancellation point. Unfortunately, no cancellation point is also async-cancel-safe (the only functions POSIX requires to be async-cancel-safe are pthread_cancel, pthread_setcancelstate, and pthread_setcanceltype, none of which is a cancellation point), so a pathological but conforming implementation could actually treat asynchronous cancellation mode the same way it treats disabled cancellation.

OK, let’s forget about pathological and gratuitously broken implementations for a moment. Even if asynchronous cancellation mode does cause newly arriving requests to be acted upon immediately, there’s still the question of what happens when a cancellation request is already pending upon switching to asynchronous cancellation. POSIX does not say anything about this case, and if cancellation is implemented as (or analogously to) a signal, it would be an easy oversight for an implementation to forget to check for already-pending cancellation requests when switching to asynchronous cancellation.
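One way to check how a given implementation handles this case is a small probe, sketched below under a few assumptions. The hypothetical stage variable is a handshake that guarantees the request is already pending, with the worker still in deferred mode, before the worker switches to asynchronous mode. The computation after the switch is bounded, so an implementation that never acts on the request lets the worker finish instead of hanging. probe_pending_cancel_on_async_switch is likewise a hypothetical name.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical handshake: 0 = not started, 1 = worker running,
   2 = cancellation request already sent. */
static atomic_int stage;

static void *probe_worker(void *arg)
{
    volatile unsigned long acc = 0;
    int oldtype;
    (void)arg;

    atomic_store(&stage, 1);
    /* Wait in the default deferred mode until the request has been sent.
       Spinning on an atomic variable is not a cancellation point, so the
       request just stays pending. */
    while (atomic_load(&stage) != 2)
        ;

    /* A request is pending now; a good implementation acts on it here. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);

    /* Bounded pure computation, so that an implementation which never acts
       on the pending request lets the worker finish instead of hanging. */
    for (unsigned long i = 0; i < 1000000000UL; i++)
        acc += i;

    pthread_setcanceltype(oldtype, NULL);
    return NULL;
}

/* Returns 1 if the pending request took effect once asynchronous mode was
   on, 0 if the worker ran to completion, -1 on error. */
int probe_pending_cancel_on_async_switch(void)
{
    pthread_t t;
    void *res;
    atomic_store(&stage, 0);
    if (pthread_create(&t, NULL, probe_worker, NULL) != 0)
        return -1;
    while (atomic_load(&stage) != 1)
        ; /* wait for the worker to start */
    if (pthread_cancel(t) != 0)
        return -1;
    atomic_store(&stage, 2);
    if (pthread_join(t, &res) != 0)
        return -1;
    return res == PTHREAD_CANCELED;
}
```

A result of 0 would mean the request was never acted upon once asynchronous mode was in effect, which is the oversight described above.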

What if an application wants to account for this possible quality-of-implementation issue? In principle, an application should just be able to call pthread_testcancel right after switching on asynchronous cancellation. But it can’t do this, because pthread_testcancel is not async-cancel-safe. And if it calls pthread_testcancel before switching modes, there’s a race condition whereby a request that arrives between these two calls won’t be picked up.
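The test-before-switching workaround, with its race window marked, might look like the following sketch. enter_async_mode_racy is a hypothetical helper name, not part of any API, and it assumes the calling thread is in deferred mode when it is called.

```c
#include <pthread.h>

/* Hypothetical helper illustrating the "test first, then switch" workaround.
   Must be called while the thread is in deferred mode. */
static void enter_async_mode_racy(int *oldtype)
{
    /* Acts on any request that is already pending; allowed in deferred mode. */
    pthread_testcancel();

    /* RACE WINDOW: a request arriving here is left pending, and on a
       low-quality implementation the switch below may never act on it. */

    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, oldtype);

    /* Calling pthread_testcancel() here would close the window, but it is
       not async-cancel-safe, so a conforming application may not call it. */
}
```

Once the computation is done, the caller would restore the saved type by passing it back to pthread_setcanceltype.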

What we’re stuck with is a situation where the standard allows low-quality implementations and also forbids the obvious workaround applications could use to handle such low-quality implementations.

Do any such low-quality implementations exist? If not, POSIX should probably standardize that switching to asynchronous cancellation acts on any pending cancellation request (for example, by specifying pthread_setcanceltype to be a cancellation point when its argument is PTHREAD_CANCEL_ASYNCHRONOUS).