EWONTFIX

AS + DC = AC

15 Oct 2012 00:24:57 GMT

Where A.S. are asynchronous signals, D.C. is deferred cancellation, and A.C. is asynchronous cancellation. In the previous post, I discussed asynchronous versus deferred cancellation in POSIX threads, and issues that make it hard to use asynchronous cancellation well. I also mentioned that there are almost no functions which are async-cancel-safe. What if you want to cheat and get the behavior of asynchronous cancellation, but without having to follow the rules?

Enter asynchronous signals. In particular, I’m thinking of signals sent to a specific thread using pthread_kill, but the signal could just as well come from another source, such as pressing the interrupt or quit key on a terminal.

Suppose the main flow of execution in a thread uses only async-signal-safe functions. These are not to be confused with async-cancel-safe functions, of which only three exist (pthread_cancel, pthread_setcancelstate, and pthread_setcanceltype); the set of async-signal-safe functions is relatively large and includes lots of powerful tools. Cancellation mode is left as deferred (the default).

Now suppose you have a signal handler that interrupts the main flow of execution and calls pthread_testcancel. Despite pthread_testcancel being async-signal-unsafe, this is legal because no other async-signal-unsafe function was interrupted by the signal.

Under this setup, a call to pthread_cancel followed by sending the signal gives you the equivalent of asynchronous cancellation, but rather than being restricted to only calling async-cancel-safe functions, it seems it’s now legal to call arbitrary async-signal-safe functions.

On the one hand, this seems to be an argument that the concept of async-cancel-safety is misguided, and that async-signal-safety should be the condition for which functions an application can call when in asynchronous cancellation mode.

However, let’s take it a step further. pthread_testcancel is not async-signal-safe, but close is, and close is also a cancellation point. So, close(-1) (a no-op, aside from setting errno) is an async-signal-safe version of pthread_testcancel! Now, we’re no longer restricted to only calling async-signal-safe functions in the main flow of execution, since the signal handler is async-signal-safe. But this is obviously wrong. Cancelling a thread while it’s in the middle of malloc or printf is not something that’s intended to work.

On the other hand, this brings up serious concerns for applications which are not trying to get around the async-cancel-safety rules, but which just happen to call cancellation points from their signal handlers. Doing so will cause the interrupted code to get cancelled exactly as if asynchronous cancellation had been enabled. And this is dangerous and generally unwanted.

A well-behaved application would like to just call pthread_setcancelstate in its signal handlers to set PTHREAD_CANCEL_DISABLE for the duration of the signal handler. But that’s in general not possible, since pthread_setcancelstate is async-signal-unsafe. This leaves me with the conclusion that, unless/until some improvement or clarification is added to the standard, applications need to ensure that signal handlers containing cancellation points do not get run in threads that are potential targets of cancellation.

I’ve filed issues #615 and #622 on the Austin Group bug tracker. Ultimately I think this just comes down to the standard omitting any text that forbids this madness, hopefully an omission which can be quickly and easily resolved. But it’s provided some nice insight into the non-obvious complexity of cancellation and its interaction with other features of the POSIX standard with which it was probably never intended to be used.