A single-threaded race

Ok, well, kind of. Technically the example I discuss here is multithreaded, but it doesn’t look that way to the user.

The Example

Consider the following question: At the end of this program, what colour is aStick?

I shudder to think what industry this program serves
HockeyStick aStick;

void MakeStickBeRed(); // forward declaration

int main()
{
    Ostrich muhOstrich;

    // Sets a callback to be executed whenever the ostrich's mode is changed
    muhOstrich.addModeCallback(MakeStickBeRed);

    muhOstrich.setMode(eRunning);
    aStick.setColour(eBlue);
}

void MakeStickBeRed()
{
    aStick.setColour(eRed);
}

To me, the intuitive answer is "blue". I expect MakeStickBeRed() to be executed before Ostrich::setMode() returns.

Yeah, so what?

So what if I told you that Ostrich::setMode() is an asynchronous function? The mode change is not guaranteed to have been applied by the time the function call returns. Thus, the callback is also not guaranteed to have been executed before the function call returns.

This is visualized by the diagram below. It is unknowable whether the user thread or the ostrich thread will "win" and be the last to set the colour.

Diagram

Now what’s the answer? Well, there isn’t one. Provided that HockeyStick::setColour() is thread-safe, the stick will be either blue or red, but it is unknowable which will be the final state (and it will probably vary with each run of the program). The ostensibly single-threaded program is in a race with a thread silently created by Ostrich.
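To make the race concrete, here is one way Ostrich might plausibly be implemented: setMode() merely enqueues the mode change, and a worker thread applies it and fires the callback later. The queue-and-worker design, and every detail of it, is an assumption for illustration; the real Ostrich could work differently.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical sketch of an asynchronous Ostrich: commands are queued
// and executed by a worker thread owned by the object.
class Ostrich {
public:
    Ostrich() : worker_([this] { run(); }) {}
    ~Ostrich() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // drain remaining commands, then stop
    }

    void addModeCallback(std::function<void()> cb) {
        std::lock_guard<std::mutex> lock(m_);
        callback_ = std::move(cb);
    }

    // Asynchronous: the mode change is queued and the call returns
    // immediately, without waiting for the change to be applied.
    void setMode(int mode) {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_.push(mode);
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !pending_.empty(); });
            if (pending_.empty()) return;  // done_ and nothing left to do
            pending_.pop();                // "apply" the mode change...
            auto cb = callback_;
            lock.unlock();
            if (cb) cb();                  // ...then fire the callback
            lock.lock();                   // on the ostrich thread
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> pending_;
    std::function<void()> callback_;
    bool done_ = false;
    std::thread worker_;
};
```

Because setMode() returns as soon as the command is queued, main() can reach aStick.setColour(eBlue) before, during, or after MakeStickBeRed() runs on the worker thread — hence the race.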

If the asynchronous nature of Ostrich::setMode() is not documented (or in the unlikely event that the programmer does not read the documentation), this would be very surprising behaviour. It may impact program correctness.

Is that… ok?

The program is no longer executed in a sequentially consistent manner, meaning that events appear to happen in an order other than the one written in the source code. There are some circumstances under which this would not be acceptable. It would definitely not be acceptable in a general purpose system, because as a rule of thumb we do not know how important the ordering of any two function calls is. To know whether it matters, you need to know what the functions do and whether their effects depend on that ordering.

Programming languages that support multi-threading are of course themselves general purpose systems. When developing the C++11 memory model, some CPU vendors tried to push for a model that was weaker than full sequential consistency.[1] It was concluded that anything less made it effectively impossible to reason about program behaviour.

But this program is not a whole programming language, nor is it part of a general purpose system. This is a system that deals particularly with ostriches and hockey sticks (for some reason). Maybe, for this application, this specific type of race is ok. Maybe in the end, it’s ok for the hockey stick to be either blue or red. If that’s the case, then we can reap the benefits of the asynchronous function call and not worry about these edge cases.

If you create this kind of situation in your library, you need to provide your own synchronization primitive for when users can’t tolerate races of this sort. For example, Ostrich::HoldOnASecTherePardner() might wait until all pending functions on the ostrich have been executed before returning. The user can guarantee that the hockey stick will be blue by calling that synchronization function immediately after setting the ostrich mode.
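One common way to build such a primitive is to enqueue a marker task and block until the worker reaches it. The sketch below is a generic single-worker task queue whose Drain() stands in for Ostrich::HoldOnASecTherePardner(); every name other than that one is hypothetical.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical task queue with a drain primitive: Drain() blocks until
// every task enqueued before the call has finished executing.
class TaskQueue {
public:
    TaskQueue() : worker_([this] { run(); }) {}
    ~TaskQueue() { Post({}); worker_.join(); }  // empty task = shutdown

    void Post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    // The HoldOnASecTherePardner() idea: enqueue a marker task and wait
    // until the worker has executed it (and therefore everything before it).
    void Drain() {
        std::mutex done_m;
        std::condition_variable done_cv;
        bool done = false;
        Post([&] {
            std::lock_guard<std::mutex> lock(done_m);
            done = true;
            done_cv.notify_one();
        });
        std::unique_lock<std::mutex> lock(done_m);
        done_cv.wait(lock, [&] { return done; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !tasks_.empty(); });
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            if (!task) return;  // shutdown sentinel
            task();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::thread worker_;
};
```

After Drain() returns, every previously posted call has run, so the user's own write is guaranteed to be the last one.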

Can we fix it?

So long as you have asynchronous functions, I believe it is impossible to solve this in a generic way. Certainly there is no general way for HockeyStick to know that this dependency exists, so it’s not possible to fix it in the implementation of HockeyStick::setColour(). Would you make Ostrich::setMode() into a synchronous (aka "normal") function just because somebody might have hooked a function to it?

Callbacks of this type are inherently cursed, because they allow the library user to do things which you yourself cannot as the library implementer (because they would cause you to break sequential consistency). I have devoted a surprising number of my working hours to contemplating this question:

I’m not allowed to do X internally because it violates my strong threading architecture. What should I do when the user does X in a callback?

Sometimes the answer is:

Document that doing X will break things. The recommended workaround is to not do X.
— The weary threading architect

P.S. Why asynchronous functions?

I expect this will be review for most who have gotten this far, but let’s look at why we allow the cursedness of async functions.

Fire and forget

For a lot of output systems (file writes, network writes, display updates) it is not important that things happen immediately when you ask for them. Usually you only care that things happen in the order you specified. Beyond that, you want your code to run as fast as possible rather than sitting around waiting for hard drives, or GPUs, or whatever.

If you write the text potatohammer to a file, then immediately read it back, the system had damn well better make sure that the data you read back contains the word potatohammer. This is most easily achieved by making the read function block until all pending writes have been processed.
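That read-after-write rule can be sketched without any threads at all: writes go into a pending buffer, and a read first flushes the buffer so the caller always sees its own writes. AsyncFile and its members are hypothetical; a real system would defer to the OS page cache or a device queue, but the contract is the same.

```cpp
#include <string>
#include <vector>

// Hypothetical deferred-write file: writes are buffered ("fire and
// forget"), and a read forces the buffer to be applied first.
class AsyncFile {
public:
    // Fire and forget: the caller does not wait for the data to land.
    void write(const std::string& s) { pending_.push_back(s); }

    // Reading flushes all pending writes first, so the caller is
    // guaranteed to see everything it previously wrote.
    std::string read() {
        flush();
        return contents_;
    }

private:
    void flush() {
        for (const auto& s : pending_) contents_ += s;
        pending_.clear();
    }

    std::string contents_;
    std::vector<std::string> pending_;
};
```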
Command analysis

The backend system that finally executes functions can potentially do smart stuff like batching similar commands, removing commands that are superseded by later ones, and re-ordering commands if that is more efficient (and remains sequentially consistent).

  • Your CPU does this

  • Your compiler does this

  • Your OS does this

  • Your GPU driver does this

  • Your SSD does this

  • Your everything does this

This is why computers are so damn fast.
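As a toy example of one such optimization, here is a pass that drops commands superseded by a later write to the same target — in the spirit of a stick whose colour is set twice. The Command type and function name are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A command that writes a value to some named target.
struct Command {
    std::string target;  // e.g. "stick.colour"
    int value;
};

// Keep only the final write to each target, preserving the relative
// order of the surviving commands. Earlier writes to the same target
// are superseded and can be dropped before execution.
std::vector<Command> removeSuperseded(const std::vector<Command>& cmds) {
    std::map<std::string, std::size_t> lastWrite;
    for (std::size_t i = 0; i < cmds.size(); ++i)
        lastWrite[cmds[i].target] = i;
    std::vector<Command> out;
    for (std::size_t i = 0; i < cmds.size(); ++i)
        if (lastWrite[cmds[i].target] == i)
            out.push_back(cmds[i]);
    return out;
}
```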

Preventing deadlocks

An asynchronous function can be called safely from anywhere.

This one is a little more complicated, as it does not apply in all situations. It is only relevant when multiple mutexes may need to be held simultaneously to complete an operation.

If you call a synchronous function that takes a mutex, and another thread is holding that mutex, your thread will block. If the other thread is waiting on a mutex that your thread already holds, then you have a deadlock. Avoiding this situation is one of the core jobs of a threading model.

If the function you call is instead asynchronous, then it will take its mutex later, presumably at a time when it is safe to do so.
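The deferral pattern can be sketched in a few lines: while holding our own mutex we never synchronously call into the ostrich (which would take its mutex and risk a lock-ordering cycle); we queue the call and run it only after our mutex has been released. All names here are hypothetical.

```cpp
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical sketch of deadlock avoidance by deferral. Calling
// touchOstrich() while ourMutex is held would establish the lock order
// ourMutex -> ostrichMutex; if any other thread ever takes the two in
// the opposite order, the program can deadlock. Deferring the call
// until ourMutex is released means we never hold both at once.
std::mutex ourMutex;
std::mutex ostrichMutex;
int ostrichMode = 0;

void touchOstrich(int mode) {
    std::lock_guard<std::mutex> lock(ostrichMutex);  // the "other" mutex
    ostrichMode = mode;
}

void doOperation(int mode) {
    std::vector<std::function<void()>> deferred;
    {
        std::lock_guard<std::mutex> lock(ourMutex);
        // ... update our own state ...
        // Unsafe here: touchOstrich(mode) would lock ostrichMutex while
        // we hold ourMutex. Queue it instead (the "asynchronous" call).
        deferred.push_back([mode] { touchOstrich(mode); });
    }
    // ourMutex is released; taking ostrichMutex now cannot form a cycle.
    for (auto& f : deferred) f();
}
```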

In most cases this should not change the semantics of the program; this post is about a rare case where it does, but the result may be acceptable.

The core benefit is that, if your threading architecture is well-defined, there will be circumstances under which you know that it is not safe to call synchronous functions on a particular object. In these cases, every asynchronous function is a win.


1. Strictly speaking, the C++ memory model guarantees data-race-free sequential consistency. Execution of your program is guaranteed to be sequentially consistent so long as it contains no data races. Mutexes and atomics are explicit synchronization primitives you can use to meet this requirement.