Synchrony is a myth. Nothing happens instantaneously. Everything takes time.
Some characteristics of computing systems and programming environments are fundamentally grounded in the fact that computing takes place in a physical world with three dimensions, limits imposed by the speed of light, and the laws of thermodynamics.
This grounding in the physical world means that certain aspects stay relevant even as new technologies with new performance points and capabilities become available. These stay true because they are not just “design choices”; they are driven by fundamental realities of the physical universe.
The distinction between synchrony and asynchrony in language and system design is one example of a design area that has deep physical underpinnings. Most programmers are initially exposed to basic programs and languages where synchronous behavior is assumed. In fact, it is so natural that it is not even explicitly named or described. The meaning of the term synchronous in this context is that the computation happens immediately as a sequence of serial steps and nothing else happens until the computation is complete. I execute “c = a + b” or “x = f(y)” and nothing else happens until the execution of the statement completes.
Of course, in the physical universe, nothing happens instantly. Everything involves some latency: navigating a memory hierarchy, executing a processor cycle, reading from a disk drive, or communicating over a network. This is a fundamental consequence of the speed of light and three dimensions.
Everything involves some delay, consumes some time. If we define something as synchronous, we are essentially saying we are going to ignore that delay and describe our computation as if it happens instantaneously. In fact, computer systems often allocate significant infrastructure to continue to leverage the underlying hardware even as they try to expose a more convenient programming interface by treating behavior as synchronous.
This idea that synchrony involves mechanism and overhead might seem counter-intuitive to a programmer who is more used to viewing asynchrony as requiring lots of explicit mechanism. In effect, what is really happening when exposing an asynchronous interface is that more of the true underlying asynchrony is exposed to the programmer to deal with manually rather than having the system handle it automatically. Directly exposing the asynchrony places more overhead on the programmer but also allows for customizing the costs and tradeoffs to the problem domain rather than having the system balance those costs and make those tradeoffs. An asynchronous interface often maps more directly to what is physically happening in the underlying system and provides additional opportunities for optimization.
For example, a processor and memory system have significant infrastructure in place to handle reading or writing data through the memory hierarchy. A level one (L1) cache memory reference might take a few nanoseconds, while a reference that needs to navigate all the way through L2, L3 and main memory might take hundreds of nanoseconds. Simply waiting for the memory reference to resolve would leave the processor idle for a large percentage of the time.
The mechanisms required to optimize this are significant: pipelining by looking ahead in the instruction stream and keeping multiple memory fetches and stores in progress at one time, branch prediction to keep that pipeline full even as the program jumps to new locations, and careful management of memory barriers to ensure all this complex machinery still exposes a consistent memory model to the higher-level programming environment. All of this works to maximize use of the hardware while hiding tens to hundreds of nanoseconds of latency through the memory hierarchy, making it possible to expose what looks like synchronous execution and still get good performance out of the core processing engine.
Whether these optimizations are effective for a specific piece of code is very opaque and tends to require highly specialized performance tools to analyze. That type of analysis effort is only spent on limited types of high-value code (e.g. the Excel recalculation engine or some core compression or cryptographic code path).
Higher latencies like reading data from a spinning disk require different mechanisms. In that case, a request to read from disk will have the OS completely switch to a different thread or process while the synchronous request is outstanding. The high thread switching costs and overhead for this mechanism are acceptable because the latency being hidden is on the order of milliseconds, not nanoseconds. Note that the costs are not simply the cost of switching between running threads but the cost of all the memory and other resources that are sitting effectively idle and frozen until that operation completes. These costs are all incurred in order to expose this synchronous interface.
There are a number of fundamental reasons why a system might want to expose the real underlying asynchrony and why some component, layer or application would prefer to use an asynchronous interface, even at the cost of directly managing the increased complexity.
Parallelism. If the resource being exposed is capable of true parallelism, an asynchronous interface allows a client to naturally issue and manage multiple outstanding requests at once, and allows the underlying resource to be more fully utilized.
Pipelining. A common way of reducing effective latency of some interface is to have multiple requests outstanding at one time (whether this is actually useful in improving performance depends on where the source of the latency arises). In any case, if the system is amenable to pipelining, effective latency can be reduced by a factor equal to the number of outstanding requests in progress. So a single request might take 10ms to complete but filling the pipeline with 10 requests might see a response arriving back every 1 ms. The aggregate throughput is a function of the available pipelining not just the latency for a single end-to-end request. A synchronous interface issuing a request and waiting for the response would always see the longer end-to-end latency.
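The effect of pipelining is easy to see in a sketch. This toy example (using Python's asyncio, with a made-up 10ms per-request latency) issues the same ten requests serially and then with the pipeline full; the serial version pays the full end-to-end latency for every request, while the pipelined version overlaps them.

```python
import asyncio
import time

LATENCY = 0.010  # 10 ms per request (illustrative, not measured from any real system)

async def request(i):
    await asyncio.sleep(LATENCY)  # stand-in for one network round trip
    return i

async def serial(n):
    # Synchronous style: wait for each response before issuing the next.
    return [await request(i) for i in range(n)]

async def pipelined(n):
    # Asynchronous style: fill the pipeline with all n requests at once.
    return await asyncio.gather(*(request(i) for i in range(n)))

start = time.perf_counter()
asyncio.run(serial(10))
serial_time = time.perf_counter() - start       # ~10 x LATENCY

start = time.perf_counter()
results = asyncio.run(pipelined(10))
pipelined_time = time.perf_counter() - start    # ~1 x LATENCY
```

The pipelined run finishes in roughly one request's latency rather than ten, which is exactly the throughput-versus-latency distinction described above.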
Batching (local and remote). An asynchronous interface more naturally allows implementation of a system for batching requests, either locally or at the remote resource (note that “remote” in this case might be in a disk controller at the other end of an I/O interface). This is because the application already needs to manage getting a response with some delay, as it continues its local processing. That additional processing might involve making additional requests that can be naturally batched together.
Batching locally can allow more efficient transmission of a series of requests or even allow compaction and removal of duplicate requests locally. At the remote resource, having simultaneous access to a whole set of requests may allow for significant optimization. A classic example is a disk controller re-ordering a sequence of reads and writes to take advantage of the location of the disk head over the spinning platter in order to minimize average seek times. Any storage interface that works at the level of blocks could see major performance improvements by batching together a series of requests that all read or write the same block.
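Local batching with duplicate removal can be sketched in a few lines. This toy example (the block numbers and the coalescing policy are invented for illustration) collapses a stream of block requests so each distinct block is fetched only once:

```python
def coalesce(requests):
    """Collapse duplicate block numbers, preserving first-seen order."""
    seen = set()
    batch = []
    for block in requests:
        if block not in seen:
            seen.add(block)
            batch.append(block)
    return batch

# Five requests arrive, but only three distinct blocks reach the device.
batch = coalesce([7, 3, 7, 9, 3])
```

A real storage layer would also merge reads and writes to the same block and hand the batch to the device in one operation, but the principle is the same: the batching layer gets to see a window of requests rather than one at a time.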
Of course, local batching is possible through a synchronous interface, but it either requires a certain amount of shading the truth or requires that batching be an explicit feature of the interface, which can add significant complexity for the client. Buffered IO is a classic example of “shading the truth”. An application calls “write(byte)” and the interface returns success, but in fact the actual write (and the truth of whether it has succeeded or not) does not happen until the buffer fills and is flushed, is flushed explicitly, or is flushed as the file is closed. Many applications can ignore these details — it only gets messy when applications need to guarantee some interacting sequence of operations and need truth from underlying layers.
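Python's buffered file objects show this truth-shading directly: `write` reports success immediately, but the bytes only reach the operating system when the buffer is flushed.

```python
import os
import tempfile

# Create an empty file to write to.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "wb", buffering=4096)
f.write(b"x")                              # returns 1 ("success") immediately...
size_before_flush = os.path.getsize(path)  # ...but nothing is in the file yet
f.flush()                                  # the truth catches up here
size_after_flush = os.path.getsize(path)
f.close()
os.remove(path)
```

The byte is "written" from the application's point of view long before it exists anywhere the rest of the system can see it.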
Unblocking / Unbinding. One of the most common roles for asynchrony in the context of graphical user interfaces is keeping the core UI thread unblocked so the application can continue interacting with the user. The latencies of long-running operations (like interacting with the network) cannot be hidden behind a synchronous interface. In this case, the UI thread needs to explicitly manage these asynchronous operations and deal with the additional complexity it introduces.
The UI is just one example where a component needs to remain responsive to additional requests and therefore cannot easily use some general mechanism that hides latency in order to simplify the programming experience. The component of a web server that receives new socket connections will typically hand each one off quickly to some other asynchronous component that actually manages the interaction on the socket, so that it can return to receiving and processing new connections.
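The hand-off pattern can be sketched with asyncio. In this toy version, "connections" are just integers pulled from a list; the accept loop spawns a handler task for each and immediately moves on rather than waiting for the handler to finish.

```python
import asyncio

handled = []

async def handle(conn):
    await asyncio.sleep(0.01)   # stand-in for the real work on the socket
    handled.append(conn)

async def accept_loop(incoming):
    tasks = []
    while incoming:
        conn = incoming.pop(0)  # "accept" the next connection
        # Hand it off: the loop does not wait for the handler to complete.
        tasks.append(asyncio.create_task(handle(conn)))
    # A real server would run forever; here we just wait for the handlers.
    await asyncio.gather(*tasks)

asyncio.run(accept_loop([1, 2, 3]))
```

The accept loop stays free to take the next connection while earlier handlers are still mid-flight, which is the whole point of the hand-off.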
Synchronous designs in general tend to lock components and their processing models tightly together. Asynchronous interaction is often a mechanism for controlling coupling and allowing looser binding.
Reducing and Managing Overhead. As I mentioned above, any mechanism to hide asynchrony involves some allocation of resources and overhead. That overhead may not be acceptable for a given application and will drive the application designer to directly manage the inherent asynchrony.
The history of web servers is an interesting example. Early web servers (built on Unix) would fork a separate process to manage an incoming request. That process could then read and write to that connection in a generally synchronous fashion. An evolution of that design reduced overhead but maintained the generally synchronous model of interaction by using threads instead of processes. Current designs recognize that performance is dominated not by the processing model but by the IO involved in reading and writing to the network, database, and file system while formulating a response. They generally use work queues and a fixed set of threads so resource usage can be managed more explicitly.
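The modern work-queue design can be sketched with a fixed thread pool. In this toy example (the request and response shapes are invented), one hundred requests are served by a bounded pool of four threads, so resource usage stays fixed no matter how many requests arrive:

```python
from concurrent.futures import ThreadPoolExecutor

def serve(request_id):
    # Stand-in for formulating a response (IO-dominated in practice).
    return f"response-{request_id}"

# A fixed thread budget: the executor's internal work queue absorbs the
# backlog instead of spawning a process or thread per request.
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(serve, range(100)))
```

The contrast with fork-per-request or thread-per-request is that here the designer chose the concurrency level (four) explicitly rather than letting it scale with load.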
The other interesting aspect here is that these tradeoffs are more explicit and amenable to tuning by the application designer with asynchronous designs. In the example of latencies in the memory hierarchy, the effective latency (measured in processor cycles per main memory request) has been rising dramatically over multiple decades. Processor designers scramble madly adding additional layers of cache and additional mechanisms that stretch the memory model being exposed by the processor in order to maintain that fiction of synchronous processing.
Context switching at synchronous IO boundaries is also an example where the effective tradeoffs have changed drastically over time. The vastly slower improvements in IO latencies compared to processor cycle times means that an application is forgoing a lot more computing opportunity when it sits blocked waiting for an IO to complete. This same issue of relative cost tradeoffs has driven OS designers to move to memory management designs that look much more like early process swapping approaches (where the entire process memory image is loaded into memory before the process starts executing) rather than demand paging. Hiding the latency that might occur at each page boundary becomes too hard a leap. Radically better aggregate bandwidth through large serial IO requests vs. random requests also drives that particular change.
Cancellation is a complicated topic. Historically, synchronous designs have done a poor job of handling cancellation, often omitting support for it entirely; in a synchronous design, cancellation inherently has to be modeled as out-of-band and invoked from a separate thread of execution. Asynchronous designs, in contrast, support cancellation more naturally, including the trivial approach of simply deciding to ignore whatever response eventually returns (or doesn't). Cancellation rises in importance as the variance in latency and practical error rates increase — which describes how our networking environment has evolved over time pretty well.
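Asynchronous cancellation can be sketched with asyncio tasks: the caller holds a handle to the in-flight operation and can cancel it at any point, or just drop it and ignore the eventual result. The ten-second "fetch" here is an invented stand-in for a high-variance network call.

```python
import asyncio

async def slow_fetch():
    await asyncio.sleep(10)     # stand-in for a slow, high-variance network call
    return "data"

async def main():
    task = asyncio.create_task(slow_fetch())
    await asyncio.sleep(0.01)   # decide we have waited long enough
    task.cancel()               # cancellation is in-band: just ask the task
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"

outcome = asyncio.run(main())
```

No separate thread of execution is needed to deliver the cancellation; it is an ordinary operation on an ordinary handle.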
Throttling / Resource Management. A synchronous design inherently enforces some throttling by not allowing the application to issue additional requests until the current request is complete. An asynchronous design does not get throttling for free and may need to address it explicitly. This post described an example in the Word Web App where moving from a synchronous design to an asynchronous design introduced a significant resource management problem. It is not unusual for an application using a synchronous interface to not recognize that throttling was implicitly present in the code. Removing that implicit throttling allows (or forces) more explicit resource management.
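One common way to restore that throttling explicitly is a counting semaphore bounding the number of requests in flight. A sketch with asyncio follows; the limit of three and the sleep standing in for a real request are arbitrary choices for illustration.

```python
import asyncio

peak = 0        # highest number of requests observed in flight
in_flight = 0

async def request(i, limit):
    global peak, in_flight
    async with limit:               # blocks here once 3 are already in flight
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)   # stand-in for the real request
        in_flight -= 1

async def main():
    limit = asyncio.Semaphore(3)    # the explicit concurrency budget
    await asyncio.gather(*(request(i, limit) for i in range(20)))

asyncio.run(main())
```

Twenty requests are issued, but the semaphore guarantees no more than three are ever outstanding at once; the rest queue up behind it, just as they implicitly would behind a synchronous call.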
I experienced this early in my career when porting a document editor application from Sun’s synchronous graphical API to X Windows. Using Sun’s API, a drawing operation executed synchronously, so the client did not get control back until it completed. In X Windows, a graphical request is dispatched asynchronously over a network connection and then executed by the window server (which might be on the same or a different machine).
In order to provide good interactive performance, our application would do some drawing (e.g. ensuring that the current line with the cursor had been updated and drawn) and then check to see if there was more keyboard input to read. If there was more input, it would abandon painting the current screen (which would generally become out of date in any case after the pending input was processed) in order to read and process the input and redraw with the latest changes. This was tuned to work well with the synchronous graphical API. The asynchronous interface could accept drawing requests faster than they could be executed, leaving the screen lagging behind the user input. This was exceptionally bad when providing interactive stretching of images because the relative cost of issuing a request was so much lower than the cost of actually executing it. The UI would lag badly executing a whole series of unnecessary repaints issued for each updated mouse position.
This is still a sticky design problem more than 30 years later (Facebook’s iPhone app seems to suffer from precisely this problem in certain scenarios). An alternate design is to have the screen refresh code (which knows how fast the screen is capable of being refreshed) be the driver and explicitly call back to the client to paint a region rather than having the client drive the interaction. That creates tight timing demands on the client code to be effective and is not always practical.
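A minimal fix for this kind of lag is coalescing: a paint request only records the latest desired state, and the actual repaint executes at most once per refresh. A sketch, with mouse positions reduced to integers for illustration:

```python
pending = None   # at most one pending paint request: the latest state
painted = []     # repaints actually executed

def request_paint(mouse_pos):
    global pending
    pending = mouse_pos            # overwrite any stale pending request

def refresh():
    global pending
    if pending is not None:
        painted.append(pending)    # paint only the most recent state
        pending = None

# Ten mouse positions arrive between screen refreshes...
for pos in range(10):
    request_paint(pos)
refresh()                          # ...but only one repaint is executed
```

The nine stale repaints are never issued at all, so the screen tracks the latest input instead of replaying a backlog of obsolete positions.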
Windows made an early decision to use mostly synchronous interfaces and extended that core approach over time to hundreds of thousands of APIs. Early APIs to access local PC resources could generally succeed or fail quickly, especially relative to the speed of those early processors. Synchronous inter-process APIs did tend to make Windows programs vulnerable to the worst-written application on the box. As APIs were extended to the network and processors improved, the delay being hidden behind synchronous interfaces could become both large and highly variable, making the tradeoffs involved in a synchronous approach less and less attractive.
“Wrapping a thread” around an API could essentially turn this synchronous interface asynchronous, at a significant expenditure of resources and without addressing broader cross-thread issues throughout a large interacting API set. This was an especially painful approach since the only thing the thread was doing was effectively waiting for an API to complete that itself was just sitting waiting for some asynchronous response from a lower-level API that it was calling. One could see applications with hundreds of allocated threads, only a few of which were actually doing any real work — despite the fact that a thread is fundamentally an abstraction for processing, not waiting.
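The thread-wrapping pattern is a one-liner in modern Python: `asyncio.to_thread` dedicates a pool thread that does nothing but wait for the blocking call to return. The blocking API here is a stand-in invented for illustration.

```python
import asyncio
import time

def blocking_api():
    time.sleep(0.05)   # the synchronous API just sits waiting
    return "done"

async def main():
    # Spend an entire thread purely to make the synchronous call awaitable.
    # The thread does no computation; it exists only to wait.
    result = await asyncio.to_thread(blocking_api)
    return result

result = asyncio.run(main())
```

This is exactly the "thread as an abstraction for waiting, not processing" problem: it works, but each wrapped call consumes a thread and its stack for the full duration of the wait.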
There have been some improvements over time with the use of thread pools and other techniques. Windows 8 and the Windows Runtime were an effort to clean up much of this legacy but effectively foundered on the persistence of the older Win32 API and the mass of code built on top of it.
This whole topic ultimately revolves around the relative challenge and complexity of building applications using asynchronous designs. At one of my first talks at a high-level internal Microsoft technical workshop, I argued that asynchronous APIs were the key to building applications that wouldn’t hang like PC applications of the time would so frequently do. From the back of the room, Butler Lampson, Turing Award winner and one of the fathers of personal computing, yelled “but then they won’t work, either”! I had many productive discussions with Butler over the following years and he continued to be very concerned with how to manage asynchrony at scale.
A second key problem is how to isolate intermediate state. In a synchronous design, any intermediate state used during the computation disappears and is unavailable once the synchronous computation returns. An asynchronous design has the same problem as a design with multiple threads operating over shared data (beware ye dragons): the intermediate state is (potentially) exposed, exponentially increasing the effective number of program states to reason over.
In fact, state leakage occurs in both directions: internal state might be exposed to the rest of the program, and changing external state might be referenced inside the asynchronous computation. Mechanisms like async/await that intentionally make the code look like serial synchronous code do nothing to clarify these risks; in fact, they mask them.
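A tiny example makes the leak concrete. The code below reads like straight serial code, but the `await` marks a point where the task is suspended and other tasks can observe and modify the shared `balance`; a classic lost update results. The bank-balance framing is invented for illustration.

```python
import asyncio

balance = 100

async def transfer(amount):
    global balance
    snapshot = balance              # read the shared state...
    await asyncio.sleep(0)          # ...suspend: other tasks run here...
    balance = snapshot - amount     # ...then write back a now-stale value

async def main():
    # Two "serial-looking" transfers of 10 each. A synchronous design
    # would end at 80; the interleaving at the await loses one update.
    await asyncio.gather(transfer(10), transfer(10))

asyncio.run(main())
```

Both tasks read 100 before either writes, so one decrement silently vanishes and the balance ends at 90 instead of 80. Nothing in the serial-looking source text flags the `await` as the dangerous spot.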
As with the general design challenge of managing complexity, isolation (and especially data isolation) is a key strategy to apply. One challenge for graphical applications is that the application may well want to expose these intermediate states and provide user feedback about them. A common requirement is showing progress on some long-running computation, or interrupting and restarting a computation that has been invalidated by subsequent user activity (Word page layout and Excel recalculation are classic examples that fit both categories). In many cases I encountered over the years, much additional complexity arose from determining how these additional program states should be exposed to the user rather than simply putting up a wait cursor. Certainly for operations that can take a non-trivial amount of time or are susceptible to failure, the user wants to know what the heck is going on.
Whatever strategy is applied, being disciplined and consistent ends up paying large dividends over time. Ad hoc approaches soon balloon in complexity.
The world is asynchronous. Synchrony is a myth — and an expensive one. Embracing the asynchrony can often allow more explicit modeling of the underlying reality of your system and allow explicit management of resources and costs. Complexity comes from the expanding number of program states to reason over; controlling complexity happens by isolating state and limiting the number of states a particular piece of code can see.