While incrementalism has a bad rap if you’re trying to revolutionize an industry, it’s an important characteristic if you’re trying to build a responsive application.
Most applications have a data model that gets transformed into some view model which is then used to draw the view that is presented to the user. So for example, Word has data structures that encode runs of styled text that act as the data model and these get formatted into a view model of lines and page layout structures to interact with the user.
When a user takes some action in an application, that action results in a change to the data model which then percolates through to the view and appears to the user in a small enough number of milliseconds to seem effectively instantaneous (when things are going well). Exactly how small that delay needs to be depends on the type of interaction (e.g. touch-based interfaces require very small delays in order to provide “stick-to-your-finger” feedback while keyboard interfaces can typically get by with somewhat larger delays).
As the size of the data model increases and the complexity of mapping that model into a view increases, a developer needs to get more sophisticated about the data structures and algorithms they use. If you have a list in your data model that you’re going to present in sorted order to the user, you might just re-sort the whole thing when it changes if it only ever has 10 or so elements. If it has 100,000 elements, you might need to consider something more sophisticated like a self-balancing red-black tree that can guarantee that the cost to insert and re-arrange is always bounded so modifying that list doesn’t stall your application.
In the early days of applications, the clever design of these data structures and the algorithms that manipulated them were what led to the magical experience of using these early graphical applications. These were machines that were thousands of times less capable than our current devices, so you needed to be really careful to consistently provide speedy interactive feedback. The applications were carefully designed so that small user actions (e.g. inserting a character in a line of text) only involved a small amount of work updating data structures and repainting the display so they could be completed with a small predictable delay.
The early browsers (circa 1996) had some features for incremental display and layout (e.g. they could paint an image scanline-by-scanline as it came chugging down that 56kb modem or could add lines to the end of a long galley view page). But except for a few of these special cases, they would generally just throw up their hands and regenerate the entire layout in a “batch” process. Netscape would actually re-parse the entire page from the original HTML just to layout to a new window width; they basically maintained no data model, just view data structures.
I had been working on graphical applications (word processors, spreadsheets, drawing programs) for 13 years by that time (late 1995) as I started working on our HTML editor, FrontPage. I was steeped in all these issues around designing applications for interactive performance. That’s why I was surprised by the lack of any support for interactive layout in those early browsers.
As new HTML features were added, they were added with little concern for interactive behavior. Features like floating tables and images were all designed with a focus on ease of use for the HTML writer and on the top-down “batch” layout behavior but with little to no concern over the performance implications of the specified layout behavior for an interactive application like an editor. A single typed character might require completely re-laying out and redrawing the entire page.
I was on the receiving end of all this because as the developer of an HTML editor, I needed to support the features the browser makers were adding. In the “traditional” scenario, an application designer can constrain the features themselves end-to-end by what can be effectively delivered as part of the interactive experience.
There are lots of examples of these constraints; Microsoft Word’s tables were designed with capabilities that were constrained by wanting to provide good interactive performance even in documents that had tables that could run to thousands of pages. Fixed column widths meant that typing in a single cell had only limited impact on other content. This contrasts with HTML tables where a single character might change the computed widths of every cell in the table and require laying out and redrawing all the content.
For HTML, we did a lot of work to analyze precisely what information could be cached to make incremental layout faster. For example, for table layout you only need to hold on to the minimum and maximum possible width of the cell contents in order to know their impact on the overall table layout. So you can throw away all the other detailed layout information (to save memory) and still be able to quickly respond to a change in another cell. There are a bunch of other places (e.g. with absolutely positioned content) where you can leverage knowledge of how layout propagates to other content in order to cache a limited amount of information and constrain the work you need to do when something changes.
As browsers added dynamic scripting capabilities, browser developers ended up needing to learn all these tricks that HTML editor writers were developing (and were stuck with all the same hard design issues that the early designers of HTML and CSS had left them with). Developers writing dynamic HTML pages also needed to be savvy about these performance issues in order to design pages that could redisplay quickly. Many of these issues have been washed out for all but the largest pages by another couple decades of hardware improvements.
As another example, I found it interesting when looking at the Kindle reader that they designed a set of layout constraints and algorithms that can run either forward or backward. They do this so that they can jump into the middle of a thousand page book without having to layout all the preceding pages. If you move to previous pages, it runs the layout process backward. You only see the effects when paging back and forth and getting slightly different page layouts, typically around large block elements like tables and images, or explicit page boundaries like chapters.
I was drawn down this memory lane when recently addressing an incremental processing issue in the redistricting application I get to hack on, DRA2020. I previously wrote about a very neat piece of work involving the problem of computing the union of a set of polygons. The polygons represent the precincts in an election district. We convert the set of polygons into a topology, where the polygons are now specified as a set of shared edges. Polygon union then resolves into the much simpler (and faster) problem of removing the shared edges; the remaining unshared edges represent the outline of the new merged polygon.
In the simplest case, we can do the expensive operation of computing the topology from our input polygons as a backend process and cache the result for download by the client application. This also helps because the topology is about half the size of the input polygons (since points representing all but the very outer edges are shared and only need to be stored once). The problem is that in certain cases, we need to combine separate topologies in the client application.
I had long had a background wish list item to try to come up with a faster solution to this problem that ideally did not involve fully unpacking and re-analyzing the entire polygon set. A common scenario where we needed this functionality was when “shattering” a single precinct shape to replace it with the constituent census block shapes so a user could assign individual census blocks to a district. It seemed crazy that we needed to run the entire expensive topology algorithm on all the shapes for the state (e.g. CA has 25,000 precinct shapes with over a million points) when only replacing one of the shapes.
I finally got around to writing a topology
The challenge in getting this code right reinforced my experience that incremental data structures and algorithms are often quite a bit more tricky than batch-style algorithms. You often need to be concerned with keeping additional information (beyond the final intended result) around in order to make the incremental computation fast. You also need to balance that with a desire to not bloat working set for computations that won’t actually happen. The type of updates your application makes can be important to analyze since you are not just looking at the form of the input data but also all the ways that that data can change.
In our case, we had the very isolated precinct shattering case, but also needed to support the case where we were combining the base precincts with possibly a thousand or so sub-precincts that had been divided into separate collections of blocks scattered around a state.
The application was nicely isolated from the topology splicing changes since it was already using a class that managed an aggregation of geographies and handled generating and caching the various representations we use internally. The performance of the operation improved by an order of magnitude (from 100’s to 10’s of milliseconds) and significantly improved the snappiness of the UI during operations when we needed to do this processing.
One of the challenges when using highly functional layers to build your application is these kinds of scenario-dependent optimizations can be hard to perform. Early application writers had the challenge of having to “roll-their-own” from close to the bare metal, but it meant there were lots of opportunities for end-to-end optimizations. Today’s application writers often have a much harder time figuring out how to optimize through some highly functional but opaque layer for their specific scenario. I’m certain developers struggling with optimizing chat prompts for a large language model API can resonate with that experience.
In our case, I had already forked the underlying topology library to add the efficient packed representation, so I had direct access to all the layers I needed to write the splice functionality. It was a fun piece of work.