Anatomy of a Moderately Complex App

Terry Crowley
54 min read · Jan 13, 2022

Where I go on and on about what I’ve been doing for the last 3 years

App is one of those dimensionless words like thing or object. “I built this cool app last night over a couple beers!” or “Microsoft is proud to release the 27th major version of the Microsoft Word app”, built by a 100-engineer team embedded in a 3,000-person org and shipped to a billion users.

I tend to be an engineering and architecture fanatic, but for that overnight app, creativity and insight are way more important than engineering or architecture. That overnight app is mostly composed of a preexisting stack of well-architected systems and libraries with just a little bit of local dev-written code. The amazing performance of the systems the app runs on will smooth over all but the most egregious performance errors. And if it has value but is really badly designed, rewriting it is often faster than fixing it.

But somewhere along the line from “overnight hack” to “aging monolith”, things get complicated and sticky. The issues that arise fairly quickly along the way end up being very similar to issues that you see with much larger apps.

I’ve been working on this app, Dave’s Redistricting App or “DRA”, for over 3 years now (!!??). Despite being retired and volunteering on this part-time for fun (and glory?), it’s gotten a little more complicated than an overnight hack. I thought it would be a good example to explore some of these recurring issues that arise as apps get complicated — as well as talk about where it is very different from more complex apps.

One of the surprises I experienced when joining a couple pre-existing startups and then Microsoft was how directly my experience building apps with low usage (but significant functionality) transferred to the problems and challenges of working on apps that had involved much larger aggregate investment and huge teams.

The level of depth (and length) of this post is a little crazy. I was recently reading a lament that there weren’t any good “stories” about decently complex apps to learn from. So I decided just to write the story and have it out there as an example of the “literature”. Feel free to skip around.

Getting Started

The first version of DRA, started around the 2010 census time frame, was built by Dave Bradlee, a Seattle-based software engineer. It was a classic scratch-an-itch project. Dave went looking for a free tool to explore redistricting and couldn’t find anything with the features he wanted. The tools the “professionals” used were all desktop apps that cost thousands of dollars a seat. So he built it himself, using Silverlight (Microsoft’s Flash competitor) and delivering the app through the browser. The tool developed a small cult following including some high-profile users (it was the main tool for the investigations behind 538’s magisterial opus The Atlas of Redistricting).

Circa late 2017, Dave was thinking forward to the 2020 redistricting cycle. Gerrymandering and redistricting were gaining in importance and visibility, and there continued to be a clear need for the software. Technically, Silverlight as an implementation strategy was obsolete since Microsoft had abandoned it and browsers had dropped support for it. In the intervening years, standard web technologies had advanced far enough that it made sense to build a new version of DRA as a “real” web app. The Silverlight app was delivered through the browser but was essentially a desktop app — it ran fully client-side and loaded and stored created maps on the user’s local machine. A real web app would be cloud-based, store content in a service to allow roaming and sharing, and would only require standard browser technologies. I had left Microsoft a year earlier and after a couple decades managing massive software teams, I had returned to my roots and was programming for the joy of it. I was thinking about a larger project to sink my teeth into and a mutual friend, Mike Mathieu, introduced us (or re-introduced us since we knew each other slightly from Microsoft).

The project looked like a good fit. It served a clear social purpose of adding transparency to the redistricting process and putting tools and power into the hands of citizens. It also looked complex enough that it would be an interesting way to get my hands dirty with modern tools.

Dave was still working full-time, so at the beginning I mostly investigated technology choices on my own, coming up to speed with new languages, platforms and tools. I consulted often with Dave in gaining an understanding of the functionality of the old app and Dave worked to get a lot of the underlying source data (census and election result data) converted for the new app.

By June of 2018 we had the framework of a working app running in the AWS cloud. It allowed you to log in, displayed a list of the redistricting plans you had created, and let you create and open a map, assign voting precincts to districts, and display basic statistics: the core functionality of creating a redistricting plan.

I now faced the challenge that every developer who has worked on a complex or moderately complex app faces. How to answer the question “what the hell have you been doing for the last N years?!?” You have a working app and, without changing the 30 second elevator pitch for what the app does, you continue working on it for years and years. I’ve experienced this with every app I’ve worked on, whether it was multimedia email, desktop publishing, real-time conferencing, HTML and web site editing or with the Office apps.

A very few people might be interested in the 3 different mapping layers you experimented with and migrated through over that time, or the 2 different database backends you switched between for various reasons (typically performance at a different design point or operational cost).

Interesting apps often have this characteristic — it’s easy to get something up and running but there is a depth of use cases and scenarios that generate a stream of requests and ideas for new features as well as motivation for re-thinking architectural and dependency decisions made early in the project.

If the project exists for any significant time, the underlying dependencies themselves evolve, generating a stream of work to update to the latest version or move off an interface or service that gets deprecated. The more dependencies you have, the bigger this stream of work. And the more code you have, the more work it is to track those dependency changes.

If your app builds an artifact, new use cases generate new size and performance requirements. A new performance point can require anything from small tweaks to a massive rewrite. In a service, growing usage inevitably results in new hot spots of service performance — more rewriting. Growing usage also inevitably leads to new scenarios and new requests for features.

We experienced all of these along the way.

Team

One thing that is very different from a larger app is that the team has been remarkably stable over the course of the project.

Dave Bradlee started coding “full-time” after retirement and has been responsible for most of our election and census data pipeline as well as much of the UI work, especially in the analytics area and in our various state-specific pages, in-app help, statistics displayed during mapping, etc. Dave continues to be the guy who has the most direct empathy and understanding around what drives map builders — when we added a feature for local (vs state-level) redistricting, he was immediately experimenting with two or three possible Seattle city council district plans.

I do most of the backend work (server, lambda functions, command line utilities) as well as client “architecture” work and features dealing with the mapping surface itself, plus other odds and ends (e.g. our groups feature).

Early on, we added our “scientist”, Alec Ramsay, who had worked with Dave a bit on election issues previously. Alec has a deep practical and theoretical interest in redistricting and had a lot of connections in the academic community. He was the key person that drove the unique analytic capabilities that we rolled out over the course of the project. These analytics are essentially an “amicus brief-in-a-box” since they encapsulate the key analytics that have traditionally required hiring some special master or academic specialist to pore over a specific proposed (or adopted) map to analyze for bias and other characteristics. Now it is possible to simply load the map into DRA and look directly at all of these analytic measures.

As the project was coming together, a wider group of us explored a more ambitious goal of directly effecting change in the redistricting process in order to get fairer maps adopted. We never came up with a breakthrough idea here. At the same time, the three core engineers on the project (Dave, Alec and I) felt certain that the app we were building would offer unique value to the redistricting community and help level the playing field to allow local advocates to offer specific and informed suggestions and critiques to whatever redistricting process was in place in their state. While we could not articulate the direct final impact, the three of us felt that the impact we could achieve was sufficient to warrant our continued participation and investment of our personal time. The fact that we are all over 60 and retired made it a little easier commitment for us and our families.

So after ballooning up for a bit, we shrunk back down to the core engineering team. We added a key additional member, David Rinn, who helps manage our partnership, outreach and training activities. Other folks that have contributed are listed in the About DRA page in the app. Mike Mathieu deserves special mention for getting the team rolling and funding the first year in the cloud.

Some Numbers

I want to provide some numbers just to give a sense of the size of the application as well as the size and complexity of the operational service.

We’re averaging about 3000 users a day, fluctuating with a predictable cycle over the course of the week. There are about 10,000 monthly unique users as of Fall 2021. That ratio gives some sense of the stickiness and addictiveness — the people who use our site, use it a lot. The highest activity users log in daily and spend hours on the site, creating thousands of maps. The user base has been growing gradually, with significant upticks when 2020 census shapes were released in March 2021 and then a bigger uptick when census data was finally released in August 2021 and all the real high-stakes redistricting activity took off.

Users created 150K maps on the service just in November. This includes maps that were effectively “hand-drawn” precinct-by-precinct as well as maps that were created by importing proposed or official maps from other sources. We support importing through either district shapes or block assignment files, which are simple comma-separated text files that assign census block IDs to districts. Importing is often done to use our analytics to explore a proposed map published from other sources. More on that below.

We have about 2 GB of database storage in Amazon’s DynamoDB and about 1 TB in Amazon’s S3 blob storage system (mostly for static data files and data associated with users’ maps).

On a random day in November that saw 3500 users connect to the service, they executed about 12M API calls on the service, resulting in about 500K database read requests and 250K database write requests (significant in-memory caching of database reads lowers per-API cost). The vast majority of those calls were either sending edit updates on a map or querying for map updates (long polling).

Our code (all on GitHub) is split into basically 5 areas. There are shared libraries, command line utilities, server, browser client and AWS Lambda functions.

The lines of code counts (virtually all TypeScript) are below. As is typical, client code tends to dominate line counts.

Command line: 25K

Server: 11K

Client: 59K

Lambda: 10K

Libraries: 35K

Stack

The rise of browser apps saw a lot of focus on development groups sharing their “stacks” — the set of operating systems, languages and other system software used to develop their product (see StackShare!). Part of this was because there was so much churn and disruption going on that things were changing incredibly rapidly and it was super valuable to know what tools and platforms other products were using and being successful with. Additionally, it became so easy for a small garage shop working independently to build a significant product that there was a great demand for a community to be able to just learn from.

Things have settled down a bit, at least in the general “web app” world with a lot of great choices to pick from. But it’s still useful to understand what people are using. So here goes.

Almost all our code, with the exception of some offline data pipeline utilities, is written in TypeScript. Our backend uses NodeJS hosted on AWS Elastic Beanstalk. We also use AWS Lambda for “serverless” processing. Our storage uses DynamoDB for table-based data and S3 for blobs. The table storage is almost solely used as a key-value (document) store, with the exception of indexed queries that look up per-user and published collections. Indexed queries are executed only in our serverless Lambda routines to limit variability, load and latency on our front ends.

Our front-end uses React and Material UI (which seems to have been rebranded “MUI” while I wasn’t looking) for most UI and Mapbox for the critical mapping layer, along with the typical collection of assorted utility libraries. Of special note are the TopoJSON libraries mostly written by Mike Bostock that I wrote about before.

We use webpack for JavaScript bundling on both the front and backend. And of course npm for package management.

I would say that one point about building this kind of web app vs a classic desktop app is that for a desktop app, you basically look at overall performance, memory working set, etc., compare them to some “similar” app, and get an idea of whether you’re a pig or not.

For a web app, you can do this on the browser side, but on the server side, the only measure of “reasonableness” you have (assuming you’re not in a large shop and don’t have access to data for a range of apps) is whether you’re spending too much money. And what’s “too much” is also up in the air. In a lot of cases “money” and “performance” go hand-in-hand like when you reduce database access costs by doing more in-memory caching. But it can definitely feel like you’re operating in the dark.

Architecture

Server

The server architecture is straightforward with a few slight twists to handle the real-time editing requirements. Multiple front-ends (3 currently) communicate through DynamoDB and S3. The one twist is that in order to efficiently serialize and synchronize all users editing a single map (to support real-time editing/sharing) we have one additional server that just serves as a message broker for the other front ends. Any client requests that need to operate on a map in memory get forwarded to the message broker. The front-ends then (randomly) retrieve these requests from the message broker to do the actual processing.

The “tricky bit” is that once the message broker hands a given front-end a message associated with a particular map, it will continue passing messages for that map to the same front-end (subject to a timeout window for robustness, etc.). This allows one front-end instance to do the heavy work of loading the map blob from S3, serializing editing requests (using Operational Transformation to handle collaborative editing), caching it in memory while editing is active and saving back the blob once editing completes. We save the blob back at some regular interval to prevent data loss if the server crashes, tuned to balance reliability with the performance and operational costs involved in saving the blobs. We also compress the blobs in memory as they go quiescent to save memory but before we want to unload them.
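To make the sticky routing concrete, here is a minimal TypeScript sketch of the idea (not our actual broker code; the class, names and timeout value are illustrative): once a front-end has taken a message for a map, it keeps getting that map’s messages until its claim goes stale.

```typescript
// Minimal sketch of the broker's "sticky" assignment of maps to front-ends.
interface Claim {
  frontEndId: string;
  lastSeen: number; // ms since epoch
}

const STICKY_TIMEOUT_MS = 5 * 60 * 1000; // illustrative robustness window

class StickyBroker {
  private claims = new Map<string, Claim>();

  // A front-end polls for work on a map; return true if it should take it.
  tryHandOff(mapId: string, frontEndId: string, now = Date.now()): boolean {
    const claim = this.claims.get(mapId);
    if (claim && now - claim.lastSeen < STICKY_TIMEOUT_MS && claim.frontEndId !== frontEndId) {
      return false; // another front-end still owns this map's in-memory state
    }
    this.claims.set(mapId, { frontEndId, lastSeen: now });
    return true;
  }
}
```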

In practice, server crashes have been solely due to our own software bugs and these essentially disappeared over the last year as the rate of change on the server was reduced and we got more defensive about bugs caused by dereferencing null values (the majority of the crashes). The bad stability bugs that did occur (single digit in number over the course of the project) were typically not the result of rolling out major new features but rather “simple” bug fixes that were made quickly and then poorly validated prior to deployment. When I was coding as a development manager, I typically released my own code with a “Caesar’s wife” paranoia to keep from losing my moral authority around enforcing development practices, but now I’ve devolved back to my cowboy roots.

This is also a good example of where you can put up with lack of discipline when you have a low rate of change / small number of developers. If those 10 bad bugs were multiplied by 30 developers, we would be in regular crisis mode. As it is, we were able to deal with the rare stability issues in a fairly ad-hoc way. Additionally, it helps that the really high-pressure uses of the app (where a person or group is scrambling to use the app to analyze and provide feedback to a redistricting commission or court operating on a tight deadline) have only happened recently when the rate of change on the server is low.

I would say that in a non-trivial number of these serious bugs, there was some nagging anomaly that I saw during development that I failed to track down to root cause before releasing whatever change I was working on. I have this Platonic model of the superb programmer who never lets those kinds of unexplained behaviors go uninvestigated and unresolved, but I typically fail to live up to that ideal and then suffer the consequences. I definitely get antsy to get my code in production and there is no independent gatekeeper to prevent my worst impulses.

In order to minimize long-running and expensive operations on the front-ends, any expensive operations are performed by serverless routines in AWS Lambda. This includes things like computing and serializing a user’s map metadata (the information used to display their list of maps) to a cache blob in S3 that the client then directly downloads.
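As a rough sketch of what one of these offloaded routines looks like (assuming the AWS SDK v3; the table, index, bucket and key names here are hypothetical, not our actual ones):

```typescript
// Sketch of a serverless routine that builds a user's map-metadata cache blob.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const s3 = new S3Client({});

export async function handler(event: { userId: string }) {
  // The indexed query runs here, in Lambda, not on the front-ends.
  const result = await ddb.send(new QueryCommand({
    TableName: "Maps",                 // hypothetical table name
    IndexName: "byUser",               // hypothetical index name
    KeyConditionExpression: "userId = :u",
    ExpressionAttributeValues: { ":u": event.userId },
  }));

  // Serialize the metadata once and park it in S3; the client downloads it directly.
  await s3.send(new PutObjectCommand({
    Bucket: "dra-cache",               // hypothetical bucket name
    Key: `usermeta/${event.userId}.json`,
    Body: JSON.stringify(result.Items ?? []),
    ContentType: "application/json",
  }));
  return { count: result.Items?.length ?? 0 };
}
```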

A common technique other apps use for limiting the costs of expensive queries is “paging”, where the query only returns some limited number of results (e.g. 100) and then the user has to take some action (like scrolling down through the list) to force a request to the server to fetch the next set of results. We have a number of features (e.g. an Outlook-like search box over the list of maps that allows you to quickly filter a potentially large list of maps, as well as some of the highest value analytic features) that really require having the full list available at the client. I remember struggling with this client/server trade-off in the email client I was working on in 1993 (and actually discussed the design challenges in an “interview” loop with the head of Office when I joined Microsoft).

By the way, if you ever wondered why your “unread messages” count is wrong in your email app, this is the same problem in a different guise. Your local app only has some of the messages cached so needs to manage this total count across the API boundary with the server, even as the status of individual messages is getting updated back and forth across the client/server boundary. It gets complicated.

The requirement to see the whole map list is an example of how features over time start constraining other technical choices, building complexity and constraints on future development. For now the cached blob approach is working well for us since even the full list of metadata for all published maps (20,000 as of Dec 2021) is only 6MB compressed and the cost of loading that is all offloaded from our front-end servers.

Initializing map state can also be an expensive operation, so this is also offloaded, with the client fetching a blob from S3 that initializes the map data structure and thereafter the front-ends only need to pass along incremental changes.

This is all “obvious” although it actually involved multiple iterations and rounds of improvement, as we typically started with the front-ends doing some amount of the heavy lifting before “getting religion” around how important it was to eliminate those variable-cost operations that might involve long computations and large result payloads from our front ends. In most cases this work was not driven by a specific clear performance issue but just recognizing that each step we took made server performance and request latency more and more predictable. It had the feel of a lot of performance work (like chipping away at memory use, or making changes that improve processor cache locality) where you don’t get a big bang for the buck from each incremental change but the overall effect is large.

Each of these improvements contributes to that “moderately complex” architecture since they typically involve some additional support in the form of a serverless Lambda routine, perhaps an additional bucket in S3 and then a slightly different pattern of request behavior in the client — typically interacting across an API with the server and then directly fetching the blob from S3. Even simple performance improvements typically involve some kind of specialization rather than generalization, so they are a frequent source of growing complexity in any app. It is the rare, but exciting, performance improvement that involves simplification rather than growing complexity. This was definitely true in our case.

Data Stuff

So a savvy reader hearing about using various DynamoDB tables and S3 blob storage might wonder about how we deal with issues around transactional integrity. That is, if some user operation involves some combination of multiple table updates and blob updates, how do we ensure that the overall system maintains consistency in the face of failures — or potentially conflicting updates? In classic database technology, you define an atomic transaction around a set of table reads and updates and then the transaction either atomically succeeds or fails, in either case leaving the database in a consistent state.

If we wanted to get fancy, we would say that we enforce consistency at the app layer. More properly, you might instead say we are prepared for inconsistency at the app layer when some operation requires consistency between data stores and updates are not transacted and therefore might actually be inconsistent. The simplest case is that we save the initial metadata for a map on creation but only save the blob after subsequent initialization. So the code that loads blobs cleanly deals with a failure to load the blob and reports an error back to the user.

A more extended example where we expect inconsistency (actually intentionally introduce it) is below.

The user record includes a list of sharing GUIDs (globally unique identifiers) for any shared maps that a user has opened. This is used to populate the “Shared With Me” map list in the client. The GUID is the ID for a database record that then specifies the GUID of the actual map. The map metadata specifies what operations (view or edit) are allowed on the map for that access GUID. If a user “revokes sharing”, the sharing GUID record is deleted from the sharing table, but no effort is made to make the overall system completely consistent by removing those entries from a user’s “shared with me” list. We lazily update this list — if the user tries to open the map using this deleted GUID, we inform them that the map is no longer available and fix up their list at that point.
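A sketch of that lazy fix-up at open time might look like this (the helper functions are hypothetical stand-ins for our actual database access code):

```typescript
// Sketch of lazily repairing a stale "Shared With Me" entry when a revoked
// sharing GUID is used. Helper names are hypothetical.
interface SharingRecord { mapGuid: string; access: "view" | "edit"; }

async function openSharedMap(
  userId: string,
  sharingGuid: string,
  loadSharingRecord: (guid: string) => Promise<SharingRecord | undefined>,
  removeFromSharedList: (userId: string, guid: string) => Promise<void>
): Promise<SharingRecord> {
  const record = await loadSharingRecord(sharingGuid);
  if (!record) {
    // Sharing was revoked; fix up the user's list now instead of at revoke time.
    await removeFromSharedList(userId, sharingGuid);
    throw new Error("This map is no longer shared with you.");
  }
  return record;
}
```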

In practice, the vast majority of consistency issues are due to software bugs, not hardware failures (in our app and in most systems) so having robust handling of these kinds of consistency failures generally makes sense anyway. This has the feel of the “end-to-end argument”. Rather than expending lots of effort ensuring that every individual change results in total consistency, you take some performance shortcuts and deal with inconsistency at the “ends”. This is a trade-off of course because taken to extreme, the code at the “ends” gets impossibly baroque. Caveat developer. That’s why it’s called an “argument”.

The other characteristic of this rather simple data design (e.g. no cross-table queries with complex joins) is that the expected performance of any particular operation is well understood by the application developer (me). This is true for most systems built on document (key-value) databases (except to the extent they then start building up complex “joins” in the app code themselves, which is a problematic design pattern). I have seen some complex SQL-based applications grow with more and more complex queries to the point where developers start tearing their hair out trying to figure out what the actual data access pattern is (or more typically how to fix a performance problem caused by a pathological data access pattern). I am in no way a database design guy, but find this model where you understand the data access implications of all your features as you build them up much more like the typical desktop application design that I have practiced for most of my career.

Client

The client is a “vanilla” React app in general structure. This means that the core architecture has a global application state (passed down as “props”) that is rendered to the HTML surface in a top-down pass. User actions change the global application state and then initiate another top-down rendering pass. The React engine handles the hard work of ensuring that the minimal amount of actual re-rendering happens at the browser surface to reflect changes so that the UI stays responsive by re-rendering quickly.

Besides user actions that directly change application state, the other source of state changes is responses to API requests on the service. These are handled symmetrically with user-generated changes in application state. So, basically, an API returns a result that is incorporated into the application state and the content is re-rendered. For example, if a request is made to “duplicate” a map, the response will include a success or failure indication as well as an update to the map metadata list. This causes a rendering pass to run and when the table of maps is re-rendered, the new duplicated map will now show up in the list.

There are a number of these local metadata caches maintained by the client (user profile properties, list of user maps, list of published maps, list of groups, mapping of user ID to user names, list of custom overlays). Any API response may return an update to these caches which is then incorporated into the local state and used to render the content. The service ensures that when cache updates reference some other entity (e.g. map metadata with a user ID), the cache entry for that referenced record is also included. That sounds like a join, doesn’t it? It is limited to a few simple cases (basically user names) and the servers maintain a cache of these mappings to ensure it is quick.

I first saw this approach of allowing any command to return status updates in the IMAP protocol and have since adopted it in other places. It’s an effective way of modeling how a client stays current on remote state without having to specify custom return payloads for every API.
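In TypeScript terms, the pattern looks roughly like this (the cache shapes are illustrative, not our actual API types): any response can carry piggybacked cache updates that get merged into application state before the next render pass.

```typescript
// Sketch of the "any response may carry cache updates" pattern.
interface CacheUpdates {
  mapMeta?: Record<string, { name: string; ownerId: string }>;
  userNames?: Record<string, string>;
}

interface ApiResponse<T> {
  result: T;
  updates?: CacheUpdates; // piggybacked state, IMAP-style
}

interface AppState {
  mapMeta: Record<string, { name: string; ownerId: string }>;
  userNames: Record<string, string>;
}

function applyResponse<T>(state: AppState, response: ApiResponse<T>): AppState {
  if (!response.updates) return state;
  // Merge piggybacked cache updates into application state; the caller then
  // kicks off a normal top-down render pass.
  return {
    ...state,
    mapMeta: { ...state.mapMeta, ...(response.updates.mapMeta ?? {}) },
    userNames: { ...state.userNames, ...(response.updates.userNames ?? {}) },
  };
}
```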

As is typical, the complications around state management are in the details. There are two sources of complication, mostly related to optimizations for performance.

The “application state” that is stored to stable storage and loaded when the application starts or when a map is opened is effectively the minimal representation necessary to recreate the map artifact in the user experience. To actually render the map and associated statistics and analytics, the user’s map data needs to be combined with the static census and election data that is shared by any map for a particular state (and census cycle). So the user’s map might record “precinct 0600312345 is assigned to district 4”. The application needs to actually load up all the data for “precinct 0600312345” to be able to present in the user experience that “district 4 has been assigned 4856 people with this ethnic and partisan breakdown”. So the process of loading a map involves loading the map artifact itself, then loading all the static data referenced by the map, and then combining them into a form that can be rendered in the UI.

This is a very common pattern — e.g. a word processor needs to load all kinds of static shared font information in order to be able to measure the text in a document and lay it out into lines, paragraphs and pages. A weather app might have a per-user list of cities but the data about the actual weather is common and shared between different users.
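For the DRA case, the combining step is conceptually just a walk over the assignments joined with the shared static data, something like this sketch (field names are illustrative):

```typescript
// Sketch of combining a map's minimal state (precinct -> district) with the
// shared static dataset (precinct -> demographics) into per-district totals.
type Assignments = Record<string, number>;          // precinct GEOID -> district
type PrecinctData = Record<string, { pop: number; vap: number }>;

function aggregateDistricts(assignments: Assignments, staticData: PrecinctData) {
  const totals = new Map<number, { pop: number; vap: number }>();
  for (const [geoid, district] of Object.entries(assignments)) {
    const data = staticData[geoid];
    if (!data) continue; // static file may not have arrived yet; render partial results
    const t = totals.get(district) ?? { pop: 0, vap: 0 };
    t.pop += data.pop;
    t.vap += data.vap;
    totals.set(district, t);
  }
  return totals;
}
```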

The overall application metadata, the data for the particular map that the application is opening, and all this static data arrive asynchronously. To simplify the overall design of the application, we always treat any change uniformly as “update application state, then re-render” whenever any change comes in. This happens whether it was from a local user action, an API return result or a static data file that finished loading.

An additional level of complexity comes from the fact that we don’t just take the application state and directly present it in the UI. We need to compute a variety of additional information like the aggregate statistics for each district (by doing a summation across all the properties of the precincts that are assigned to each district) and the aggregate outline for each district, as well as more advanced analytics around bias, splitting and compactness that we present in the UI.

Again, this is a very common issue that arises in application design. In the classic “model-view” architecture for graphical applications, this is the messy grey area between the model and the view. In a word-processor, for example, this is all the calculated information about exactly how the lines, paragraphs and pages are laid out.

The simple approach is whenever there is any change, you just recompute all this derived information. For many applications, even surprisingly complex ones, this approach works just fine given the processing speed available these days. Most of the challenge involved in making your app feel zippy and responsive is actually painting the screen and you allow the React framework to handle the hard problem of optimizing that.

In our case, the data we need to walk over is large enough that we would rather not just recompute everything from scratch, especially if the change (e.g. a metadata update to a map that is not even currently visible) has little to no effect on the visual rendering on screen. For example, big states like California or Texas have tens of thousands of precincts to iterate over and aggregate approximately 100 data points per precinct.

So now we have to deal with the issue of incremental refresh. For some applications, this is where almost all the cleverness is. In the FrontPage HTML editor, I spent a lot of time optimizing how to efficiently recompute the layout of tables and other complex content as the document is edited. The trick is in both trying to minimize the amount of work you do to lay out the new content while also ensuring that you recompute enough to get the same final result as a completely new top-down pass would produce. You also need to decide how much information to keep around (cache) from the initial pass in order to make the incremental pass quick but not bloat your working set. I always found it interesting to see some layout error in Firefox or IE and try to root-cause where their incremental layout algorithm had failed to deal with some edge case. Floating objects were notorious here since the layout issues are especially gnarly. HTML layout was mostly designed with a concern about specifying the results of a complete top-down layout pass and incremental layout was “an exercise left for the reader” (or browser implementor). This contrasts with most editing applications, which usually have to think about incremental layout right from the start since interactive editing is the core design point.

The early Netscape Navigator implementation for tables involved multiple passes to compute the minimum, maximum and actual layout for each cell in order to lay out the table based on actual cell contents, recursively for nested tables. They not only didn’t cache any information, they recomputed each stage by going all the way back from parsing the HTML to laying out the content in one monolithic pass. The ultimate result was that the performance of layout was exponential in the depth of table nesting. This was horrific for both top-down and incremental layout and a significant challenge for early web designers. When I first saw this I thought how antithetical it was to the typical architecture for graphical applications. Those college kids!

But I digress. In our case, we basically have a data flow diagram where some set of data sources flow through to result in a set of computed results, and in some cases the computed result then flows through to another computation (e.g. the precinct shapes combining with the precinct district assignments to create a computed district outline which in turn feeds into an analytics metric of district compactness). We use a rather eclectic collection of serial change stamps, computed hashes and exact object equivalence to determine whether there was a change to inputs that should force a recomputation.
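A single node in that dataflow can be sketched like this (a simplified stand-in for the real mix of stamps, hashes and equality checks; the names are illustrative):

```typescript
// Sketch of a dataflow node that recomputes only when an input's change stamp
// has moved since the last time this node was evaluated.
interface Input { stamp(): number; } // bumps whenever the underlying data changes

class ComputedNode<T> {
  private lastStamps: number[] | undefined;
  private cached: T | undefined;

  constructor(private inputs: Input[], private compute: () => T | undefined) {}

  get(): T | undefined {
    const stamps = this.inputs.map(i => i.stamp());
    const dirty = !this.lastStamps || stamps.some((s, i) => s !== this.lastStamps![i]);
    if (dirty) {
      // Inputs may still be loading; compute() can return undefined and the
      // rendering code is prepared to deal with the absence of the result.
      this.cached = this.compute();
      this.lastStamps = stamps;
    }
    return this.cached;
  }
}
```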

In retrospect, I should have been a bit more rigorous about putting a standard mechanism in place. These computations produce results that also “arrive asynchronously” from the perspective of the overall system. Essentially, the computations might not be complete on any particular rendering pass and the rendering code is prepared to deal with their absence. In some cases the computation is not complete because the work to compute it has been “sliced” into small chunks to keep the UI responsive and in other cases it is because it depends on some data file having completed download and it hasn’t arrived yet.

This is actually a good example of how things get messy in that “moderately complex” app. You start out and there’s only a little computed information (e.g. the aggregate statistics for each district). So it’s easy to code a custom check for whether the inputs have changed. And then you add some more derived information and you add another conditional check. And then another developer comes along and just extends what you were doing in a pretty straightforward way, although maybe with a little tweak since they didn’t fully understand all the subtleties of the “model” because it was really just in your head and not explicit in the code. And then you find that you have a performance problem because you are computing too often (e.g. why recompute the aggregate statistics if only the metadata about the map has changed?) so you add some additional code, and maybe a bit of state or two, to optimize that. Pretty soon you have a complex structure that’s hard to reason about and leads to both performance bugs (too much being recomputed too often) and functionality bugs (things not being recomputed when they really should have). And it is hard or scary to extend, so the app starts getting hard to change.

This can look like “technical debt” and if you’re disciplined, you look for where these hot spots are developing and you throw some time and effort into simplifying them. We get to scratch these itches fairly regularly, partly because we don’t have anyone breathing down our neck about how we are prioritizing our time! And partly because cleaning up some complicated area with a cool/pretty/smart design is a good source of the endorphins and pride that are why we are doing this in the first place.

In fact, as I was writing this I had been working through a new feature (analytics around how cities are split by a redistricting plan) and had a new data dependency issue to work through. So I finally decided to scratch that itch and clean up the whole area.

This lets me talk about two issues, the zen of refactoring and risk/reward calculations.

Let’s talk about risk/reward first. As a manager/exec, I looked for careful analysis and argument around redesigning working code. As a developer, I did some crazy shit. When working on FrontPage 98, I basically rewrote the whole HTML editing surface after Beta 2 (after our last chance to get extensive user testing prior to release), checked it in to the main release code tree and then went on a 3 week vacation. It all ended well (it was a big improvement). For DRA, I’ve been pretty free to invest where necessary to clean things up. Often, there will be “one small feature“ that basically pushes me over the edge. So that feature gets checked in, but maybe with a surprisingly large changelist.

In Office, we used to do a lot of “architecture“ work, off-schedule, in the time between closing down one release and starting the next. The test org was focused on final verification, the program managers were planning and researching the next release and the devs could “scratch an itch” to clean up some area.

This was problematic. In practice it often meant that we threw away all the careful prioritization and tracking that we exercised with the regular feature list. It also meant that the overall system went from the highest quality and performance we could achieve just at shipping and then had already devolved to lower performance and stability before we had even entered a coding milestone. We put a lot more careful process in place over time, but the biggest change was just shifting to a continuous shipping model (long discussion here). Having to ship is the biggest forcing function.

On DRA, I probably have been aggressive about refactoring from a code pride perspective more than anything else. Part of the reason why I’m doing this in the first place is just to continue getting practical experience in dealing with design issues in complex systems. And those are most interesting when you actually think you’ve handled it well rather than just bulled through the bugs with brute force. So if some area “smells”, I typically feel a need to go in and clean it up.

On refactoring, I’m using that word loosely. The classic refactoring process is about making small behavior-preserving changes to code that improve its overall structure. That process can be highly useful, especially when you’re teasing apart two or more components that share too much information and by pulling them apart you’re able to understand (and control) their actual interactions much more clearly.

The change around dataflow dependencies was much more like the kinds of larger changes I’m talking about, which are partially about having a better way of thinking about and structuring the code without actually changing a ton about what it is actually computing. So here, I initially designed a basic DataFlow object and then started going through a process of converting the main dependency logic to use this model. I’m pretty comfortable with throwing things up in the air (when I was rewriting the FrontPage HTML editor’s selection model, I went three weeks of major code refactoring without even compiling). But your mileage can vary on that approach — it requires being comfortable having a lot of balls in the air.

In this case, we did prop up a test server and Dave and Alec found several critical bugs in my changes. When we deployed, there was still an important bug that only showed up in a certain interleaving of asynchronous events, so we hadn’t seen it in testing. Several users reported it and I was able to diagnose and fix it (by inspection rather than debugging, since it was very difficult to reproduce).

Multistep Operations

The vast majority of the asynchronous complexity of the client is just handled by this model of “update application state and render”. In a surprisingly small number of cases, there is actually a multi-stage editing operation that needs to be managed asynchronously. For example, when we “paint” a city (assign all the precincts in a city to a specific district), we may have to fetch some additional data files that specify the city boundaries at the census block level before being able to assign those census blocks to the district in the map’s data structure.

We use a structured Finite State Machine class that I built early on in the project, partly as an experiment in designs for doing rigorous asynchronous state management in a composable way. This is mostly used on the server side but we also make use of it a bit in the client.

These “FSMs” run independently and asynchronously. They get kicked when the operation(s) they are waiting on complete and then move to the next stage in the process. Since they are objects themselves, they can be managed and tracked like any other application state. The “trick” is the same general problem you have with any asynchronous design — being careful that your other code knows what data it is holding on to and making sure the FSM is ready to deal with “the world has changed out from under you” when it gets to run. Where they are used for editing, they are inherently less isolated than you would typically like for asynchronous processing since they are directly impacting the map the user is working on. So we’re careful with the few places we use this.
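For a flavor of what one of these looks like, here is a heavily simplified sketch of a multi-step edit (the real FSM class is more structured and composable than this; the helpers are hypothetical):

```typescript
// Sketch of the general shape of a multi-step async editing operation driven
// as a small state machine, e.g. "paint a city" which must fetch block data first.
type PaintCityState = "fetchBlocks" | "assign" | "done" | "failed";

class PaintCityFsm {
  state: PaintCityState = "fetchBlocks";

  constructor(
    private fetchCityBlocks: (cityId: string) => Promise<string[]>,
    private assignBlocks: (blockIds: string[], district: number) => void
  ) {}

  async run(cityId: string, district: number): Promise<void> {
    try {
      const blocks = await this.fetchCityBlocks(cityId);
      this.state = "assign";
      // The world may have changed while we were waiting (undo, remote edits),
      // so the assign step must re-validate against the current map state.
      this.assignBlocks(blocks, district);
      this.state = "done";
    } catch {
      this.state = "failed";
    }
  }
}
```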

District Shapes

I wrote a bit about computing district shapes in Small Pleasures of Programming. The district outline is drawn on the map surface as well as used in the process of computing analytic metrics. As the user adds precincts to a district (the core activity in “coloring” a map), the district outline needs to be recomputed. Initially, we were using a library that handles computing the union of arbitrary polygons. This is expensive enough that it could potentially block the app UI if run synchronously (e.g. it could take over 4 seconds to compute all the district boundaries for a map of Texas). I initially implemented a relatively complicated “work slicing” mechanism that broke up the work into small chunks in order to leave the app responsive while the computation was happening. This kind of work slicing happens in lots of apps (e.g. Word uses this type of mechanism when laying out pages or Excel when recomputing a spreadsheet since both operations can take a long time). In our case we would compare the old map assignments with the new ones to be recomputed, determine which district(s) had changed (since in the most common case of assigning a precinct to a district, only one district changed), and then recompute those district boundaries (and in some cases optimizing by using the previous result and simply adding or removing the boundaries of the one or two precincts that had been added or removed).
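The work-slicing mechanism itself is conceptually simple: process a few districts, yield back to the event loop, repeat. A stripped-down sketch (the shape computation is a stand-in for the real union work):

```typescript
// Sketch of work slicing: recompute changed district outlines in small chunks,
// yielding to the event loop between chunks so the UI stays responsive.
async function recomputeChangedDistricts(
  changedDistricts: number[],
  computeDistrictShape: (district: number) => void, // stand-in for the geometry work
  chunkSize = 2
): Promise<void> {
  for (let i = 0; i < changedDistricts.length; i += chunkSize) {
    for (const district of changedDistricts.slice(i, i + chunkSize)) {
      computeDistrictShape(district); // expensive geometry work
    }
    // Yield so pending input and rendering can run before the next chunk.
    await new Promise<void>(resolve => setTimeout(resolve, 0));
  }
}
```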

When we switched to using a topological merge (via the TopoJSON libraries) rather than polygon union, the aggregate outline could be recomputed much more quickly, so we were able to get rid of a level of the work slicing (at the single district union level) while keeping the overall structure of how this was incrementally (and asynchronously) recomputed.

I had put quite a bit of effort into this area because district shape computation is in the direct feedback path during the main activity of coloring a map. While there are always a lot of different areas to focus on with performance work, anything in the main UI responsiveness path for an app always gets a lot of focus. There were periods during development of the app where the core activity of painting a map was definitely sluggish before I got all this worked out. This was personally embarrassing since I claim to be a bit of an expert on this stuff. Embarrassment and pride are definitely useful as motivation to improve things.

OT

You’ve heard remarkably little about the Operational Transform protocol that is at the heart of how the client and service communicate map edits and that enables the real-time co-editing capabilities of the app. This is because it does precisely what a clever piece of technology should do — it hides the internal complexity! In practice, because most editing involves a single person editing their map from a single device, the most important features of the protocol are the combination of transparent local state update and auto-batching.

Local state update means that a local change to the map state can be applied immediately and then directly reflected in the local UI. So UI responsiveness is not gated by network latency. The server response that arrives later either simply validates the edit or combines that edit with other remote user edits. If there are other edits included, these are then just reflected in the UI as is any local edit.

Auto-batching means that as the app waits for server acknowledgement for its first edit, subsequent edits are automatically batched together into a single change. This prevents communication hiccups from resulting in a longer and longer queue. Auto-batching is a really nice feature of any protocol between asynchronous agents as it provides an automatic way of regulating the overall latency as well as the overall processing cost of a series of requests. That single batched request ends up with essentially the same latency and cost as a single request rather than the product of all the individual requests.
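A stripped-down sketch of the local-apply-plus-auto-batching behavior (the compose and send functions are stand-ins for the real OT machinery):

```typescript
// Sketch of local apply plus auto-batching: edits made while an acknowledgement
// is outstanding are coalesced into one pending change.
type Edit = Record<string, number>; // e.g. precinct -> district assignments

class EditQueue {
  private inFlight = false;
  private pending: Edit | null = null;

  constructor(
    private applyLocally: (e: Edit) => void,
    private sendToServer: (e: Edit) => Promise<void>,
    private composeEdits: (a: Edit, b: Edit) => Edit
  ) {}

  async submit(edit: Edit): Promise<void> {
    this.applyLocally(edit); // UI responsiveness is not gated on the network
    if (this.inFlight) {
      // Coalesce with whatever is already waiting for the next round trip.
      this.pending = this.pending ? this.composeEdits(this.pending, edit) : edit;
      return;
    }
    this.inFlight = true;
    try {
      await this.sendToServer(edit);
      while (this.pending) {
        const batch = this.pending;
        this.pending = null;
        await this.sendToServer(batch); // one request for the whole batch
      }
    } finally {
      this.inFlight = false;
    }
  }
}
```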

The combination of features makes the app robust against server (really mostly connectivity) hiccups, and we also leverage this to make server deployments transparent.

But what about the issues I talked about in Real Time Editing is Kind of Hard? In that post I described why many developers trying to use the co-editing libraries that had been provided by cloud service providers ran into trouble with real-time editing. The key problem is that the OT merge algorithm happens at a level below how the data schema of the application encodes semantics (merge happening at the level of simple arrays and property maps rather than a “redistricting plan”). This can result in merges that are “correct” from an OT perspective, but leave the application in a semantically inconsistent state. You really need to understand the constraints of OT as you are designing your application’s data model. And fortunately I do!

In our case, the main data structure, where most co-editing happens, is the property map that assigns precincts to a specific district. And property maps essentially work “perfectly” at the OT level. Many other data structures are also simple property maps with atomic semantics (e.g. to turn on and off the display of county lines or labels on the map). The few places where there are more complex data interdependencies in the model (e.g. the total number of districts you are creating) are rarely changed after the map is created.

Additionally, the client application has code to rationalize certain illegal semantic states (which was mostly introduced to clean up after bugs rather than deal with OT merge issues).

That’s an approach I’ve taken in other applications since it’s often easier to manage one piece of code that enforces elements of semantic consistency than to guarantee that the tens or hundreds of places where the state is modified don’t make an invalid transformation. Your mileage can vary on that approach, but I think it’s generally under-used. It has the flavor of the “end-to-end argument” — you enforce the semantics at the “end”, in one place, rather than scattering it throughout the system.

Map Layer

The map layer is the heart and soul of the application and is where users spend most of their time creating maps (when they are not poring over analytics). We started using OpenLayers, then switched to Leaflet, and finally settled on Mapbox. That was a non-trivial decision because the first two are open source, free solutions while Mapbox is a commercial offering with non-trivial ongoing cost based on usage. Ultimately, the decision was based on performance, especially around our specific use scenarios, which involved overlaying a base map (supplied by the service) with a large collection of geographic features that we provided and being able to efficiently update the properties of that collection (e.g. to change the fill color of a precinct to reflect that it has been assigned to a district) as well as update parts of the collection itself (e.g. the district outline changing as each precinct is added in).

[Figure: the map surface with statistics and control panels]

Another reason was that the Mapbox map surface simply felt higher quality. It uses vector graphics rather than pre-rendered bitmaps so scales more smoothly at intermediate zoom levels. The maps themselves also have a higher design quality. Additionally, they’ve just done a better job of damping down the interactive feel of zooming in and out. Both OpenLayers and Leaflet are much more jerky here, a problem that gets brought up often on their various forums and bug lists but never seems to get fixed. Not sure why since it does seem like one of those things that might take some insight on approach but could be well isolated from everything else. I was tempted to go off on a chase through the sources just out of curiosity but never did.

In our interaction with the map layer, we took an approach that feels somewhat like React’s HTML tree differencing algorithm. For any given state of the application, there are a set of collections of features and visual properties on those features that should be displayed on the map. For example, if the user requests “County Lines” to be displayed, then that collection needs to be added to the app. The map API itself is procedural in nature — you call APIs to add or remove collections and set their properties.

So our rendering pass walks through our map data model and determines which collections should be displayed. We then walk through a differencing resolution process that compares the collections that should be displayed against the collections that currently are displayed. It adds or removes collections to bring the two collection sets into sync and then remembers what it did for the next differencing pass. This greatly simplifies the process of adding product features that impact the map surface.
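A sketch of that differencing pass (the layer descriptors are illustrative and the add/remove calls mirror the map API’s procedural surface; this is not our actual code):

```typescript
// Sketch of diffing the set of collections that *should* be displayed against
// the set that *is* displayed, issuing procedural add/remove calls for the delta.
interface LayerApi {
  getLayer(id: string): unknown | undefined;
  addLayer(layer: { id: string; [key: string]: unknown }): void;
  removeLayer(id: string): void;
}

function syncLayers(
  map: LayerApi,
  desired: { id: string; [key: string]: unknown }[],
  currentIds: Set<string>
): Set<string> {
  const desiredIds = new Set(desired.map(l => l.id));
  // Remove collections that should no longer be displayed.
  for (const id of currentIds) {
    if (!desiredIds.has(id) && map.getLayer(id)) map.removeLayer(id);
  }
  // Add collections that should be displayed but aren't yet.
  for (const layer of desired) {
    if (!currentIds.has(layer.id) && !map.getLayer(layer.id)) map.addLayer(layer);
  }
  return desiredIds; // remembered for the next differencing pass
}
```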

There was actually a lot of iteration to get to where we ended up since there are often different ways of achieving the same visual effect (e.g. whether you update an entire collection or just change properties of features in the collection). This was definitely a case where I felt we were wrestling with the API a bit — it was not really designed for the kind of incremental updating we were doing and so it wasn’t fully possible to optimize all the way end to end. That is, we create a structure inside the map API and then our app, in editing, needs to make a small change to that structure but we can only update it by bulk replacement. In this case, we were able to get by because the performance was sufficient and any bad visual effects (flashing) were minimal. Such is often the challenge using multifunctional APIs.

Version History

I thought I’d talk about the version history feature because it serves as such a radical contrast in complexity and development cost to the equivalent feature in a complex app like Office.

Version history is the feature that lets you restore a version from “last Tuesday at 3pm”. Basically, it lets you look at the edit history of a map, maybe tag a particular version with a label (rather than having to copy it to a new location) and then pick some previous version and restore the map to that state.

It’s not the most important feature in the world. It can be a lifesaver if you unintentionally changed something, but it’s only used moderately often. For example, for DRA, it was used 1500 times in November 2021 (and 220,000 maps were edited that month). So about 50 times a day.

Let’s talk about Office first. Office doesn’t own its own storage. Of course originally it just used the PC/Mac file system. Over time, network file systems were supported “transparently” and eventually the whole open/save pathway became a key point of extensibility in both Windows and Office, supporting not only those networked file systems but also other third-party document management systems and Microsoft systems like SharePoint. Even the Exchange mail system at one point had a way of saving Office documents directly to an Exchange server (“strategy”).

This extensibility was a win-win for Office and those other systems — they got access to all those Office users and Office became more tied into the overall ecosystem.

The challenge with these complex points of integration is that it is their complexity that makes them “sticky” (helpful for maintaining market power) but it is also their complexity that makes them so difficult to innovate through.

Office would spend multiple years-long product cycles, with 50+ person teams and complex inter-organizational collaborations with Windows and SharePoint in order to develop its version history features.

In radical contrast, DRA owns its own storage (basically just a blob in AWS’s S3 blob storage system, with some additional metadata). When I decided we should support version history (like any respectable app), I designed it in an hour or two and had it working in a day.

When the server opens a map for editing, it checks whether it has saved a copy of that version. If it has not, before making any changes it just saves an entire copy of the map to another blob and adds an entry to the map metadata pointing to that blob. We don’t worry about any complex diffing strategy or anything like that — “write only” storage (which is what most of these version blobs are) is very cheap. Most blob storage costs are associated with actually reading the data.
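A sketch of that copy-on-first-edit save, assuming the AWS SDK v3 and hypothetical bucket and key conventions (the metadata bookkeeping is reduced to helper stand-ins):

```typescript
// Sketch of saving a version by copying the whole map blob before the first
// edit to a given version; no diffing, since "write only" storage is cheap.
import { S3Client, CopyObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const BUCKET = "dra-maps"; // hypothetical bucket name

async function saveVersionIfNeeded(
  mapId: string,
  currentVersion: number,
  alreadySaved: (mapId: string, version: number) => Promise<boolean>,
  recordVersion: (mapId: string, version: number, key: string) => Promise<void>
): Promise<void> {
  if (await alreadySaved(mapId, currentVersion)) return;
  const versionKey = `versions/${mapId}/${currentVersion}`;
  await s3.send(new CopyObjectCommand({
    Bucket: BUCKET,
    CopySource: `${BUCKET}/maps/${mapId}`,
    Key: versionKey,
  }));
  // Add an entry to the map metadata pointing at the saved blob.
  await recordVersion(mapId, currentVersion, versionKey);
}
```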

As is typical, most of the work is on the client side in the UI, using that metadata to display a version history. If the user asks to “restore” a version, an API call to the server has it copy that saved blob into the current map state. We could have gotten fancier with showing differences between the versions or such, but this addresses 90% of the requirements so we have just left it like this since the day I first implemented it.

The aggregate difference in complexity and cost is probably 5 or 6 orders of magnitude. For a feature that appears to the user as relatively equivalent in functionality.

Block Editing

One of the most challenging features to add was support for block-level editing. A user “painting” a map might start at the county level to assign large populations but then quickly drops down to the “precinct” level, also known as a voting tabulation district or VTD by the census. One can create a legally valid plan by only assigning at the precinct level, but most official plans end up also creating districts by assigning regions at the census block granularity, the smallest level of geographic unit the census uses to divide up the country. Most states have standardized on stopping there and not going below census blocks (at the congressional level, this is the only granularity where they are using the mandated official census data to determine actual population assigned to the district).

So to support analyzing existing and creating proposed official maps, we needed to support block-level editing. We had first rolled out only precinct-level editing, for performance reasons. A state like Texas has over 20,000 precincts but almost 600,000 blocks (actually, in 2010 they had almost a million blocks). Each of those is a shape with possibly hundreds of points specifying the outline and then 100 or so numeric properties. While bulky, the entire circa 100MB precinct-level shape file could still be loaded, but there was no way we could load up all the gigabyte-scale block information.

That we can load the precinct information in one batch is actually pretty incredible and is a good example where something that would have required a lot of mechanism and complexity when machines and communication networks were less capable becomes “just load up the file”. That performance crossover is especially interesting where your machine capability interacts with something about the real world — in this case the size of the dataset used to describe a large state’s precinct-level divisions.

But at the block level, we were back to an overall data size that required more sophistication. Maybe we can also simplify that in 2030…

The normal way a mapping app handles this general problem is the same way any app tries to “virtualize” the ability to navigate a large data space. You take advantage of the fact that the user can only look at a certain amount of information at a time. Ultimately you’re leveraging the limits of human cognition and perception. A zoomed out view is compressed by dropping out lots of features that are not visible at that resolution. As the user zooms in to a detailed spot on the map, you bring in higher fidelity portions of the map (“tiles”), but you only have to bring in part of it because you’re only looking at a single, smaller section of the map. You’re leveraging the fact that the user is limited by their screen size and the ability of the human visual system. Virtually any sophisticated app has some critical part of their architecture where they use this approach to deliver the performance required.

The problem we run into is that messy “moderately complex” issue; we have other features to worry about besides just displaying the map.

Specifically, we need to aggregate the various census and election data per district across the entire map, even if you're only looking at a small part of it. Our analytics features also need access to the entire map, for example to do "splitting" analysis to see how counties and cities are split by a new redistricting plan. Our original model was that the user's map data is pretty minimal — just the assignments of precincts to districts (plus some metadata). All the interesting display and analytics then come from combining this with all the geographic, census and election data dynamically in the app.
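A minimal sketch of that join, with invented names (the real data model has far more properties): the map stores only featureId-to-district assignments, and the numeric data is combined in client-side to produce district totals.

```typescript
// Per-feature numeric properties, e.g. { totalPop: 1234, dem2020: 456, rep2020: 378 }.
type FeatureData = Record<string, number>;

// Sum every numeric property per district across the whole map.
function aggregateByDistrict(
  assignments: Map<string, number>,     // featureId -> districtId
  featureData: Map<string, FeatureData> // featureId -> numeric properties
): Map<number, FeatureData> {
  const totals = new Map<number, FeatureData>();
  for (const [featureId, district] of assignments) {
    const data = featureData.get(featureId);
    if (!data) continue;                // feature data not loaded / unknown id
    const districtTotals = totals.get(district) ?? {};
    for (const [key, value] of Object.entries(data)) {
      districtTotals[key] = (districtTotals[key] ?? 0) + value;
    }
    totals.set(district, districtTotals);
  }
  return totals;
}
```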

We considered a model where the aggregated data gets incrementally computed as the user edits a map and is then cached with the map. Any edits are always done on the visible section of the map, where the data is available to incrementally update the cache. We ultimately decided this seemed too fragile, and it did not support our model where the underlying datasets can get updated along the way (e.g. adding election results as they become available). The multi-user editing functionality (more moderately complex feature interaction!) also makes caching complicated — who does the caching? In fact, it is theoretically possible for multiple clients' edits to be committed and combined at the server without ever being seen by any single client that could compute and push the updated cache information. So it gets pretty messy, as well as resulting in more server load and traffic.

So we needed another approach. We actually iterated through a few stages, as is typical when you're shipping along the way. Alec came up with the basic idea of using virtual features that represent pre-aggregated subsets of the census blocks that make up a precinct. Most official maps only split a fraction of the overall precincts in a state. This ranges from a low of zero split precincts in some states (typically where there is actually a legal requirement to avoid splits) to more heavily gerrymandered states where 1,000 or so precincts (10% of the total) are split (e.g. the Illinois lower house map). Those 1,000 split precincts are almost always split only in two. So if the data for those splits were pre-aggregated, the client would only need to load about 20% more data in this worst case. And none of the client algorithms that operate over precincts would have to change, since the total number of "precincts" would be about the same.

We initially only supported this when importing a block assignment file (see below). The import process would analyze any precincts split between districts, create a database record describing the list of blocks in each split, and run an aggregation process across the block-level information that would create and cache the data for that split in S3. The client's "feature to district" mapping table would include entries for these "virtual features". When the app saw a virtual feature, it would download this aggregated data and combine the feature into the set of whole-precinct features at the mapping layer.
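Roughly, a split record in this first server-side design might have looked something like the following; the field names are my reconstruction for illustration, not the actual schema:

```typescript
// Hypothetical shape of a "virtual feature" split record in the database.
interface SplitRecord {
  id: string;             // ID referenced from the map's feature-to-district table
  precinctId: string;     // the precinct (VTD) being split
  blockIds: string[];     // the census blocks making up this piece of the precinct
  datasetVersion: string; // which census/election dataset the cached aggregate was built from
  aggregateKey: string;   // S3 key of the pre-aggregated data for this piece
}
```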

We released this as part of a major upgrade to our map analytics support and imported all existing state maps so our analytics could report on them. Individual users could also use this import capability but could not do interactive editing, which limited its overall usage.

The next step was to actually support interactive editing. This gave me a little heartburn. This is also not too unusual. There are some parts of your app design that are just a thing of beauty, clean and scalable. There are other parts that work but push hard on the limits of the design. In a word-processing app I wrote a long time ago, characters were cheap but there was non-trivial overhead at the paragraph level. This was generally fine for "word-processing" style documents with a high character-to-paragraph ratio, but plain text documents (e.g. a code file) treated each line as a separate paragraph, including empty lines. So each of those empty lines, instead of being a cheap character, was an expensive paragraph. Ugh.

In our case, the problem with this design was that the number of possible splits is hyper-astronomical (roughly 2^N possible subsets of a precinct with N blocks, where N ranges from 10 to 1,000). Geometrically this is limited because splits are usually contiguous, but it was still bad. Additionally, since the aggregates contained the election data that gets periodically updated, they would also need to be periodically regenerated.

The user model we exposed was that a user would click on a precinct to "shatter" it. We would transparently download the precinct-level block file and integrate it into the map display. The user could then assign individual blocks to a district as with any other precinct. When they moved away from that precinct, we would automatically "commit" the effective splits to the server, where the server would create the database record describing the split and start the aggregation process. The ID for the record was a hash based on the set of blocks, so it could be computed and stored in the map even before the server finished that commit process. It also meant the same split could be shared by multiple maps — sort of content-based addressing. (And yes, storing the ID in the map before getting a commit acknowledgement from the server could result in inconsistency in the face of failure that the code needed to be prepared for.)
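A sketch of that content-based ID, assuming a SHA-256 hash (the actual hash function and ID format aren't specified here): sorting the block IDs first makes the ID independent of edit order, so identical splits collide on purpose.

```typescript
// Content-addressed split ID: the same set of blocks always yields the same ID,
// no matter which map or client produced it.
import { createHash } from "crypto";

function splitId(blockIds: string[]): string {
  const canonical = [...blockIds].sort().join(","); // order-independent encoding
  return createHash("sha256").update(canonical).digest("hex").slice(0, 24);
}
```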

Block Editing with a “Shattered” Precinct Outlined

This was relatively straightforward and is still the UI model we expose (and which seems to have been well accepted by our users). The awkwardness is that you have to explicitly "shatter" a precinct before assigning blocks, but the advantage (over, e.g., exposing blocks for all precincts at a particular zoom level) is that the user does not unintentionally split precincts, which is generally frowned upon if it can be avoided (in order to keep political units together in the final map).

Over the year or so we worked this way, users created more and more splits, millions of them. Essentially every precinct in every state with non-zero population got split, some in hundreds of different ways. Ultimately I was uncomfortable enough with the long-term implications that I did more analysis on how we could speed up downloading precinct-level block information. I decided we would be better off with a model where the aggregated "virtual precinct" is computed locally at the client by downloading the precinct block information for any split precincts. We also needed to switch the virtual precinct ID from a reference to a database record in the service to an ID that directly encoded the list of blocks (as a base64-encoded bitset of the sorted list of blocks for that precinct). This meant the client could go directly from the precinct block data and the bitset to the aggregated information. We were already doing that computation in the client code that did block editing for "ephemeral" virtual features.
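A sketch of that bitset encoding (the exact layout is an assumption): given the precinct's blocks in sorted order, set bit i when block i is part of the split, then base64-encode the bytes. Decoding reverses the process, so the client never needs the old database record.

```typescript
// Encode a split as a base64 bitset over the precinct's sorted block list.
function encodeSplit(precinctBlocks: string[], splitBlocks: Set<string>): string {
  const sorted = [...precinctBlocks].sort();
  const bytes = new Uint8Array(Math.ceil(sorted.length / 8));
  sorted.forEach((blockId, i) => {
    if (splitBlocks.has(blockId)) {
      bytes[i >> 3] |= 1 << (i & 7); // set bit i
    }
  });
  return Buffer.from(bytes).toString("base64"); // Node API; a browser would use btoa-style encoding
}

// Recover the list of block IDs in the split from the encoded bitset.
function decodeSplit(precinctBlocks: string[], encoded: string): string[] {
  const sorted = [...precinctBlocks].sort();
  const bytes = Buffer.from(encoded, "base64");
  return sorted.filter((_, i) => (bytes[i >> 3] & (1 << (i & 7))) !== 0);
}
```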

We swapped this all out under the covers without a change in user model. The client would automatically convert an existing map from one format to the other when the map was opened for the first time. We also had a batch process we could run on the back end, but we did not bother running it on existing maps, deciding to let the conversion happen on demand as maps were opened.

This was definitely one of those “gulp” moments when we deployed since you now have code that does a pretty significant edit on the user’s map automatically, just on open. There was a glitch or two but nothing catastrophic. It was the kind of feature upgrade that probably would have required an 18-month roll-out process with multiple exec reviews back in Office.

Standards

In the redistricting world, there are two critical standard representations for maps. The block assignment file (BAF) is a text file of comma-separated values (CSV) that maps a census block ID (the lowest level of the census shape hierarchy) to a district ID. This is the most canonical and unambiguous representation of a redistricting plan. Plans are also often distributed as the set of district shapes (with associated properties), typically in Shapefile format. This can introduce ambiguities if the shape boundaries have been overly simplified, but in general it is also a high-fidelity way of sharing a plan.
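Reading a BAF is about as simple as file formats get; a minimal sketch (the header handling is a guess, since header names vary by source):

```typescript
// Parse a block assignment file: one "blockId,district" pair per line.
function parseBAF(text: string): Map<string, string> {
  const assignments = new Map<string, string>();
  for (const line of text.split(/\r?\n/)) {
    const [blockId, district] = line.split(",").map(s => s.trim());
    if (!blockId || !district) continue; // skip blank or malformed lines
    if (!/^\d/.test(blockId)) continue;  // skip a header row (block GEOIDs are numeric)
    assignments.set(blockId, district);
  }
  return assignments;
}
```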

The fact that there were standard representations made it easy for us to add support for importing and exporting to and from those formats. Users could easily take a proposed plan and import it into our system in order to use our analytics or overlay the plan with additional layers like communities of interest, incumbent addresses, and such. They could also explicitly compare the plan with other plans in the system, e.g. looking at how voters flowed from one district to another between different proposals.

Users could also use our system to do most of the work of creating a plan and then export the final plan to the official "system of record" at the last moment, sometimes for regulatory reasons, including needing more esoteric features we did not support (e.g. spitting out a "metes and bounds" description of the districts, the old-style way of describing districts before shapefiles and BAFs came along: "follow the line of county road 99 for two miles", etc.). There was some chortling as we heard stories of official commissions abandoning their IT (or contractor) provided systems and using DRA because it was "so easy to use", especially for the quick and dirty process of swapping back and forth between alternative plans in the final stages of settling on one, and then using the official system just at the last minute. That's an old, repeating horror story for an enterprise software vendor: seeing their solution replaced by something that wins on ease of use, even if "less functional" (although our analytics functionality is unique).

Browsers and Devices

On the browser front, from our experience we seem to be past the horrible old days of spending all your time working around differences between browsers (mostly differences between bugs in the browsers). Maybe that's just because it's all handled by the layers of code underneath us. But in any case we spent very little time dealing with browser differences. As is typical, the vast majority of our clients use Chrome, with the remainder mostly Safari (the power of defaults), a little Firefox, and virtually no IE or other browsers.

On the device front, to a first approximation we are really a desktop app. When building or analyzing a map, having a big surface to look at the map while you dig into statistics really helps. Dave did a fair bit of work making our home page, command bars, details panes and state-specific pages behave reasonably on a phone or tablet. From my perspective, that was primarily so we didn't look like idiots if someone followed a link to a map on Twitter or started exploring us from a phone.

Some of our users did significant map authoring on a tablet device, bless them. The device was certainly capable — it was just the significant UI design work that kept us from investing (as well as overall very little vocal demand from our users).

Social (or Not)

Sharing was a big part of our user model right from the start — both direct user-to-user sharing à la Google Docs and "publishing". A published map just showed up in a big list of published maps that anyone could browse or search through; there was no other special UI when looking at these maps.

We actually did a fairly complete prototype of a “social” experience that would allow a user to “like” or comment on a published map. Ultimately we did not have the stomach for managing all the abuse monitoring, etc. that would be involved in exposing a direct way for users to communicate. Still not sure that was the right decision from the perspective of overall engagement with the application and creating a community.

We did end up providing a way to connect through Twitter. A user could provide their Twitter handle and then you could "tweet about" a map (which just composed a tweet with a link to the map), "join the conversation" (which just executed a Twitter search for the map URL), or "Follow [user]" (which just brought you to the map author's Twitter profile page). This all depended on the user providing their Twitter handle, which only about 600 users have actually done (out of 36,000 or so).

Gamification

As we were planning our analytics features, from the start we were thinking about "scoring" maps. Several of the folks who began working with us had started WalkScore (later bought by Redfin), which made great hay by computing a "walkability" score for any property. There was always a fair bit of controversy about how that score should be computed (what weight to give sidewalks, access to schools, stores, etc.), but that only added to the overall interest and buzz around the concept.

In our case, after considering it for a while, Alec felt very strongly that it was wrong for us to provide an absolute scoring metric. Essentially, this would be “blessing” a map which we felt would be overly partisan. More importantly, any map is a (hard) trade-off among a variety of different criteria and specifying the relative importance of the various criteria did not seem like the right role for the product. We also heard this loudly from various partners and interest groups that we were working with, especially on the minority representation category where full Voting Rights Act compliance requires deep expertise.

The approach we did take was to separate out different analytic categories (Proportionality, Competitiveness, Minority Representation, Compactness, and Splitting) and devise a 0–100 rating for each category (each rating itself a combination of individual analytic values).
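In code terms, the output of the scoring pipeline is just five numbers per map; a sketch of that shape (the real formulas that combine the underlying metrics are not reproduced here):

```typescript
// Per-category ratings backing the radar display, each on a 0-100 scale.
interface Ratings {
  proportionality: number;        // 0-100
  competitiveness: number;        // 0-100
  minorityRepresentation: number; // 0-100
  compactness: number;            // 0-100
  splitting: number;              // 0-100
}

// Clamp a raw combined score into the 0-100 range used for every category.
const toRating = (raw: number): number => Math.max(0, Math.min(100, Math.round(raw)));
```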

Each map was rated on these properties, with a radar diagram visually displaying the trade-offs. You could also pick any map to compare yours against, to see how your trade-offs stacked up. Additionally, we would show where your map fell in the distribution of all published maps for each of those categories.

This was well received, but really drew interest with the addition of "Notable Maps". We had a special page for each state that showed the current official approved maps for that state. Inspired by 538's Atlas of Redistricting, we added a section of "Notable Maps" that would pick the published user map that scored best in each category. To qualify, a map also needed to meet basic criteria of acceptability in order to keep clearly bogus maps from dominating a category. It proved to be a great hit as some of our enthusiasts worked to win a category with one of their maps.

Radar Diagram Showing Trade-offs between Analytic Categories

It was also useful for seeing the kinds of explicit trade-offs necessary between the different analytic categories, especially given a state's particular "political geography" (how Democrats and Republicans cluster in cities or rural areas of the state). We also added a section in our map analytics that would compare your map to the notable maps in each category.

Ops

We're a dev-ops shop, which we'd better be because we only have devs. Developers build a deployment image on their development machines and deploy directly to AWS. The deployment process actually involves first running a command-line utility that causes the running servers to "drain" any maps in memory and refuse additional requests that would require loading a map. Once the servers have drained (30 seconds or so), the new image is deployed and the servers are restarted. We don't have a rolling update process, so the service is actually down for a minute or so while we update. The clients retry failing requests, so this is mostly invisible to users, especially when editing a map, since map edits are reflected locally and batched up.
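The client-side retry is conceptually just a loop with a short backoff; a generic sketch (not the actual DRA code):

```typescript
// Retry failed requests with a linear backoff so a brief deployment outage
// mostly goes unnoticed by the user.
async function fetchWithRetry(url: string, init?: RequestInit, attempts = 5): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(url, init);
      // Success, or a non-retryable client error: return it as-is.
      if (res.ok || (res.status < 500 && res.status !== 429)) return res;
    } catch {
      // Network error (e.g. service restarting): fall through and retry.
    }
    await new Promise(r => setTimeout(r, 1000 * (i + 1))); // wait a bit longer each time
  }
  return fetch(url, init); // final attempt; let any error surface to the caller
}
```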

We have very little additional process around operations. We do some monitoring through the AWS web interface tools when we've deployed a substantial change, but otherwise have no automated monitoring or alerting. This can sometimes lead to slow response (early on, we had a couple of days where the user signup flow was broken and we were unaware until feedback email came in complaining). In fact, we mostly use feedback mail as the "alarm" that something significant is going on. But in general, over the last year or two things have been very stable.

Realistically, we have never undergone any of the exponential growth or extremely spiky use patterns that can lead to operational emergencies. Mostly we have seen slow and consistent growth and can fold responses into our normal development processes.

Testing

I hate testing! (Sorry Tara). Obviously I'm being a little cheeky. But I've built some pretty complicated apps with very limited testing support and in fact very limited test infrastructure. That was true here. If there's an area I'm having trouble stabilizing, I would much rather put effort into creating a more rational and reliable design than into building a lot of tests.

The classic "agile" or "refactoring" model is that a strong testing infrastructure is critical to being able to reliably make changes in a code base. You can make changes confidently because you have this strong testing infrastructure to validate that your changes haven't broken anything. The challenge is that testing is hard! Sure, those testing samples always look trivial and simple. But in the real world, useful tests start to get complicated. In many cases, the problem of validation can be more complex than the original code itself. Every artifact you create, whether working code or working tests, is an artifact that you need to maintain over time.

If you make a significant change in structure to the code, you usually have to make a significant change in structure to the tests as well.

When I did the original OT library, I actually did build a pretty elaborate test infrastructure in order to be able to load-test the library with complex interleaved streams of operations. That process ended up surfacing a bunch of gnarly issues and gave me a ton more confidence in the overall reliability of the infrastructure. Low level code like this is a good candidate for heavy test investment since it would be exponentially harder to analyze a problem at this layer through 17 layers on top across clients and service.
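The core property such a harness checks is convergence: both sides must end up with the same document no matter how concurrent operations interleave. A generic sketch of that check (the real library's API is not shown here; a randomized harness would call this with many generated operation pairs):

```typescript
// Given two concurrent operations a and b, applying a then transform(b, a)
// must produce the same document as applying b then transform(a, b).
function checkConvergence<Doc, Op>(
  doc: Doc,
  a: Op,
  b: Op,
  apply: (doc: Doc, op: Op) => Doc,
  transform: (op: Op, against: Op) => Op,
  equal: (x: Doc, y: Doc) => boolean
): boolean {
  const viaA = apply(apply(doc, a), transform(b, a));
  const viaB = apply(apply(doc, b), transform(a, b));
  return equal(viaA, viaB);
}
```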

This kind of tight piece of complex code with very well-defined interfaces is a classic example where a good test suite is a life-saver.

Unfortunately, the majority of the actual development work is building out complex UI features, where tests tend to be both expensive to build and fragile. In Office, we struggled with massive suites of automation that were expensive to run and maintain. As we moved to a merged dev/test organization and a faster release cadence, we also started moving to local dev-written unit tests (pushing tests down to the person writing the code) and then pushing out to heavier use of telemetry — really measuring the reliability of the software in the field.

In DRA, we had some level of testing of the low-level libraries, especially by Alec (using Jest) in the analytics engine, where we really wanted to validate that the scores we were producing were valid and matched published results. But at the UI level, we essentially just did "smoke testing" of new features, running them through their paces ourselves and then releasing them.
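A hypothetical Jest test in that spirit, checking an analytics function against a hand-computed answer; the module path and import are invented for illustration, reusing the aggregation sketch from earlier:

```typescript
// Feed a tiny synthetic plan through the aggregation and compare with totals
// computed by hand.
import { aggregateByDistrict } from "./aggregate"; // assumed module

test("district totals match a hand-computed result", () => {
  const assignments = new Map([["p1", 1], ["p2", 1], ["p3", 2]]);
  const data = new Map([
    ["p1", { totalPop: 100 }],
    ["p2", { totalPop: 250 }],
    ["p3", { totalPop: 400 }],
  ]);
  expect(aggregateByDistrict(assignments, data).get(1)).toEqual({ totalPop: 350 });
  expect(aggregateByDistrict(assignments, data).get(2)).toEqual({ totalPop: 400 });
});
```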

For some big changes, we would deploy a test server and everyone would beat on it for a few hours. That was definitely useful in shaking out the worst issues before deploying to production.

And then of course our users would find the remaining issues, especially the tricky ones that were related to asynchronous timing issues in the client. We would also frequently tune a feature’s UI with a few quick changes following user feedback on new functionality.

This is also a good example of something that definitely has phase changes as you scale the size of the team. In Office, when we had multi-year product cycles, we would suffer from "the tragedy of the commons" where developers would "optimize" their own productivity by checking in barely working code and the overall system would be unstable for months (or years), dragging down everyone's productivity.

When Office moved to rapid releases, the biggest change wasn’t a massive test suite (although tests played a role), but rather the change in attitude that your checkin needed to be ready to ship. That led to all kinds of downstream design and planning work about how to stage and verify potentially destabilizing changes. That’s definitely the DRA model since anything in the main branch is basically “shipped” even if it hasn’t been deployed yet.

Support and User Base

We were concerned early on about how to provide user support in a way that wouldn't bury us. We decided to start by hacking a button into the UI that pops up a feedback dialog as a placeholder. That feature just sends email to a feedback alias that we're all on.

That ended up being (and still is) a pretty wonderful channel to understand our users, providing feature ideas and developing some interesting ongoing collaborations. It’s also an example of something that wouldn’t have scaled with a larger user base.

Early on the feedback was of the form “You lost my map! What kind of piece of shit software is this?”. “Wa wa what? Hey, it’s free!” Not a good answer if someone just spent a couple hours working on a map and had it disappear.

Over time it has improved.

Alec volunteered to take point on following up, which ended up leading to some amazing threads and relationships. There was the 16-year-old from New Jersey who sent a very thoughtful idea and argument for a feature (city-splitting analytics, which we just released). "Coach", a user in Ohio, went back and forth asking tough academic questions, using the software to do a deep analysis of Ohio maps that he ended up publishing. We got a request for a feature and found out the user was a special master appointed by a state Supreme Court (we implemented the feature he asked for, and he produced a great map that was just adopted).

Alec would regularly interact with academics and PhD candidates using the software for research. In one case we ended up adding an innovative analytic technique to the software based on those interactions.

A bunch of emails were from advocacy groups looking to use the software directly to influence the redistricting process in their state or to train their citizen members to do so. Often we would get follow-up mail thanking us with a "we never would have been able to have the impact we did without your software". Which always felt good.

But lots of email was just "how do you do this?", to which Alec would always graciously respond with details (or, usually, a link to a detailed Medium article he had written describing the feature).

Next Steps

We’re just coming out of the peak redistricting period and starting to think about next steps. That little side project ended up becoming the most popular redistricting site on the web. The work and the collaborations were fun in themselves, but those results feel like a little gold star.

