System Ideas That Sound Good

Terry Crowley
Dec 30, 2024


Steven Sinofsky had a post piling on to a tweet from Martin Casado on “system ideas that sound good but almost never work”. Steven had a mostly good list, but I had some problems with it in three areas that are near and dear (to my brain if not my heart): cross-platform, asynchrony, and synchronization.

I’ve written about all three in multiple posts. My post on Office’s experience with cross-platform covers Office specifically, and I have lots of posts on asynchrony, but “Synchrony is a Myth” is probably my favorite. I’ve also written about the general problem of synchronization and replication, but I think my post “Real Time Editing is Kind of Hard” best captured the issues Steven was concerned about.

I was mostly motivated to write where I disagreed with what he said or thought clarification was in order, so let’s take them one at a time.

Cross-Platform

Steven has been a notorious skeptic about cross-platform, but I think his message (as far as I can decode it) really comes down to “cross-platform is not easy, so make sure you are really signed up for what you are getting into”. That is, there are no “solutions”, there are just “approaches”. That’s fine advice, but not particularly directive for teams at either end of the complexity scale. At the low end, you might very well be willing to write two completely different (simple) apps. At the high end (certainly something like Office), you find you have to share code. The challenge at that scale is finding the “right” way to share.

Steven claimed that after Office forked the Mac code base after Office 97, “they never looked back”. That’s not correct, as I described in detail in the Office cross-platform post mentioned above. We restored the shared code approach as we targeted more platforms, took on higher requirements for consistency when editing complex documents in real time against a shared service, and sped up the release cycle, all of which made “fork and port” approaches unworkable. In this effort Office took a much more rigorous (and effective) approach to cross-platform code, the biggest change being to disallow any platform-specific flags or behavior in shared code. The previous approach had allowed liberal sprinkling of compile-time and run-time platform checks, which made code changes nearly impossible to validate.
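To make the “no platform checks in shared code” rule concrete, here is a minimal sketch, assuming a dependency-injection style seam: shared code depends only on an abstract interface, and each platform layer supplies an implementation. The names (PlatformServices, shortcut_label, WindowsServices, MacServices) are hypothetical, and this is an illustration of the rule, not Office’s actual code.

```python
from abc import ABC, abstractmethod


class PlatformServices(ABC):
    """Hypothetical seam: all platform-specific behavior sits behind this interface."""

    @abstractmethod
    def modifier_key_name(self) -> str:
        """Name of the primary shortcut modifier on this platform."""


# --- Shared code: no `if platform == ...` checks are allowed in this layer. ---
def shortcut_label(services: PlatformServices, key: str) -> str:
    # Shared logic sees only the interface, so a change here behaves the same
    # way on every platform and can be validated once rather than per platform.
    return f"{services.modifier_key_name()}+{key.upper()}"


# --- Platform layers: each supplies an implementation and nothing else. ---
class WindowsServices(PlatformServices):
    def modifier_key_name(self) -> str:
        return "Ctrl"


class MacServices(PlatformServices):
    def modifier_key_name(self) -> str:
        return "Cmd"
```

Calling `shortcut_label(WindowsServices(), "s")` gives `"Ctrl+S"` and `shortcut_label(MacServices(), "s")` gives `"Cmd+S"`. The point of the design is that a reviewer of `shortcut_label` never has to reason about which platform branches a change might affect.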

Asynchrony

The argument I’ve been making for about 40 years is that the world (and virtually every layer underneath you) is asynchronous. When you treat things as synchronous, you are trying to simplify by hiding that true nature, which works until it doesn’t. The approach I push for is to embrace that asynchrony and actually architect for it (as apps like browsers surely do). Most problems arise when you ignore asynchrony and then try to patch the inevitable consequences by “sprinkling some asynchrony” into your system without architecting for it in a rigorous way. In apps, ignoring the issues that asynchrony is meant to address generally has another name: “hanging”, which some apps consider acceptable. I can live with it more easily in the NYT Spelling Bee than in Microsoft Outlook.
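One small example of what “architecting for it” means: once requests no longer block, responses can complete out of order, and the design has to say what happens then. The following is a minimal asyncio sketch, assuming a “latest request wins” policy; `SearchBox` and `fake_backend` are hypothetical names, and the backend latency is simulated.

```python
import asyncio


class SearchBox:
    """Hypothetical UI component that never blocks while a query is in flight."""

    def __init__(self) -> None:
        self._generation = 0          # bumped on every new query
        self.results: list[str] = []  # what the UI currently shows

    async def on_query(self, query: str, latency: float) -> None:
        self._generation += 1
        my_generation = self._generation
        results = await fake_backend(query, latency)
        # Responses can arrive in any order. Only the newest query may update
        # the view; a stale response is discarded instead of clobbering it.
        if my_generation == self._generation:
            self.results = results


async def fake_backend(query: str, latency: float) -> list[str]:
    # Stand-in for a network call with unpredictable latency (an assumption).
    await asyncio.sleep(latency)
    return [f"{query}-result"]


async def main() -> list[str]:
    box = SearchBox()
    # The user types "ca" and then "cat"; the older request is slower and
    # finishes last.
    await asyncio.gather(box.on_query("ca", 0.05), box.on_query("cat", 0.01))
    return box.results


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['cat-result']
```

A version that simply awaited each call in turn would avoid the ordering question but reintroduce the waiting; a version that “sprinkled in” async calls without the generation check would occasionally show results for a query the user has already replaced.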

Sync

Given the short format, Steven didn’t really dive into sync except to say it “doesn’t work”, so I thought I would expand on it a bit. The truth is, every app these days does some kind of “sync”, in the sense of having a data model that is primarily stored in the cloud but partially loaded locally (in local memory or local storage) for robustness and performance. Given that basically every app does this, what about sync “doesn’t work” (or is just way harder than it looks)?

The synchronization referenced here really refers to the approach of allowing an app on the device to make changes to the state of the local store and then having a lower-layer “sync engine” (either a layer of the application or some third-party or system service) synchronize those changes with the cloud: pushing local changes up and pulling down any changes made by other devices.

This has many advantages, which is what makes it so attractive. Changes to a local store are generally more robust (faster, less error prone, with more predictable latency) than transacting changes against a cloud service. It also (may) provide some level of offline capability, keeping the application’s capabilities available even when connectivity is not. Another advantage is that innovation in the app (new features) often happens in the way the data store is accessed or modified. If those features operate against a local store, they might not require matching changes in the cloud API and service, simplifying development. The effects of those new features are automatically propagated by the sync engine, which operates at a lower level of the data model.
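A minimal sketch of that architecture, assuming a key/value local store and a cloud modeled as an append-only change log. Every name here (`LocalStore`, `CloudStore`, `sync`) is hypothetical, and real sync engines are far more involved. The thing to notice is that the engine moves whatever changes the app made without knowing what they mean, which is both the appeal and the trap.

```python
from dataclasses import dataclass, field


@dataclass
class CloudStore:
    """Stand-in for the service: an append-only log of (device, key, value)."""
    log: list[tuple[str, str, str]] = field(default_factory=list)


@dataclass
class LocalStore:
    """What the app reads and writes: local, fast, and usable offline."""
    data: dict[str, str] = field(default_factory=dict)
    pending: list[tuple[str, str]] = field(default_factory=list)
    cursor: int = 0  # how much of the cloud log this device has seen

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        self.pending.append((key, value))  # queued for the sync engine


def sync(device: str, local: LocalStore, cloud: CloudStore) -> None:
    """Lower-layer sync engine: moves key/value changes without knowing
    what the application meant by them."""
    pending_keys = {key for key, _ in local.pending}
    # Pull: apply other devices' changes, skipping keys with unpushed local
    # edits (those are pushed next, so they land later in the log and win).
    for origin, key, value in cloud.log[local.cursor:]:
        if origin != device and key not in pending_keys:
            local.data[key] = value
    # Push: append this device's queued changes to the cloud log.
    for key, value in local.pending:
        cloud.log.append((device, key, value))
    local.pending.clear()
    local.cursor = len(cloud.log)
```

With a single writer this behaves nicely: the app on each device just calls `put`, and `sync` carries the change everywhere. With concurrent edits to the same key, this sketch still ends with every device agreeing, but only because the last push silently wins and the other edit disappears.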

The most familiar example for a user is a file-syncing service. The application writing the file is completely unaware of the syncing, which happens automatically at the level of the whole file. The challenge here is also familiar. This works great when changes happen and are pushed from only one device, or are completely serialized. But when multiple users (or one user on multiple devices) access the same file at the same time, all hell breaks loose. The sync engine has no way of figuring out how to merge those changes because it operates at a completely different level than the fine-grained changes the application makes to its data model. This is why applications like Office have expended significant effort to integrate with file-syncing services and take over the synchronization process at a level where they can apply processing that understands document semantics. This real-time merge is hard, and getting the integration with the sync engine right is hard. So “sync is hard”.
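To see why a whole-file engine is stuck, here is a toy sketch of one sync step, assuming the engine tracks only a server revision number and opaque file bytes. All names are hypothetical. Keeping both copies on conflict is one common fallback, used here as an assumption rather than a description of any particular service.

```python
from dataclasses import dataclass


@dataclass
class ServerFile:
    revision: int
    content: bytes


@dataclass
class LocalFile:
    base_revision: int  # server revision this copy was last synced from
    content: bytes      # opaque bytes: the engine cannot see paragraphs or cells
    dirty: bool         # edited locally since the last sync


def sync_file(name: str, server: ServerFile, local: LocalFile) -> dict[str, bytes]:
    """Hypothetical whole-file sync step; returns the files the user ends up with."""
    server_changed = server.revision != local.base_revision
    if not local.dirty and not server_changed:
        return {name: local.content}                 # already in sync
    if not local.dirty:
        local.content, local.base_revision = server.content, server.revision
        return {name: local.content}                 # download
    if not server_changed:
        server.revision += 1                         # upload
        server.content = local.content
        local.base_revision, local.dirty = server.revision, False
        return {name: local.content}
    # Both sides changed since the common base. Bytes are not something the
    # engine can merge, so the best it can do is keep both copies and leave
    # the merge to the user (or to an application that understands the format).
    return {name: server.content, f"{name} (conflicted copy)": local.content}
```

The last branch is exactly where an application like Office wants to step in: it can see the edits themselves, not just two blobs with a shared ancestor.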

The same type of problem can happen inside an app that uses a low-level sync engine. One layer of the app makes semantically aware changes, and a lower layer pushes those changes without necessarily understanding all the application-level semantics. If the sync engine merges changes without an understanding of those application-specific semantics, the problems can be subtle (and hard to address). Betting on sync seemed like a great idea, and that decision shapes the whole structure of the app (so it is hard to change), but the resulting problems are also hard to address. So “sync is hard”.
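A toy illustration of how a semantics-free merge goes subtly wrong, assuming a sync engine that merges field by field and lets the later writer win when both sides changed a field (an assumed policy for the sketch). The function names and the attendee-count scenario are hypothetical.

```python
def field_merge(base: dict[str, int], a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Low-level merge (an assumed policy): per field, take whichever side changed
    it; if both changed it, the later writer `b` wins. Assumes all three
    snapshots have the same fields."""
    merged = dict(base)
    for key in base:
        if b[key] != base[key]:
            merged[key] = b[key]
        elif a[key] != base[key]:
            merged[key] = a[key]
    return merged


def apply_ops(base: dict[str, int], ops: list[tuple[str, int]]) -> dict[str, int]:
    """Application-level merge: replay each device's intent ("add n to a field")."""
    merged = dict(base)
    for key, delta in ops:
        merged[key] += delta
    return merged


base = {"attendees": 5}
phone = {"attendees": 6}   # user on the phone clicks "Join"
laptop = {"attendees": 6}  # user on the laptop clicks "Join" at the same time

print(field_merge(base, phone, laptop))                      # {'attendees': 6}
print(apply_ops(base, [("attendees", 1), ("attendees", 1)]))  # {'attendees': 7}
```

Both results look perfectly valid to the engine; only the application knows the answer should be 7. Replaying intents works here because additions commute, and most application operations are not that forgiving.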

An approach without these problems (or benefits) is to go ahead and make those application-specific calls to the remote service and essentially treat the local store as a “best guess” of the remote data state. This approach has its own “tricky bits” but doesn’t tend to get you into such a deep hole.
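A minimal sketch of the “best guess” approach, assuming a remote service that applies each operation with full application semantics and returns the authoritative result. `RemoteService` and `EventView` are hypothetical names, and the call is written synchronously only for brevity; a real client would issue it asynchronously.

```python
class RemoteService:
    """Stand-in for the cloud API; it is the source of truth (an assumption)."""

    def __init__(self) -> None:
        self.attendees = 5

    def join_event(self) -> int:
        # The service applies the operation with full application semantics
        # and returns the authoritative new state.
        self.attendees += 1
        return self.attendees


class EventView:
    """Hypothetical client: the local copy is only a best guess of remote state."""

    def __init__(self, service: RemoteService, cached: int) -> None:
        self.service = service
        self.attendees = cached  # may already be stale

    def join(self) -> None:
        previous = self.attendees
        self.attendees += 1  # optimistic guess so the UI responds immediately
        try:
            # Replace the guess with whatever the service says is true.
            self.attendees = self.service.join_event()
        except ConnectionError:
            self.attendees = previous  # roll back the guess; retry or report later
```

If another device has already joined (the service holds 6 while the cache still says 5), `join()` briefly shows 6 and then corrects itself to the service’s 7. The tricky bits (rollback, retries, stale caches) are still there, but the semantic merge happens in one place that understands the operation.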


Written by Terry Crowley

Programmer, Ex-Microsoft Technical Fellow, Sometime Tech Blogger, Passionate Ultimate Frisbee Player
