APIs Don’t Matter
APIs don’t matter?!?
That’s quite a hot take from someone who’s claimed that the Win32 (Microsoft Windows) and iOS (iPhone and iPad) application programming interfaces (APIs) represent two of the most valuable technical moats in business history. But hear me out.
I will warn that I am going to brutally simplify here. I will try to share a mental model to cut through the confusion that typically surrounds these discussions.
The virtuous cycle for an OS platform is straightforward to describe if hard to execute. A device with unique capabilities attracts users, users and the device capabilities attract app developers, apps attract more users. The “virtuous cycle” is reinforcing. More users attract more apps. More apps make the device more useful and attract more users. This is a powerful positive feedback loop that can generate immense returns.
The API plays a key role in locking this advantage in. An app needs to build on the APIs that expose the device’s unique capabilities. The work required to build on the API makes it challenging for an app developer to move to a new platform. Even if a device with similar capabilities comes to the market, that device is not as attractive to users because of the missing apps. The device is not as attractive to app developers because of the missing users and the work involved to change APIs and support some new device.
Bootstrapping this cycle can be challenging. Every platform has some kind of origin story where a “killer app” helped get that virtuous flywheel turning. Spreadsheets and early PCs. Windows and Office. Macintosh and desktop publishing. iPhone and the first real web browser on a phone.
Now the generic API layer diagram looks like this:
Simple. Apps are programmed to an API that exposes the device capabilities.
A more accurate diagram might be this one:
The first key change is at the bottom — reinforcing that the whole reason this discussion is interesting at all is because the device is exposing some unique capability or combination of capabilities. That might be a large graphical interface with attached keyboard and pointing device (that also runs my old MS-DOS applications). Or a full screen portable smartphone with touch interface and a battery that lasts all day. Or high-speed parallel processing for training and running AI models.
The unique device is paired with unique APIs that expose the device’s unique capabilities. The APIs themselves are not the key thing here; the unique device capabilities are. These typically require some kind of unique API that is tightly bound to them. The iOS graphics API was highly tuned to support the responsive stick-to-the-finger animation capabilities of the hardware’s graphics subsystem. iOS video and audio APIs were carefully designed to leverage battery-saving custom AV decoding and encoding hardware rather than using battery-draining processing on the main CPU. The Windows DirectX API exposed graphics capabilities in the GPU that allowed apps to deliver high-frame-rate, high-resolution 3D game play.
These unique APIs are tightly bound to the unique device capabilities. Applications built to these APIs have a large percentage of their intellectual property intertwined with the custom API. The diagram above tries to show that by overlapping the API layer and the app layer. I might logically have cleanly isolated layers, but in practice the intellectual design and programming effort inside the app are tightly designed for and bound to the API.
Over time this picture changes:
There are competitors in the device space. The device capabilities aren’t quite so unique. Apps still make use of unique APIs, but there are also common libraries of APIs that span device types (e.g. OpenGL in graphics). These common libraries can arise because the device capabilities (at least in the areas covered by the library) aren’t evolving much — they do what they need to do and are relatively static. The common libraries might be created to specifically address building across platforms or they might have evolved as a different point of view on exposing the underlying device capabilities. The common libraries might be provided by third parties or might be part of an “open” strategy by the OS maker that includes support for external standards. The tight integration between new capabilities and new APIs isn’t as important any more because the capabilities are relatively static. There is even time for standards to arise.
The apps have evolved as well. There are more of them. Successful apps have seen a lot more of their intellectual effort and value invested internally, in application layers above the device API. Some apps have continued to invest deeply in unique device APIs but others invest less (certainly in proportion to the overall app value proposition) and might depend more on common layers than on unique APIs. In some cases they might be built completely on common APIs.
That API moat is degrading. But it is not degrading because of a bad API strategy. It is degrading because the device capabilities are static and are losing their unique value proposition. The APIs don’t matter. The unique capabilities that are valued by users and app developers matter.
There is often a huge amount of argument, thunder and lightning around these third-party API layers. There are choices, and fierce arguments about the advantages of one or the other. The arguments are often tied to language choices (Java, C#, JavaScript, Go, Rust, C++, etc.). People make career choices and become bound with all of the commitment of a cult. All this argument is happening because it doesn’t matter! You can pick one or the other of these layers and still build to whatever common API you want. Where necessary you can tunnel through to the unique APIs you need. These layers aren’t protecting any unique capability or asset.
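To make the "pick a layer and tunnel through" point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: DeviceA, DeviceB, the Canvas layer and the haptic_tap call are names I made up for illustration, not any real platform API. The app code targets the common layer and reaches past it only where a device has something unique to offer.

```python
from typing import Protocol


class DeviceA:
    """Hypothetical native API for one platform."""

    def fill_rect(self, x: int, y: int, w: int, h: int) -> str:
        return f"A.fill_rect({x},{y},{w},{h})"

    def haptic_tap(self) -> str:
        # A capability that only this (imaginary) device exposes.
        return "A.haptic_tap()"


class DeviceB:
    """Hypothetical native API for a competing platform."""

    def draw_box(self, left: int, top: int, right: int, bottom: int) -> str:
        return f"B.draw_box({left},{top},{right},{bottom})"


class Canvas(Protocol):
    """The common layer: only what both devices can do."""

    native: object

    def rect(self, x: int, y: int, w: int, h: int) -> str: ...


class CanvasOnA:
    def __init__(self, native: DeviceA) -> None:
        self.native = native

    def rect(self, x: int, y: int, w: int, h: int) -> str:
        return self.native.fill_rect(x, y, w, h)


class CanvasOnB:
    def __init__(self, native: DeviceB) -> None:
        self.native = native

    def rect(self, x: int, y: int, w: int, h: int) -> str:
        # Translate width/height into the corner coordinates DeviceB expects.
        return self.native.draw_box(x, y, x + w, y + h)


def draw_button(canvas: Canvas) -> list[str]:
    """App code written against the common layer."""
    calls = [canvas.rect(10, 20, 100, 40)]
    # Tunnel through to the unique API only where the device has one.
    if isinstance(canvas.native, DeviceA):
        calls.append(canvas.native.haptic_tap())
    return calls
```

Swapping CanvasOnA for CanvasOnB changes nothing in draw_button except the one line that reaches for the unique capability. That one line is where the moat lives, and the rest is plumbing you can replace.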
The fierce framework battles that sweep the JavaScript ecosystem happen because the choice doesn’t matter. None of these frameworks is protecting a unique asset. Go ahead, switch.
OS companies often hate these third-party libraries, especially if they get thick, which they have a tendency to do (see Steve Jobs’ classic Thoughts on Flash open letter). If a library sits on the side and is a simple utility library, it’s pretty innocuous and even helpful. When it becomes an all-encompassing layer (as Flash did), it adds overhead, limits apps to a subset of common cross-platform capabilities and halts or slows the ability of the OS maker to propagate new unique capabilities through to app developers. It can kill or slow that virtuous cycle. More than anything, the OS maker’s objection is a scream: “I’m not done yet! I have more innovation to deliver!”
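The "subset of common capabilities" problem can be sketched in a few lines of Python. This is a toy model under loud assumptions: the platform names and capability names are invented, and I am assuming a thick layer that exposes only what every platform supports.

```python
def common_subset(platforms: dict[str, set[str]]) -> set[str]:
    """Capabilities a thick cross-platform layer can expose everywhere."""
    return set.intersection(*platforms.values())


# Hypothetical platforms and capability names, for illustration only.
platforms = {
    "os_a": {"windows", "menus", "2d_graphics"},
    "os_b": {"windows", "menus", "2d_graphics"},
}

before = common_subset(platforms)

# The maker of os_a ships a new unique capability...
platforms["os_a"].add("multitouch")

# ...but apps that only see the common layer get nothing new.
after = common_subset(platforms)
```

In this model the new capability reaches apps written purely to the thick layer only once every other platform has it too, at which point it is no longer unique.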
I was motivated to write this after reading Steven Sinofsky’s latest post about the thinking that went into building the new Windows 8 API. The post includes a long and painful description of Windows API efforts from the 90s forward. I had fascinating conversations and email threads with Steven on these issues when I worked for him as head of Office development. I later worked closely with Aleš Holeček as a first and best customer for the new API work that he led when he worked for Steven on Windows 8.
Through the late 90s and 2000s, Windows API efforts became divorced from actually exposing new unique platform capabilities. Office didn’t care what Windows was doing. The whole .NET ecosystem was a layer built on top of Windows. Even capabilities that Windows desperately needed, like a safe, easy mechanism for app delivery, were delivered as an independent layer (see .NET ClickOnce) rather than deeply built into the platform. It was the later set of capabilities in Apple’s iOS platform and App Store that revealed the barrenness of this particular layering strategy divorced from the platform.
The core PC value proposition of a computer with a large graphical screen, keyboard, pointing device and network connectivity was in place by the mid-nineties. Despite orders of magnitude improvements in bandwidth, memory, processing power, screen size and resolution, the core value proposition remains the same to this day.
I can say this 25 years on, working at an iMac with 10 cores, 64 GB of memory, 4 TB of storage, gigabit networking, a 5K 27" Retina display, wireless keyboard and mouse, and a high-resolution camera and microphone, but using a set of applications that would not have been out of place on a PC of the 90s.
In some ways, this is tremendously frustrating for the people working to deliver on these order-of-magnitude changes. Incredible technical innovation was needed to scale up every part of the operating system, starting at the lowest levels of memory management, storage and process management. Other innovations delivered automatic updating of OS and apps, seamless syncing across the cloud, and amazing graphics and video. And yet the “jobs to be done” stay remarkably constant. This was true even with radical changes in form factor as laptops overtook desktops and got thinner and thinner and more portable.
The rather bloodless business school description of “sustaining” vs. “disruptive” innovation doesn’t really capture the dynamics of living through the changes here. It was not obvious that these changes couldn’t lead to disruptive change, especially in the context of continually rising PC sales numbers (peaking in 2011). But what should have been obvious is that API layers divorced from deeper operating system innovations tied to those evolving device capabilities (and broader technical landscape) wouldn’t work as a way of delivering sustainable competitive value. APIs play a role but the unique capability they expose is what matters.
The Windows 8 effort made a bet that the set of deep innovations needed to deliver a tablet-like device with great touch performance, great battery life, a safe application ecosystem and strong ties to the existing Win32 desktop environment would be a unique and compelling value proposition. It would attract both users and application developers to a set of unique new APIs and device capabilities and could bootstrap a healthy API strategy. That failed in the marketplace, but it wasn’t a failure of execution. It failed because the users and app developers did not view this combination as a compelling differentiated set of capabilities (or not enough users and developers anyway). The Surface Pro, the prototypical target device, is essentially a great lightweight laptop. It fills the same “jobs to be done” as a laptop.
There is a fair question to ask as to whether this was “looking for your keys under the streetlight instead of across the street where you dropped them” behavior. We had Win32. We didn’t have the phone. We couldn’t take the strategy of moving up from the phone. But to be fair to the Windows team, the truth is that despite the remarkable technology that is represented in a modern tablet, ten years on it still sits in this awkward middle ground between PC and phone. I love my tablet, but readily grab my phone for the same “jobs to be done” (reading, messaging, email, video) when the tablet is unavailable. And obviously, given relative sales numbers, billions of other people just get by with the phone all the time. In the other direction, Apple continues to push new iPad windowing capabilities and “magic keyboards” to use the tablet for the same jobs as a laptop.
Away from the OS
I’ve focused on devices and OS here but the discussion of APIs and layers applies to any place where unique capabilities are exposed. In the heyday of Office programmability, there were millions of developers whose job was building line-of-business applications on top of the Office apps. Office provided the experience framework for these applications as well as exposing a set of underlying capabilities (especially to embed content in a rich document that could be shared, viewed, emailed, annotated, filed, printed, analyzed) that offered a compelling and unique value proposition for an app developer. Some of these scenarios were chipped away by the web, while some (especially with Excel) are still incredibly relevant.
Office 365 has a unique set of “capabilities”: much of the customer’s content in documents and email, as well as the complex hierarchy of user and group identity. For line-of-business developers tying into this content and infrastructure, this is a (sustainable) unique set of capabilities. Again, the APIs don’t matter. It is the capability the APIs expose that matters.
Nvidia’s CUDA API offers an interesting recent example of this dynamic. CUDA is the API Nvidia provides for general-purpose parallel programming on its GPUs, and it is what developers use to build AI models on top of those GPUs and on new hardware customized for running AI models. Nvidia is clearly running a familiar playbook, working to build an API that exposes the unique device capabilities and helps bind developers to its ecosystem.
Meta’s recently released AITemplate tool is also a classic response, creating a layer to allow developers to transparently target either Nvidia or AMD hardware. Whether this succeeds (or plays any significant role) depends a lot on how future innovation across the underlying hardware and AI algorithms and tools happens. Do hardware and tightly integrated software layers continue to evolve fast enough that any attempt at a common layer ends up looking like the various limited-impact Win32 porting layers (WINE, etc.)? Or does it stay static enough that developers targeting a shared layer (which perhaps offers some unique value-add of its own) end up significantly eroding Nvidia’s moat?
I don’t know what will happen, but using the model above I would expect that if we don’t see new hardware innovations enabling new AI approaches but rather see more predictable iteration in hardware (incredibly challenging, but predictable) and a focus on thickening or churning an API layer, we will see a degrading moat. That said, the Windows experience demonstrates that even a degrading moat can continue to generate great returns for a long time.
Footnote: Was Windows 8 a failure?
I might take some heat for calling Windows 8 a technical success above, but I think the general perception of the project is wrong. The Windows 8 project involved moving to a new chip architecture, building custom hardware, designing a new modern API and a new touch-friendly UI design. It was a massive effort across the entire stack from chip up and delivered what it was targeting, on time. The reception and pushback were painful and generally related to a small set of bad decisions that were easily reversible (and were quickly reversed in the 8.1 release). These included not allowing the hundreds of millions of existing desktop (non-touch) users to just boot to the desktop rather than having to navigate through the live tile start screen. New “Metro-style” apps were full-screen (as all tablet apps were). This made them unfriendly participants in a desktop world. This was especially hair-pulling because the default browser and PDF viewer shipped as new Metro apps, so a primarily desktop user would be constantly flipping back and forth as they clicked on links or attachments in their desktop apps. At a deeper technical level, policy decisions to limit access to the new API layer (and the affordances and capabilities it exposed) from existing desktop apps tended to irritate developers as well as users. All of these decisions could have been reversed and eventually were.
From a results perspective, Windows 8 did not end up booting a new device and application ecosystem, but that is not usually the criterion used when talking about its success or failure. Additionally, I have not seen any plausible “alternative history” or counterfactual that describes a path that would have booted an ecosystem. My beefs with Windows have generally been when it “tries too hard”. Just accept that the basic use case has reached sufficiency and focus improvements on channeling the continued underlying hardware improvement trajectory, responding to the larger ecosystem environment (cloud) and the basics of performance, security and reliability (which is much closer to what macOS has been doing).