Yuck, Office HTML
Steven Sinofsky’s recent blog post on the strategic background behind Office’s HTML support in Office 9 (branded as “Office 2000” and released in 1999) brought back a lot of memories. For that release, I was leading FrontPage development (our web authoring tool) and was personally responsible for the HTML editing surface, including how to handle all the new HTML that the Office apps were putting on the clipboard when users copied content and pasted it into a web page in the FrontPage editor.
Steven’s post discusses the strategic thinking at the time, where he and Bill Gates disagreed, and where they got some things wrong, looking back with 20/20 hindsight. It’s fascinating to look at the decisions then and try to “debug” where the analysis went wrong. To what extent was it not being able to see the future and to what extent was it not being able to understand the present?
I started writing a draft about this and quickly got into the weeds. Who wouldn’t want to read about the challenges of how to encode an empty paragraph in HTML? And anyone who has been frustrated by paragraph spacing appearing and disappearing when copying and pasting content into an email would probably be fascinated by the complex ways that margin semantics differ between document formats. (That has literally been a problem for 30 years.) Or maybe it’s not so interesting.
So stepping back a level, I was struck by one paragraph from an overall excellent post.
When the topic of HTML as a file format came up in strategic conversations, especially with BillG, the discussion quickly turned to a view that HTML implied ceding strategic control of file formats to either a competitor or to what might become a standards body — that was the worst of all outcomes.
Before diving in on this, it’s worth setting up a framework for how to think about it. Any application that enables a user to create artifacts can be broken down into three areas of concern. An app defines a data model for the artifacts that it generates. It implements a runtime for executing this data model. And it exposes an editing experience for creating these artifacts. For most applications, these three are all tightly intertwined: in Office, the vast majority of editing commands actually set properties on the data model, and many characteristics of the data model are specifically designed to expose a particular editing experience.
For example, style sheets in Word make it possible to define and manipulate a consistent overall appearance for a document. They are encoded in the data model, interpreted by the layout engine, and have extensive UI for manipulating them. Word’s track-changes feature is designed to support more complex multi-user editing experiences and workflows. It also has extensive support in the data model, is processed by the runtime layout engine to either hide or expose changes, and has an interface designed to support these complex multi-user workflows and user interactions.
As an application evolves, most features involve parallel and related changes in the data model, runtime and editing experience in order to deliver a complete new experience.
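To make the three-way split concrete, here is a minimal Python sketch of a toy track-changes feature. It is only an illustration under my own assumptions, not how Word is implemented; the names `Run`, `Document`, `delete_run` and `render` are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Run:
    """Data model: a span of text plus a tracked-change flag."""
    text: str
    deleted: bool = False  # set by a tracked deletion (hypothetical property)


@dataclass
class Document:
    """Data model: the document is a list of runs plus a tracking mode."""
    runs: list[Run] = field(default_factory=list)
    track_changes: bool = False


def delete_run(doc: Document, index: int) -> None:
    """Editing experience: a command that works by mutating the data model."""
    if doc.track_changes:
        doc.runs[index].deleted = True  # keep the text, mark it as deleted
    else:
        del doc.runs[index]  # untracked edits really remove the run


def render(doc: Document, show_changes: bool) -> str:
    """Runtime: interprets the model to either expose or hide changes."""
    parts = []
    for run in doc.runs:
        if run.deleted:
            if show_changes:
                parts.append(f"[-{run.text}-]")
        else:
            parts.append(run.text)
    return "".join(parts)


doc = Document([Run("Hello "), Run("cruel "), Run("world")], track_changes=True)
delete_run(doc, 1)
print(render(doc, show_changes=True))   # Hello [-cruel -]world
print(render(doc, show_changes=False))  # Hello world
```

Even in a toy like this, the one feature needed a new property in the model, a new branch in the renderer and a new behavior in the editing command, all at once.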
In this context, Bill’s concern about ceding strategic control of the file formats makes complete sense. Freezing or slowing down the ability to make changes in the data model of an application gets to the heart and soul of how an application evolves and how new innovative features are developed. Further, a frozen data model is a static target for competitors. In the layered market of the PC ecosystem, Bill was always concerned with where you could commoditize suppliers or competitors around a standardized interface while leaving yourself free to innovate inside the layer, and likewise with how to avoid being commoditized yourself.
At the same time, this concern reflected a fundamental confusion about what HTML was and how it would evolve.
HTML was initially designed (circa 1990) during a period when there was continuing back-and-forth about how to design document formats. Should a format describe presentation, that is, how the content should be formatted and displayed? Or should a format describe semantics, that is, what the content means? HTML initially landed firmly in the semantics camp. An HTML document had “headings” and “paragraphs” with structure for ordered and unordered lists. Text was annotated as “strong” (typically shown in a bold font) or with “emphasis” (shown with italics). Even tables were initially added to help describe rows of data (in the scientific documents that were the original target for the format). Tables would quickly become ground zero for the bastardization of HTML and its conversion to what is effectively a presentation-only format.
For practical purposes now, HTML is essentially just part of the browser API layer. Applications create and destroy HTML fragments in order to present content on the screen with essentially no concern for whether the resulting HTML actually encodes semantics. Accessibility concerns do drive some constraints here, but not in a way that actually pushes back towards a real semantic document model.
Even in the context of a purely semantic model, HTML in the Office applications was never plausible as anything but an encoding layer, essentially a more complicated ASCII or UTF-8 text format. HTML contains virtually none of the complex data model semantics that are found in the Office applications. Even the product that most overlapped with HTML from a cursory perspective, Microsoft Word, differed in thousands of both subtle and profound ways (as well as just being an order of magnitude more complex overall).
When the Office teams started to look at encoding Office data model semantics in HTML, they were immediately forced to lean heavily on “escapes”, such as encoding semantics in HTML comments or opaque XML blobs, that would be ignored by the browser but could be interpreted when opening the file in the Office applications to recreate the original data model.
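The general shape of that “escape” technique can be sketched in a few lines of Python. This is a hedged illustration only: the `app-model:` marker, the JSON payload and the function names are all invented here and are not the syntax Office actually emitted. The visible text goes into ordinary markup for the browser, while app-only properties ride along in a comment that the browser ignores and the importer reads back.

```python
import html
import json
from html.parser import HTMLParser

PREFIX = "app-model:"  # hypothetical marker, not the real Office syntax


def export_paragraph(text: str, model_props: dict) -> str:
    """Emit browser-visible markup plus a comment carrying app-only state."""
    payload = json.dumps(model_props, sort_keys=True)
    if "--" in payload:
        # "--" could terminate the comment early; a real format would escape it.
        raise ValueError("payload cannot safely be embedded in an HTML comment")
    return f"<p>{html.escape(text)}</p><!--{PREFIX}{payload}-->"


class _Importer(HTMLParser):
    """Collects visible text and any escaped model properties."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list = []
        self.model_props: dict = {}

    def handle_data(self, data: str) -> None:
        self.text_parts.append(data)

    def handle_comment(self, data: str) -> None:
        # Ordinary comments are skipped; only our marked ones are decoded.
        if data.startswith(PREFIX):
            self.model_props = json.loads(data[len(PREFIX):])


def import_paragraph(markup: str):
    """Rebuild (text, model_props) from markup produced by export_paragraph."""
    parser = _Importer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text_parts), parser.model_props


markup = export_paragraph("Q&A <draft>", {"keep_with_next": True, "style_id": 7})
text, props = import_paragraph(markup)
print(markup)
print(text, props)
```

The round trip works, but note what happened to the markup: most of the bytes now describe things HTML itself has no concept of, which is exactly how the output ends up so baroque.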
Note that the problem being addressed here was completely different from what an application like Google Docs is doing when it uses HTML as part of the in-browser application experience. In that context, HTML is just part of the browser API. The Office apps were trying to construct an HTML “package” that could be interpreted by the browser as a self-contained blob, not as the HTML front end to a complex service-backed application experience. “Dynamic HTML” had just been released as part of IE 4 and we were still half a decade from being able to build the kind of dynamic service-backed experiences that could rival the functionality of the Office apps. The Office teams were trying to construct a blob that could be both interpreted by the browser and opened back up in the original application with little or no loss of functionality.
My reading as to why Bill was confused here was that he really wanted HTML (and specifically IE) to be this uber-powerful shared component layer that the Office apps would build on. This “universal canvas” was his (misguided) holy grail. In this context, he wanted HTML to embed all the combined semantics of the Office apps; he just wanted it to be delivered in a Windows- and IE-specific (proprietary) way. This is and was nonsensical, and we knew it at the time.
Any engineer who has spent years on a complex application knows that you always have a layering challenge. If you pick too low a layer to build on, you end up having to recreate a lot of pre-existing functionality that is not fundamental to your product mission. But if you pick too high a layer, you end up constantly struggling to get it to behave or evolve the way you want; it’s like trying to paint a portrait with a four-foot paintbrush. Ninety-nine out of a hundred engineers would prefer to err on the side of picking too low a layer. Of course, ideally you want the system teams to be designing appropriate layers to pick from and not forcing a bad choice!
The issues here would continue to play out long afterward. The catastrophic decision in the following years to abandon IE and HTML development in order to invest in Windows Presentation Foundation was another consequence of this misguided view. The Office teams would continue to fight with Windows on layering issues for decades.
On the file format front, the experience with HTML helped inform lots of work on the new Open XML file formats released in Office 2007. The experience also clarified why we struggled so hard to prevent the European Union from forcing us to use the OpenDocument Format (ODF) as our native file format. Mandating ODF as the native, default format was essentially part of a regulatory effort to freeze future development in Office.
I also had my own share of strategic blunders through this period. We still had a vision that FrontPage could provide this composite editing surface where all this Office content could be copied and pasted into the editing surface to build up rich HTML-native web pages. As the HTML that Office generated got more and more convoluted, we continued going down a path of making sure you didn’t “lose” anything, from either a visual presentation or a semantic perspective, or at least that the semantics degraded cleanly. This was essentially an impossible problem (the Medium engineering team had a pretty good post on this a while back). It also led us to generate HTML as baroque as that of the rest of Office, just as our customers were moving to want cleaner and cleaner HTML (so they could integrate it with other tools).
We later built special features to clean up the Office HTML, but the feature legacy and brand damage were significant. As always, picking the right features to build is usually the hardest challenge even if we spend most of our time in the process of actually building them.