The Inflationary Theory of Software Systems
I love making very stretched analogies between fascinating scientific ideas and software development. Here goes another one.
The theory of cosmic inflation makes a set of precise predictions about the characteristics of variations in the cosmic microwave background radiation. Essentially, random quantum-level energy fluctuations that were present when the universe was very small were imprinted on the distribution of matter and energy as the universe expanded by many orders of magnitude during the inflationary period before the Big Bang. Those fluctuations are now detectable as a specific pattern of variation in the cosmic background radiation. So our understanding of the very small at the quantum level informs our ability to interpret what we see at the largest cosmic scales. It’s an amazing combination of the theories dealing with the quantum and the cosmological levels.
Biology also provides a fascinating example of this process of imprinting a pattern during inflationary expansion. The eukaryotic cell that is the basis of all complex life (animals, plants, fungi and protozoa) has a large set of complex structures (nucleus, Golgi apparatus, endoplasmic reticulum, etc.) and complex processes (DNA replication, protein synthesis, etc.) that are shared in essentially identical form across all these diverse forms of life, from redwoods to yeast and whales to protozoa. These structures are absent from the bacterial and archaeal cells that combined to form the first eukaryotic cell. Essentially, all these complex structures and processes needed to evolve together in a single population before they could explode into the wide range of species we see today. Those species all still share an amazing amount of core cellular structure and processes, inherited from that initial interbreeding population that first evolved this complex cellular toolkit.
Many successful software systems also go through an “inflationary period”. They start out with a small group of developers but then at some point in their success curve, the team expands dramatically. There is a lot more code written and all the new code gets imprinted by the characteristics of that relatively small core. Exactly when that inflationary period occurs in the history of the project can have long-term consequences for the structure of that much larger later system.
When that inflationary period occurs has relatively little to do with whether the software design has had time to fully evolve, be validated and “bake”. It has much more to do with when product success (or product investment hoping for success) occurs in the product history. The outcome also has a lot to do with whether the system has had time to grow and evolve organically under the design of a few engineers or instead accretes through large new acquisitions of code.
Microsoft Windows and Apple’s OS offer an interesting comparison. The NT version of Win32 started with a relatively small, well-designed core, but as Windows grew and expanded, the overall API was formed by jamming together that core with APIs contributed by other big teams, including many APIs from the consumer-oriented Win95-based OS as well as service, data and graphics APIs from other parts of the company. Consistency suffered, even with things as basic as the handling of strings throughout the API.
Re-establishing consistency becomes essentially impossible because of the huge investment made by all the developers outside the company programming to that API. Even large, decade-long efforts to re-establish consistency failed. The WinFX/C# investments bet that managed code environments would be an inflection point, and the WinRT effort bet that tablet computing would offer the inflection point necessary to bring developers forward. Win32 still plods along.
At Apple, the first Macintosh OS was ultimately completely replaced with the return of Jobs and the acquisition of NeXT Computer. That new MacOS started as NeXTSTEP for the NeXT computer, which was itself built on top of several generations of Unix evolution. NeXTSTEP was able to evolve within a relatively small but demanding ecosystem. Ultimately only about 50,000 NeXT computers were sold during the company’s decade-long lifetime before Apple acquired it. The computer and its integrated software system were specifically positioned around the ease and elegance of their development environment, so there was a strong incentive to ensure a clean overall design.
The inflationary period started only after this extended development period, first with MacOS and then with iOS for the iPhone and iPad. The result was a much more consistent API surface across the OS and across these devices than Microsoft was able to achieve with Windows.
Applications also go through these inflationary periods. I remember talking to a Microsoft Word developer about his efforts to clean up the internal selection model in the Word code base. I had just finished a redesign and reimplementation in the FrontPage code base so I was fascinated by the comparison of the two efforts. FrontPage was a much smaller code base and I had completed the work by myself in about 3 weeks. He had worked on the Word design for several months and had identified 13,000 places in the code base that he would need to modify to directly deal with the new selection model, as well as other changes necessary to plumb it through the system to actually realize the benefits of the new design. Ultimately, it was just too much (destabilizing) work and he gave it up.
Word also has layers of the code base and core code structures that deal with “paging in” the document rather than reading the entire document into memory at once. This was a critical capability that allowed Word to quickly open and edit long documents that were bigger than the memory available on the machine. It was a key early competitive differentiator. Of course, now it’s completely unnecessary on machines with thousands of times more memory, and it just adds a layer of complexity imprinted on the code base.
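As a rough illustration of what “paging in” means (and emphatically not a reconstruction of Word’s actual code), here is a minimal Python sketch. The class name `PagedDocument`, the `PAGE_SIZE` value and the small least-recently-used cache are all hypothetical choices made for the example. The point is only that every read has to go through a page lookup, which is exactly the kind of layer that ends up threaded through everything built on top of it.

```python
import os
from collections import OrderedDict

PAGE_SIZE = 4096  # hypothetical page size in bytes, chosen for illustration


class PagedDocument:
    """Reads fixed-size pages of a file on demand, keeping only a few in memory."""

    def __init__(self, path, max_cached_pages=4):
        self._file = open(path, "rb")
        self._size = os.path.getsize(path)
        self._max_cached = max_cached_pages
        self._cache = OrderedDict()  # page number -> bytes, least recently used first
        self.loads = 0  # number of pages actually read from disk

    def _page(self, number):
        """Return one page, reading it from disk only if it is not cached."""
        if number in self._cache:
            self._cache.move_to_end(number)  # mark as most recently used
            return self._cache[number]
        self._file.seek(number * PAGE_SIZE)
        data = self._file.read(PAGE_SIZE)
        self.loads += 1
        self._cache[number] = data
        if len(self._cache) > self._max_cached:
            self._cache.popitem(last=False)  # evict the least recently used page
        return data

    def read(self, offset, length):
        """Return up to `length` bytes starting at `offset`, page by page."""
        end = min(offset + length, self._size)
        chunks = []
        while offset < end:
            number, start = divmod(offset, PAGE_SIZE)
            take = min(PAGE_SIZE - start, end - offset)
            chunks.append(self._page(number)[start:start + take])
            offset += take
        return b"".join(chunks)

    def close(self):
        self._file.close()
```

Something like `PagedDocument("big.doc").read(1_000_000, 200)` would touch only the one or two pages that cover that range instead of loading the whole file. On a machine with plenty of memory, a single `open(path, "rb").read()` does the same job with none of this machinery.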
It would be interesting to see what characteristics got imprinted on the Instagram or WhatsApp code bases when their tiny teams were acquired by the much bigger Facebook organization.
Saying “just maintain better design discipline” ignores the fact that contingency, along with the benefit of additional time, played a significant part in the examples where an inflationary period produced better outcomes. Trying to “keep it small” usually results in a plethora of independent small components that are hard to make consistent and hard to evolve together. Cue micro-service architecture horror stories, as just one recent example.
Even Unix, with its tool philosophy of “do one thing well”, provides some interesting tales. I was in the Bell Labs Unix group in 1978, and they were already having to go back and rework the command line option handling code in all those tools because of the inconsistencies that had grown up as the OS effort expanded from that small initial research group to a much larger development organization.
Perhaps the best guidance if you find yourself in such an environment is that the faster the expansion, the more likely it is that you’ll require aggressive efforts to try to tame the worst issues. On its face, this challenge looks like the normal refactoring and code re-architecture efforts that any evolving system requires, but the inflationary growth adds a level of challenge and difficulty that surprises even seasoned engineering managers. Business pressures combine with architecture and design issues and make for challenging tradeoffs. Good luck. If you’re in this situation, it usually means you’re in for an exciting ride!