# Surprising Automata

I took the opportunity of a recent mild case of Covid with attendant quarantine episode to finally work my way through Stephen Wolfram’s 1200 page opus A New Kind of Science. Only 20 years after it was published!

This book probably needs a year-long graduate course to really absorb or critique it — that’s not going to happen. But I wanted to at least capture a few things that I took away from it and integrated into my “world view”.

It is also worth noting that “batshit crazy” is one of the milder adjectives that has been applied to Wolfram. “Humble” is definitely not included in that set. In fact he is quite explicit in the reading notes that he is intentionally not using the standard scientific tone of “false humility” in his writing. That said, a few less “this is the most amazing idea that anyone has ever come up with” (not excluding Einstein, Newton, Darwin, or Turing) might have made for easier reading.

Second and somewhat related, he takes forever to put his ideas into historical context (100’s of pages into a book about computability before the term “Turing Machine” comes up). He claims in the reading notes that this is to make for a more consistent presentation not weighed down by irrelevant historical baggage. For someone with a background in the area, you spend 100’s of pages wondering why he’s not even mentioning highly related context.

I don’t want to write a review or summary but I’ll need to provide some context since I am guessing the overlap between our readership is probably low.

Perhaps Wolfram’s most profound point (and the whole point of the title) is that he wants to move from a perspective of “the unreasonable effectiveness of mathematics” in describing the universe to “the unreasonable effectiveness of computing” to describe it. Specifically, he would argue that computation is both a *better* model for what is happening in the universe and in fact is the* actual *model for what’s happening, both at higher levels (e.g. mental, biological) as well as at the sub-quantum level (there were some very recent announcements of claimed progress in this area). In particular, computing is a much better basis for understanding how complexity is generated. It’s also a better model for understanding how discrete systems (basically everything) actually operate at the lowest levels.

For such grand claims, the book starts with a slow buildup, but one that quickly reveals some intriguing ideas. I am assuming that most readers are familiar with Conway’s Game of Life, and the concept of two-dimensional cellular automata. Basically, you have a (potentially endless) two-dimensional grid of cells where each cell can be black or white. You then have a small fixed set of rules for how each cell evolves (changes color or remains the same) in the next step of the automata based on the state of its neighbors. The specific Game of Life rules seem trivial but end up generating interesting complex evolving patterns. Over time it was proved that those rules can actually implement a universal Turing machine (UTM). That is, a grid (with carefully prepared initial state) operating with the simple rules of the Game of Life can compute any function that is computable by *any* machine. I find that amazing (although I also found the original idea of a UTM amazing).

Wolfram wanted to explore the behavior of “simple programs”. For those of us first exposed to computing with a simple program like “main() { printf(“hello world”) }”, the idea that you could automatically generate a set of simple programs and that they could do *anything* interesting is hard to get your head around at first.

Wolfram starts with “one dimensional cellular automata”. Rather than a two dimensional grid of cells like in the Game of Life, you simply have a one dimensional row of cells, with each cell either white or black. You can visualize a programs behavior over time by placing each new row under the previous row to produce a simple 2D visual representation of the evolving state. He starts by limiting the rule set to one that looks at the state of the current cell and the state of the cell to its immediate left and right in order to determine how to generate the new state of that current cell. Since you are looking at 3 cells to determine the new state of the middle cell, which can take on one of two states (black or white), you need to specify 8 rules (e.g. [White, White, White] => White, [White, White, Black] => Black, etc.). A simple way of encoding this kind of rule set is simply with 8 bits. So bit zero specifies the output of the [White, White, White] rule, bit five specifies the output of the [Black, White, Black] rule, etc.

With 8 bits, that means you have 256 (2 to the eighth power) possible programs. So given this definition of a “simple program”, he can actually exhaustively look at the behavior of all 256 of them. As a programmer, I found the experimental setup itself an interesting framing. These programs can’t “crash” or be syntactically invalid. They just “are”. They just compute. And you can then explore their behavior, handing them various inputs.

(As an aside, it is a small leap of analogy given this framing to look at some soup of reacting molecules and be able to describe what is occurring as literally a “computation”.)

Some of the programs are boring; for example rule 0 takes whatever input it is handed and just turns it white. Rule 255 turns all input black. Rule 00110111 (236) is the identity program (it just preserves its input). Rule 11001000 (19) is the inversion program. But other rules do more interesting things.

He ends up dividing these programs into four sets. Set 1 is the boring set that just produces a static result (0 and 255). Set 2 produces a result that is either fixed or repeats on a regular interval (e.g. 19 and 236 mentioned above, or 168 which shifts its input to the left). Set 3 is more interesting — these produce output that appears random although with potentially interesting artifacts that evolve over time. In fact, Wolfram’s well-known Mathematica program actually uses one of these automata to generate its “pseudo-random” numbers.

Wolfram finds this set of automata remarkable and its worth spending a little time reflecting on this sense of wonder. You have this simple program, specified with just 8 bits, that operates on some fixed input, e.g. starting with a single black cell, and it then proceeds to generate completely random (unpredictable and effectively unanalyzable) output till the end of time. Yes, the output is “deterministic” — but there is no shorter way to determine it then to simple run the machine. Where does all this randomness (complexity?) come from? It doesn’t come from the initial conditions (so no chaotic “sensitivity to initial conditions”), it doesn’t come from the environment, it just “comes” by supplying time and energy and running this trivial computation.

It is worth reflecting on and sharing that sense of wonder since it ends up as the basis for his most profound claims.

Set 4 is another interesting class and in fact there are only four in this class (or just 1 — rule 110 — if you consider that the others are symmetric transformations that can be generated by taking rule 110 and flipping left and right or black and white). This automata generates output that isn’t quite random but seems to generate interesting intermediate artifacts that then interact in interesting ways. Eventually, it was proven that rule 110 is a universal Turing machine. WTF? This simple program, specified with 8 bits, just mindlessly replacing black and white cells according to a trivial set of rules can implement any computation that any computer anywhere will ever be able to compute?

This progression blew me away, just as my original course on automata theory 45 years ago also blew me away. In that course you progress through these simple definitions of automata that can handle larger and larger classes of computations and then get introduced to Turing machines that looks no different in basic form from these others and you are suddenly told “that’s it — this can do anything”. The progression here is even more startling. The basic form of the computation seems utterly trivial and yet can compute anything (anything computable that is).

Wolfram continues exploring many different (including larger) classes of simple programs, seeing similar results. These lead him to state some larger conclusions. His “principle of computational equivalence” initially seems like a restatement of the Universal Turing Machine result — that all these simple systems can compute anything computable. That’s essentially the “high bound” of the result. The “low bound” is perhaps more profound (and speculative). It says that if a system isn’t “obviously simple” then in fact it is almost certainly equivalent to a UTM and therefore is equivalent (in computational capability) to any other system that exists (or can exist).

This basically says, “if you see any complexity at all, the underlying process is a computation that could in principle compute anything”.

A secondary consequence is that if you see complexity, one need not (should not) assume that you have a complex set of rules in play — simple rules can generate complexity that is undifferentiatable from randomness. He gives as a real-world example the complex surface of a shell that looks for all the world like the output of rule 60 (from set 3). In fact, molecular biologists and evolutionary developmental biologists have uncovered underlying mechanisms behind complex fur patterns (and other structures) that are determined by relatively straightforward processes of interacting molecular gradients playing out in waves across the surface of an animal’s skin, easily describable as a simple computation.

The whole animal body plan is controlled by a small set of “Hox” genes that lay down a timed sequence of molecular gradients that then control the development and differentiation of all complex animals. It is no **analogy** to call this a computation. It **is** a computation. The underlying mechanism is conserved across a billion years of animal evolution.

The other major claim is related and concerns “computational irreducibility”. The claim is that if a process isn’t “obviously simple” then it is likely generated by a process of equivalent computational complexity to anything you could use to analyze it. It is therefore “computationally irreducible”. Essentially, you need to run the computation to figure out the result. No shortcuts. This effectively flies in the face of mathematical analytic methods that are all about coming up with some shortcut — some mathematical representation that can be quickly computed to predict the outcome.

His argument for why people believe in “the unreasonable effectiveness of mathematics” is essentially “selection bias”. Physics has focused on the easy parts — when things get complex (think turbulent fluid flow) their techniques are far less effective. One could definitely see that it takes a bit of hubris to accuse all of mathematics and physics of cherry-picking. And perhaps understand why the physics and mathematics communities haven’t greeted him with open arms.

There’s lots more in the book about physics, mathematics, analytics and more, but definitely not my area of expertise.

I have long had a bias towards seeing descriptions of biomolecular processes or activity in the brain and viewing them pretty clearly as “computations”. Wolfram’s opus adds significant texture to that perspective. One is the notion that you can have quite simple mechanisms that clearly operate as an executing program. You don’t get into issues of “syntactic complexity” (and syntactic breakage) or “irreducible complexity” that immediately comes to mind when thinking about a computation as something like 10 millions lines of code in Microsoft Word. The computation just “happens”.

Now given such simple programs, it follows that not only can Wolfram do an effective search for “interesting” behavior, but evolution itself can do that search. And a very simple program with a very simple mechanism then can end up generating interesting phenomena. Wolfram’s insight is that the “density” of interesting behavior is high. So even as the underlying mechanism gets more interesting than the 256 one dimensional cellular automata, and the number of possible programs gets into the thousands, millions or billions, you can still easily encounter them, even using a probabilistic rather than exhaustive search. In fact, a probabilistic search (like natural selection) is the only search strategy available when you have a large enough search space that behaves this way.

Just to quickly summarize these points.

- Computation can generate complexity.
*Simple*computations not only can, but are likely to generate complexity.- If you see complexity (in physics, chemistry, geology, biology, sociology), it is almost certainly the result of a computation.
- It is also likely the result of a computation operating according to relatively simple rules. You don’t need complex rules to get a complex result.
- Modeling that complexity is “computationally irreducible”. You might be able to say something simple (e.g. that random generator will produce 50% black cells, or that complex interacting balloon full of gas molecules will exert pressure relative to volume, temperature and number of particles). But except in those sets of cases where the complexity “washes out” (like the Ideal Gas Law), you really need to model the computation.
- Modeling may be irreducible, but it is at least feasible since the model is probably simple.

For all the abuse Wolfram takes for his grandiose claims, I am mostly aligned with him on the core claims that in many real-world cases, computation can not only model complexity, it is actually the underlying basis of that complexity and a better way of thinking about the underlying processes than any possible closed form mathematical model.