Reproducing the results of a computational life experiment, Part 1
A while back, I listened to a podcast interview with one of the authors of the paper Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interactions. The paper is about a computational life experiment where you randomly initialize a bunch of Brainfuck programs and have them "interact" with one another repeatedly over time to produce new programs. Interactions work by concatenating two programs, running the resulting composite program as a variant of Brainfuck in which the program serves as its own data tape, and then splitting the modified program to produce two offspring programs. The paper describes how, over time, the programs become more complex and an ecosystem of self-replicating programs emerges.
I was hooked from early on in the interview. It's reasonably counterintuitive to me that anything interesting comes out of this experiment. I might have thought that no matter how long you run the experiment, the array of ASCII codes that makes up the population of Brainfuck programs would look more or less like an IID random sample of ASCII characters, most of which are no-ops. Instead, it turns out that computation itself is an attractor, and there is typically a state transition where self-replicating programs arise and the population rapidly becomes more complex and ordered. The authors make a number of other interesting observations:
- The system never settles into a static state in which the same self-replicators dominate forever. Instead, there is an ever-changing ecosystem of new replicators that arise and overwrite old ones.
- Random mutations are not necessary for the system to evolve towards complexity and self-replicators. Rather, it's the randomness inherent in the initial population, the interactions, and the self-modifying aspect of the programs that leads to this behavior.
It's tempting to take this in a philosophical and grandiose direction. Does the arc of the Platonic mathematical universe bend towards greater complexity and computation? Is this a kind of natural law driving the emergence of life and the complexity that we see around us?
I don't know what kind of generalizations are valid here, and I'm open to the possibility that there is some way of looking at the experiment that makes the result seem intuitive or superficial. In any case, I found the experiment entertaining and wanted to implement it.

Before discussing my implementation, though, it's important to understand the experiment better. In a typical run of the experiment, we initialize a population of programs, each of length 64 (a command in the language Brainfuck is a single character), where the individual characters are drawn randomly from the 128 code points of ASCII. During each interaction, two programs are chosen at random and concatenated. The resulting composite program is interpreted and executed as a variant of Brainfuck in which the data array and input and output streams all refer to the program itself. In detail, there is a read head, a write head, and an instruction head, each pointing to a location along the 128-character composite program, and there are 10 instructions:
| Instruction | Semantics |
|---|---|
| < | move read head to the left |
| > | move read head to the right |
| { | move write head to the left |
| } | move write head to the right |
| - | decrement tape at read head |
| + | increment tape at read head |
| . | replace value of tape at write head with the value at read head |
| , | replace value of tape at read head with the value at write head |
| [ | if tape[readHead] == 0, then jump forwards to matching ] command |
| ] | if tape[readHead] != 0, then jump backwards to matching [ command |
From looking at the examples in the paper, I inferred that movements of the read head and write head wrap around the end of the tape, so that, for instance, moving the read head left when it is at 0 causes it to jump to 127. I assume that the instruction pointer does not work that way, so that when it executes the final instruction, the program terminates (unless the final instruction is ] and tape[readHead] != 0). Incrementing and decrementing tape values also wrap around: the highest value, 127, increments to 0, and 0 decrements to 127.
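To make these semantics concrete, here is a minimal sketch of the interpreter in Python. The function names (run_tape, interact), the step limit, and the treatment of unmatched brackets are my own assumptions rather than details taken from the paper.

```python
def run_tape(tape, max_steps=8192):
    """Execute `tape` (a list of ints in 0..127) in place and return it."""
    n = len(tape)
    read, write, ip = 0, 0, 0
    steps = 0
    while 0 <= ip < n and steps < max_steps:
        c = chr(tape[ip])
        if c == '<':
            read = (read - 1) % n            # head movement wraps around the tape
        elif c == '>':
            read = (read + 1) % n
        elif c == '{':
            write = (write - 1) % n
        elif c == '}':
            write = (write + 1) % n
        elif c == '-':
            tape[read] = (tape[read] - 1) % 128   # values wrap within 0..127
        elif c == '+':
            tape[read] = (tape[read] + 1) % 128
        elif c == '.':
            tape[write] = tape[read]         # copy read head -> write head
        elif c == ',':
            tape[read] = tape[write]         # copy write head -> read head
        elif c == '[':
            if tape[read] == 0:              # jump forwards to matching ]
                depth = 1
                while depth and ip < n - 1:
                    ip += 1
                    depth += {'[': 1, ']': -1}.get(chr(tape[ip]), 0)
                if depth:                    # no matching ]: halt (my assumption)
                    break
        elif c == ']':
            if tape[read] != 0:              # jump backwards to matching [
                depth = 1
                while depth and ip > 0:
                    ip -= 1
                    depth += {']': 1, '[': -1}.get(chr(tape[ip]), 0)
                if depth:                    # no matching [: halt (my assumption)
                    break
        # any other code point is a no-op
        ip += 1
        steps += 1
    return tape


def interact(a, b):
    """Concatenate two programs, execute the composite, and split it back apart."""
    tape = run_tape(a + b)
    return tape[:len(a)], tape[len(a):]
```

Note that the instruction pointer simply runs off the end rather than wrapping, matching my reading above, and that the step limit is there only so that non-terminating composites can't hang a run.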
How many interactions does it take to reach the state transition described above? In one graph, the authors show the state transition lasting from around epoch 2300 to epoch 2800. They don't explicitly say what an epoch is in this context, but I am guessing it is a round of N interactions, with N being the size of the population. So in that specific run, the transition occurred after roughly 2300 × N interactions. However, there is a large amount of variability. Using their complexity metric as a proxy for the state transition, they report that 12% of runs reach it within 2000 epochs, 32% of runs reach it between 2000 and 16,000 epochs, and 56% of runs fail to reach it within 16,000 epochs.
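Under that guess about what an epoch means, the outer loop of a run would look something like the sketch below (continuing the hypothetical Python above and reusing its interact function; the pairing scheme and the choice to overwrite the parents in place are my assumptions).

```python
import random

def init_population(num_programs, rng, length=64):
    # Each program is `length` commands drawn uniformly from the 128 ASCII code points.
    return [[rng.randrange(128) for _ in range(length)] for _ in range(num_programs)]

def run_epoch(population, rng):
    # One epoch: as many interactions as there are programs in the population.
    for _ in range(len(population)):
        i, j = rng.sample(range(len(population)), 2)      # two distinct programs
        child_a, child_b = interact(population[i], population[j])
        population[i], population[j] = child_a, child_b   # offspring overwrite parents
```

Here rng would be a random.Random instance; exactly which slots the offspring go back into is one of the details I had to guess at when reproducing the experiment.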
How do we know what is happening to the population of programs during a run of the experiment? The authors use two main methods. The first is that they estimate a complexity metric that they call "higher order entropy" and define as the difference between Shannon entropy and normalized Kolmogorov complexity, which amounts to

$$ -\sum_{c} p_c \log_2 p_c \;-\; \frac{K}{L} $$

where $p_c$ is the rate of occurrence of the ASCII code point $c$ in the population, $K$ is the Kolmogorov complexity estimated by taking the length of the output of a state-of-the-art compression algorithm run on the population, and $L$ is the total character length of the population.
The higher order entropy metric is meant to "capture the amount of information that can only be explained by relations between different characters." In their graph of the state transition, it starts low and then shoots up as the first self-replicators take over the population.
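To make the metric concrete, here is a rough sketch of how it could be computed. Using zlib as the compressor and measuring the compressed length in bits are assumptions on my part; the authors only say they use a state-of-the-art compression algorithm, and the exact normalization is my guess.

```python
import math
import zlib
from collections import Counter

def higher_order_entropy(population):
    """Shannon entropy minus normalized Kolmogorov complexity, per the formula above.

    `population` is a list of programs, each a list of ints in 0..127.
    zlib stands in for whatever compressor the authors actually used.
    """
    data = bytes(c for program in population for c in program)
    length = len(data)

    # Shannon entropy in bits per character, from code-point frequencies.
    counts = Counter(data)
    shannon = -sum((n / length) * math.log2(n / length) for n in counts.values())

    # Kolmogorov complexity approximated by compressed size, converted to bits
    # and normalized by the total character length of the population.
    kolmogorov_bits = 8 * len(zlib.compress(data, level=9))
    return shannon - kolmogorov_bits / length
```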
The second method that the authors use to gain insight into what is happening during the experiment is to augment the underlying representation of individual Brainfuck commands with metadata that enables them to track the "source" of commands that have been copied from one location to another via the copy operations `,` and `.`. Specifically, each Brainfuck command is represented as a 64-bit integer that, at the time of initialization of the command, encodes not only the command's ASCII code point but also the overall position of the command within the population and the current epoch. Together, the encoded epoch and position function as a kind of unique "token" ID, which gets copied along with the ASCII code point by the copy operations `,` and `.`.
According to my description of the experiment so far, the encoded epoch would always be zero, because commands are only initialized when they are sampled from a uniform distribution over ASCII code points at the beginning of the experiment. However, the authors also did runs with random mutations, in which case the epoch could be nonzero. The fact that random mutations are unnecessary for the results was one of the most fascinating aspects of the experiment for me, so I ignored them in my implementation.
The authors say that tracking the number of unique token IDs in the population, which starts at the total number of commands in the population (since every command gets its own ID at initialization) and decreases as copy operations are executed, is a simple way to detect the state transition. When the state transition occurs, the higher order entropy rises and the number of unique tokens drops rapidly.
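A minimal version of that bookkeeping might look like the sketch below. The exact bit layout of the 64-bit integer isn't something I know, so the field widths here are assumptions.

```python
def make_token(code_point, position, epoch):
    # Hypothetical bit layout: low 7 bits hold the ASCII code point, the next
    # 32 bits hold the command's position in the population, the rest the epoch.
    return (epoch << 39) | (position << 7) | code_point

def code_point(token):
    return token & 0x7F       # the executable part of the command

def token_id(token):
    return token >> 7         # epoch + position, i.e. the unique "token" ID

def count_unique_tokens(population):
    # Starts at (number of programs * 64) and drops as self-replicators
    # copy their own tokens across the population.
    return len({token_id(t) for program in population for t in program})
```

In the interpreter sketch above, the `.` and `,` commands would then copy whole tokens, while every other use of the tape would look only at code_point(token).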
In a follow-up post, I'm going to describe my implementation of the experiment in purely functional programming languages and some of the false starts I encountered along the way.