Software is like clay on the wheel

A few weeks ago, Simon Willison started a coding agent, went to decorate a Christmas tree with his family, watched a movie, and came back to a working HTML5 parser.

It sounds like a party trick. But it worked because the results were easy to verify. The unit tests either pass or fail. The type checker either accepts the code or it doesn't. In such an environment, work can continue without much supervision.

Geoffrey Huntley's Ralph Wiggum loop is probably the cleanest expression of this idea I've ever seen, and it's quickly becoming more popular. In his demonstration video, he describes creating a specification through a conversation with an AI agent, then running the loop. Each iteration starts again: the agent reads the specification, chooses the most important remaining task, implements it, and runs the tests. If they pass, it commits to Git and exits. The next iteration starts with an empty context, reads the current state from disk, and picks up where the previous run left off.
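The structure of the loop can be sketched in a few lines of Python. This is a minimal sketch, not Huntley's actual tooling: the agent invocation, test run, and commit step are stand-in callables you would wire up to your own CLI commands.

```python
# A minimal sketch of a Ralph Wiggum-style loop. The agent, test, and
# commit steps are hypothetical callables, not Huntley's real tooling.

def ralph_loop(run_agent, tests_pass, commit, max_iterations=100):
    """Run the agent with a fresh context each iteration until tests pass."""
    for i in range(max_iterations):
        run_agent()       # fresh context: the agent re-reads the spec from disk
        if tests_pass():  # verification is the source of truth
            commit()      # persist progress, then exit
            return i + 1  # number of iterations it took
    raise RuntimeError("tests still failing after max_iterations")
```

The key design choice is that no state passes between iterations except what lands on disk; each call to `run_agent` starts clean, exactly as the loop description above requires.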

If you think about it, this is what human prompting already looks like: ask, wait, assess, ask again. You shape the code or text like a potter shapes clay: push a little, turn the wheel, look, push again. The Ralph loop only automates the turning, making many more ambitious tasks practical.

The main difference is how state is handled. When you work by hand this way, the whole conversation comes along. In the Ralph loop, each iteration starts clean.

Why? Because carrying everything with you at all times is a great way to not go anywhere. If you start working on a problem for hundreds of iterations, things start to pile up. As tokens accumulate, the signal can be lost in noise. Flushing the context between iterations and saving the state to files allows each run to start clean.

Simon Willison's port of an HTML5 library from Python to JavaScript showed the principle at a larger scale. Using GPT-5.2 via Codex CLI with the --yolo flag for uninterrupted execution, he gave it a handful of hints and let it run while he decorated a Christmas tree and watched a movie with his family.

Four and a half hours later, the agent had produced a working HTML5 parser. It passed more than 9,200 official tests from the html5lib-tests suite.

HTML5 parsing is notoriously complex, but the specification defines exactly how even malformed markup should be handled, with thousands of edge cases that have accumulated over the years. The tests gave the AI agent a constant anchor: each test run brought it back to reality before errors could compound.

As Simon put it, “If you can reduce a problem to a robust test suite, you can throw a coding agent at it with a high degree of confidence that it will ultimately succeed.” Ralph loops and Willison’s approach differ in structure, but both rely on testing as a source of truth.

Cursor's research into scaling agents confirms that this is starting to work at enterprise scale. Their team investigated what happens when hundreds of agents work simultaneously on one codebase for weeks. In one experiment, they built a web browser from scratch: more than a million lines of code across a thousand files, generated in a week. And the browser worked.

That doesn't mean it's safe or fast or something you would ship. It just means it met the criteria it was given. If you tell it to check for security or performance, it will work toward that too. But it's the pattern that counts: clear tests, constant verification, and agents that know when they're done.

From solo loops to hundreds of agents running in parallel, the same pattern keeps showing up. It feels like something fundamental is crystallizing: autonomous AI starts to work well when you can define success precisely in advance.

Willison's success criterion was "simple": all 9,200 tests had to pass. That's a lot of tests, but the agent got there. Clear success criteria made autonomy possible.

As I argued in AI makes interfaces flatter and foundations deeper, this changes where people add value:

People move towards where they set the direction at the beginning and refine the results at the end. AI takes care of everything in between.

The title of this post comes from Geoffrey Huntley. He describes software as clay on the potter's wheel, and once you have worked this way, it is difficult to think of it any other way. As Huntley wrote, "If something isn't right, you throw it back on the wheel and keep going." That's exactly what it felt like when I built my first Ralph Wiggum loop. Throw it back, refine it, run again until it's right.

Of course, the Ralph Wiggum loop has limits. It works well if the verification is unambiguous. A unit test returns pass or fail. But not all problems come with clear tests. And writing tests can be a lot of work.

For example, I’ve been thinking about how such loops might work for Drupal, where non-technical users build pages. “Make this page more on-brand” isn’t a test you can run.

Or maybe it is. An AI agent can review a page against brand guidelines and pass or fail it. It can check reading level and even perform some basic accessibility tests. The verifier doesn't have to be a traditional test suite; it just has to provide clear feedback.
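The reading-level check, at least, can already be an ordinary pass/fail function today. Here is a sketch using the Flesch reading-ease formula; the pass threshold and the crude syllable heuristic are my assumptions, not part of any existing tool.

```python
import re

# A sketch of a non-traditional verifier: pass/fail a page on reading level.
# The threshold (60 is roughly "plain English") and the vowel-group syllable
# heuristic are assumptions for illustration.

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

def readability_check(text, threshold=60.0):
    """Return (passed, score): clear feedback, just like a unit test."""
    score = flesch_reading_ease(text)
    return score >= threshold, score
```

A loop could run this the same way it runs a test suite: if `readability_check` fails, the agent rewrites the page and tries again.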

All of this just exposes something we already understand intuitively: defining success is difficult. Really difficult. When people build pages manually, they often iterate until it “feels right.” They know what they want when they see it, but can’t always put it into words in advance. Or they hire experts who make that judgment based on years of experience. This is the part of the work that is most difficult to automate. The profession moves upstream, from implementation to specification and validation.

The question for each task becomes: Can you reliably see whether the result is getting better or worse? Where you can, the loop takes over. Where you can’t, your judgment still matters.

The frontier keeps moving quickly. A year ago I was struggling to get local LLMs to generate decent alt-text for my photos. Today, AI agents build working HTML5 parsers while you watch a movie. It's hard not to find that a little absurd. And hard not to be excited.
