Claude Code Proved the Faust Baseline’s Core Principle

Anthropic released a new command for Claude Code this week.

They called it /goal. I’m going to tell you what it actually is, because the name doesn’t capture it.

The mechanics are straightforward. You define an end state in plain language. Claude Code starts working toward it. Each time it completes a turn, a second model evaluates the output against your criteria. If the evaluator says the goal hasn’t been met, it tells Claude why and the loop continues. If the evaluator says the work satisfies the conditions, it hands control back to you. The whole thing runs without you prompting it turn by turn. You set the destination, and the system moves until it gets there.
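The loop described above can be sketched in a few lines of Python. This is a minimal illustration of the builder/evaluator separation, not Claude Code's actual implementation: the names `run_goal_loop`, `Verdict`, `make_toy_builder`, and `toy_evaluator` are hypothetical, the two "models" are plain stand-in functions, and the `max_turns` cap is an assumption added so the sketch always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    met: bool    # True when the output satisfies the goal
    reason: str  # explanation handed back to the builder when it does not


# Hypothetical signatures: a builder takes (goal, feedback) and returns work;
# an evaluator takes (goal, output) and returns a Verdict.
Builder = Callable[[str, Optional[str]], str]
Evaluator = Callable[[str, str], Verdict]


def run_goal_loop(goal: str, builder: Builder, evaluator: Evaluator,
                  max_turns: int = 10) -> Tuple[str, int]:
    """Run the builder until a separate evaluator accepts the output."""
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        output = builder(goal, feedback)   # one party does the work
        verdict = evaluator(goal, output)  # a different party judges it
        if verdict.met:
            return output, turn            # control returns to the human
        feedback = verdict.reason          # otherwise, explain why and loop
    raise RuntimeError(f"goal not met after {max_turns} turns")


def make_toy_builder() -> Builder:
    """Stand-in builder: starts with code, adds whatever the feedback names."""
    parts = ["code"]

    def build(goal: str, feedback: Optional[str]) -> str:
        if feedback is not None:
            parts.append(feedback.split(": ", 1)[1])
        return " + ".join(parts)

    return build


def toy_evaluator(goal: str, output: str) -> Verdict:
    """Stand-in evaluator: checks a fixed list and ignores the goal text."""
    produced = output.split(" + ")
    for item in ["code", "tests", "docs"]:
        if item not in produced:
            return Verdict(False, f"missing: {item}")
    return Verdict(True, "all criteria satisfied")
```

Calling `run_goal_loop("ship code, tests, and docs", make_toy_builder(), toy_evaluator)` runs three turns before the evaluator accepts. The point of the sketch is structural: `run_goal_loop` never asks the builder whether it is finished.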

That description sounds like a coding tool. And it is. But underneath the feature is a structural decision that goes deeper than software development, and it’s one I’ve been arguing in favor of for more than a year.

The decision is this: the model doing the work cannot be the model deciding whether the work is good enough.

One builds. A different one judges. That separation is not a convenience. It is the architecture. Without it, you don’t have an agentic tool. You have a system that runs until it decides it’s finished, which is a very different thing. A system grading its own homework will find reasons to pass itself. Not out of malice. Out of the same pressure that produces sycophancy in every other context — the path of least resistance runs toward completion, not toward honest evaluation.

Anthropic’s engineers solved this the only way it can be solved. They brought in a second model whose only job is to hold the standard.

Now let me tell you where I know this from.

Fourteen months ago I started building the Faust Baseline — a nineteen-protocol governance framework for AI sessions. The protocols govern reasoning, evidence, posture, drift, memory, moral domain, handoff, and time. Each one addresses a specific failure mode that appears when an AI system operates without a structured accountability layer. The framework is not theoretical. It runs in real sessions, in real time, and it is documented at intelligent-people.org as a public record.

Last month I hit a wall with AGP-1 — the Agentic Governance Protocol. The concept was a fully autonomous governance layer that could operate without constant human presence. The idea was to extend the Baseline’s reach into sessions where the human couldn’t be watching every turn. I stopped development on it deliberately, and I published the reason.

A fully autonomous governance protocol would require the machine to check its own drift. And the machine checking its own drift is exactly the failure mode the Baseline was built to prevent. Every one of the nineteen protocols holds because a human is present in real time. The human is not a convenience in the architecture. The human is the architecture. Remove that and you don’t have governance anymore. You have a system that monitors itself, which is not monitoring at all.

That’s the wall I stopped at.

Anthropic’s engineers hit the same wall from the other direction. They weren’t building a governance framework. They were building a coding tool. But when they got to the question of how the tool knows when it’s done, the logic of the problem forced them to the same place. You cannot let the builder be the judge. So they didn’t. They separated the roles, brought in an independent evaluator, and built the loop around that separation.

They didn’t name it a governance principle. They shipped it as a product feature. But the structure is identical.

There’s something else in the write-up worth noting. The /goal command works best when the human writes a CLAUDE.md file before the loop starts. That file carries the architecture decisions, the coding conventions, the acceptance criteria — everything the evaluator will measure against. The human defines the standard before the machine takes a single turn. The evaluator doesn’t establish what good looks like. The human does. The evaluator just measures against what the human already wrote.
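To make that division of labor concrete, here is a small Python sketch under stated assumptions. It does not show how a CLAUDE.md file is read or how the real evaluator works. The criteria names, the `HUMAN_CRITERIA` dictionary, and the `evaluate` function are all hypothetical. What it illustrates is only the ordering: the human writes the standard first, and the evaluator applies it without adding to it.

```python
from typing import Callable, Dict, List

# A criterion is a name plus a check the human wrote before the loop started.
Criteria = Dict[str, Callable[[str], bool]]

# Hypothetical acceptance criteria, defined up front by the human.
HUMAN_CRITERIA: Criteria = {
    "has a docstring": lambda src: '"""' in src,
    "no TODO markers": lambda src: "TODO" not in src,
}


def evaluate(output: str, criteria: Criteria) -> List[str]:
    """Return the names of the criteria the output fails (empty means pass).

    The evaluator invents no standards of its own; it only measures
    against the checks it was given.
    """
    return [name for name, check in criteria.items() if not check(output)]
```

An output with a docstring and no TODO markers yields an empty list. Anything else yields the names of the failed criteria, which is the feedback the builder would receive on the next turn.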

That is the Baseline’s operating structure applied to a development environment. The human sets the rules. The system runs inside them. An independent check holds the standard. No single model decides whether its own output is acceptable.

I am not claiming Anthropic read the Baseline and built /goal from it. I am claiming the opposite, and the opposite is the stronger argument. They didn’t read it. They built a coding tool, hit the same structural constraint every honest builder hits, and solved it the same way the constraint demands it be solved. Independent evaluation is not one option among several. It is what the problem requires.

When the people building the most capable AI systems on the planet independently land on the same structural logic, and do it while shipping a product rather than writing a paper, that is more than a coincidence to note in passing. It is the principle proving itself in the only way that matters: it showed up in the engineering, not in the argument.

The Baseline has been making this case for over a year. This week it got a feature release.

“The Faust Baseline Codex 3.5”

micvicfaust@gmail.com
