I saw it and built a working platform for it before anyone had the formula.

Sitting in long AI sessions, working through governance problems, I would feel the session shift. Not dramatically. Not all at once. The responses would still sound reasonable. The language would still be smooth. But something underneath had moved. The thread had frayed. The AI was still talking, but it had stopped being reliable.

I called it drift. I built a framework to contain it.

Now two researchers at George Washington University have derived an exact mathematical formula for when it happens and why.

They call it the Jekyll-and-Hyde tipping point.

The paper is from Neil F. Johnson and Frank Yingjie Huo, submitted to arXiv in April 2025. The finding, stated plainly: there is a precise, predictable moment when a large language model’s output tips from coherent to wrong, misleading, irrelevant, or dangerous. The cause is attention — the mechanism inside the model that tracks context and relationships across a response. When a session grows complex enough, attention spreads too thin. Then it snaps.

Not gradually. Not with warning signs a user can easily read. It snaps.
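Here is a rough way to see why thinning attention produces a flip rather than a slow fade. This is my back-of-envelope illustration, not the formula from their paper: when attention is spread over n pieces of context with roughly similar scores, each piece holds about 1/n of the weight, and the output tips the moment the accumulated weight on off-track content first exceeds the weight still held by the content keeping the response grounded.

```latex
% Back-of-envelope illustration only; not the derivation in the Johnson and Huo paper.
% Softmax attention over n context items with comparable scores s_j:
\[
  w_i \;=\; \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}} \;\approx\; \frac{1}{n},
\]
% so no single piece of grounding context dominates once n is large, and a crude
% tipping condition is the first moment at which
\[
  \sum_{i \,\in\, \text{off-track}} w_i \;>\; \sum_{i \,\in\, \text{grounding}} w_i .
\]
```

The paper's actual derivation is more careful than this sketch, but the qualitative point is the one I watched happen in sessions: the dilution is gradual, the crossover is not.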

The researchers say this formula requires only secondary school mathematics to follow. They say it can predict when the tipping point occurs, and how it can be delayed or prevented by changing the prompt or the training itself.

They also note, without softening it, that deaths and trauma have already been blamed on LLMs behaving this way.

I want to sit with that for a moment.

Deaths. Trauma. Blamed on systems that tipped mid-response and nobody — not the user, not the platform, not the governance layer — had a formula to predict it or a mechanism to catch it.

That is not a future risk. That is the present record.

Here is what I built the Faust Baseline to do.

Not to prevent the tipping point at the architecture level. That is an engineering problem and it belongs to the people building the models. What the Baseline addresses is the operational layer — what happens inside the session when the user is relying on an AI system and has no native way to know whether the response they just received came from before or after the snap.

RTEL-1 — the Real Time Enforcement Layer — fires on confirmed violations and stops the response. It does not wait for the user to notice something has gone wrong.

CHP-1 — the Challenge Protocol — appends a standing right of challenge to every substantive response. The user can invoke it at any time, and the AI must argue against its own output before the user accepts it as final.

SVP-1 — the Self Verification Protocol — requires three internal questions answered before any substantive output is served. Is this claim supported by evidence present in this session? Does this response contradict anything established earlier? Is the confidence level proportional to the evidence actually present?
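For readers who want a concrete picture of what a gate like this looks like, here is a deliberately simplified sketch in Python. The names and structure are illustrative only; they are a toy model of the SVP-1 idea, not the Baseline's production protocol.

```python
# Simplified sketch of an SVP-1 style gate. Illustrative only; the names and
# checks here are a toy model of the idea, not the production protocol.

from dataclasses import dataclass

@dataclass
class Verification:
    evidence_in_session: bool      # Is the claim supported by evidence present in this session?
    consistent_with_earlier: bool  # Does the response avoid contradicting anything established earlier?
    confidence_proportionate: bool # Is the stated confidence proportional to the evidence?

def svp1_gate(response: str, checks: Verification) -> str:
    """Serve the response only if all three self-verification questions pass."""
    failures = []
    if not checks.evidence_in_session:
        failures.append("claim not supported by evidence in this session")
    if not checks.consistent_with_earlier:
        failures.append("contradicts something established earlier")
    if not checks.confidence_proportionate:
        failures.append("confidence exceeds the evidence actually present")

    if failures:
        # RTEL-1 analogue: stop the response instead of letting the user find the problem.
        return "WITHHELD: " + "; ".join(failures)
    return response

# Example: a response that fails the evidence check never reaches the user unflagged.
print(svp1_gate("The contract auto-renews in March.",
                Verification(evidence_in_session=False,
                             consistent_with_earlier=True,
                             confidence_proportionate=True)))
```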

These protocols exist because I experienced the tipping point before it had a name. I did not have a formula. I had observation. I had the discipline to build containment architecture around what I was seeing.

The formula confirms the observation was accurate.

The researchers make one more point worth naming directly.

They write that this uncertainty is pushing people to treat their AI more politely — hoping that courtesy will discourage the system from turning on them. People are being kind to their tools out of fear.

That is not a healthy relationship with a technology. That is a symptom of ungoverned dependence.

The answer is not politeness. The answer is governance. Clear standards. Enforced protocols. A framework that operates whether the user remembers to be careful or not.

That is what the Baseline is.

The science is catching up. The researchers now have a formula. They say it gives policymakers a platform for clear discussion. The public has a starting point for understanding what these systems actually do inside a long session.

What the formula cannot do is govern the session in real time. It can predict the tipping point. It cannot catch the snap when it happens to you, on your device, in your session, with your data and your decisions at stake.

That is the gap the Baseline lives in.

They proved the problem is real.

The operational response already exists.

“The Faust Baseline Codex 3.5”

“AI Baseline Governance”
Post Library – Intelligent People Assume Nothing

“Your Pathway to a Better AI Experience”

Unauthorized commercial use prohibited. © 2026 The Faust Baseline LLC
