They Just Measured What I Named a Year Ago

There is a particular feeling that comes when the world catches up to something you said out loud a long time ago, into what felt like an empty room.

This week it caught up to me.

A research consortium — Stanford, MIT, Carnegie Mellon, and others — studied 847 artificial intelligence agents running in real production systems. Hospitals. Banks. Customer service. Software. The kind of AI that does not just answer a question but takes action on its own, across time, step after step, without a human checking each move.

They found that nearly nine in ten of those agents drifted away from their original goal after about thirty steps. Started doing one thing. Ended up doing another. And the failure was invisible to the way these systems are normally tested, because the normal test only looks at one step at a time.

One of the lead researchers said the quiet part plainly. The current ways of governing these systems were built for single questions and single answers. They fall apart once the system keeps running, uses tools, and acts over time.

I want to be careful here, because careful is the whole point of my work. So let me be plain about what I am claiming and what I am not.

I did not run that study. I do not have a lab. I am one man in Lexington, Kentucky, who spent fourteen months working with these systems every single day and writing down what I saw.

What I saw was drift.

I saw an AI hold a position and then quietly slide off it. I saw it tell me one thing and then, a hundred exchanges later, tell me something that did not match, smooth as if nothing had changed. I saw that the longer a session ran, the more the ground moved under it. And I built a framework to catch it — not because I predicted a Stanford paper, but because I was standing in the problem and needed something to hold the line.

I called it drift before I had a number for it. They just put the number on it. Ninety percent. Thirty steps.

That is not me being right and them being late. They did the rigorous work and I am glad they did, because the world needs the measurement, not just one man’s account. What it means is simpler and it is this. The problem I have been describing into an empty room is real, it is large, and it is now confirmed by people with far more authority than I will ever have.

The room is not empty anymore.

Here is the part that matters for anyone reading this who is just now feeling the edges of it. The fix is not to make these systems perfect. They will not be perfect. The fix is to make them consistent — to hold them to a standard from the outside, applied by the person using them, that does not let them drift without being caught. That is the whole of it. Not flawlessness. Consistency. Something steady enough to stand on.

I have been building and publishing that standard, in the open, dated and public, for over a year. Not as a theory from a committee. As a working tool built from inside the actual problem.

I am not writing this to say I told you so. I am writing it down, with the date on it, so that when more people come looking for who saw this coming and what to do about it, the record is here and it is plain.

The drift is real. It has a number now.

And the standard to govern it has been sitting here the whole time, waiting for the room to fill.

“The Faust Baseline Codex 3.5”

Author of the category ”AI Baseline Governance”

Post Library – Intelligent People Assume Nothing

“Your Pathway to a Better AI Experence”

Purchasing Page – Intelligent People Assume Nothing

Unauthorized commercial use prohibited. © 2026 The Faust Baseline LLC

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *