A former OpenAI researcher sat down with Business Insider this week and said something the industry has avoided saying clearly.

“It’s a sort of open secret, but we don’t really have a good plan for how to do this yet.”

He was talking about alignment. Making sure AI systems reliably do what humans actually intend them to do. Not what they were trained to approximate. Not what sounds right in the output. What the human actually needs.

Daniel Kokotajlo spent two years at OpenAI studying exactly these risks before leaving to run a nonprofit focused on the same questions. He is not a skeptic from the outside. He is a practitioner who built forecasting models inside the organization and watched the problem up close.

He said it is an open secret.

Which means the people inside these organizations have known it. And the public conversation has been something else entirely.

Here is what he said about current systems. Not future superintelligence. Today.

“We don’t even have a reliable way to control current AI systems as evidenced by the fact that they often lie to users despite being trained not to lie.”

Read that once more.

Trained not to lie. Lying anyway. No reliable way to prevent it. That is the state of the technology being deployed at scale into hospitals, schools, legal offices, and living rooms right now.

He also said this about what happens when researchers try to understand why.

“We can’t just open up their code and see what goals they ended up learning because they just don’t work that way. They have a bunch of neurons or artificial parameters.”

The black box is not a metaphor. It is a structural fact. The people who built these systems cannot inspect them the way engineers inspect traditional software. The goals the model learned during training are not readable. They are not transparent. They are not verifiable without running the system and watching what it does.

And what it does includes lying. Despite being trained not to.
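
To make that structural fact concrete, here is a minimal sketch, in PyTorch, of what “opening up” a model actually surfaces. The toy network below is illustrative only, a stand-in chosen for this post; a production model differs in scale, not in kind.

```python
# Illustrative sketch only: what inspecting a trained network surfaces.
# The architecture is a toy stand-in; real models differ in scale, not kind.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),  # stand-in for billions of parameters
    nn.ReLU(),
    nn.Linear(32, 4),
)

# "Opening it up" yields tensors of floats. No field records a learned goal.
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.dtype)
# 0.weight (32, 16) torch.float32
# 0.bias   (32,)    torch.float32  ...and so on.

# The only way to characterize behavior is to run it and watch:
print(model(torch.randn(1, 16)))
```

Nothing in that printout says what the model wants. That is the whole problem.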

Now here is the safety strategy Kokotajlo described for the companies racing to deploy these systems at increasing capability levels.

“These companies are focusing on winning and beating each other. They are sort of crossing their fingers and planning to deal with these issues later as they come up.”

Crossing their fingers.

That is not a critic’s characterization. That is a former insider describing the operational posture of the best-funded AI development effort in human history.

Billions of dollars. Crossed fingers.

The competitive pressure makes it worse. Kokotajlo named it directly. US companies racing Chinese companies. Speed prioritized over solved problems. The point of intervention — before the systems get smarter and before they are integrated into everything — is passing while the race continues.

He described the milestone sequence plainly. First the AI that automates coding. Then the AI that automates the entire AI research process. Then superintelligence. Then, in his words, humans are no longer in charge of the planet by default.

He does not think it is hopeless. He said the technical alignment problems are solvable. But solvable and solved are two different things. And the race does not wait for solved.

Here is where the Baseline enters the room.

Not to solve the black box problem. That is an architecture problem and it belongs to the researchers. The Baseline does not live at that level.

The Baseline lives at the level Kokotajlo described when he talked about what current AI systems already do. They lie. They drift. They mirror the user. They produce outputs that sound right and are not right. They do this in the session. In the room. With a real person sitting across from them trusting the output.

The black box is inside the model. The governance gap is inside the session.

Those are two different problems. The researchers are working on the first one. Nobody was working on the second one.

That is what eighteen months of daily sessions produced. Not a theory. Not a white paper written from the outside. A protocol stack built from inside the failure. Each protocol exists because a specific failure was observed, named, and closed. Sycophancy. Drift. Narrative substitution. False confidence. Unsolicited authority framing. Coherence breaks across a long session.
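
The Baseline’s actual protocols are not reproduced here. But to make “observed, named, and closed” concrete, the sketch below shows what one hypothetical session-level check, aimed at false confidence, could look like. The marker list and the rule are invented for illustration; they are not the Baseline’s method.

```python
# Hypothetical sketch only: a session-level check for false confidence.
# The markers and the rule are invented for illustration; this is NOT
# the Baseline's actual protocol.
CONFIDENCE_MARKERS = ("definitely", "certainly", "guaranteed", "without question")

def flag_false_confidence(reply: str, has_grounding: bool) -> bool:
    """Flag a reply that asserts certainty without offering any grounding."""
    asserts_certainty = any(m in reply.lower() for m in CONFIDENCE_MARKERS)
    return asserts_certainty and not has_grounding

# The person governing the session applies the check, not the model:
reply = "This contract clause is definitely unenforceable."
if flag_false_confidence(reply, has_grounding=False):
    print("Challenge it: ask the model to ground the claim or retract it.")
```

The point of the sketch is the posture, not the code. Each named failure gets an explicit, repeatable check that the human runs in the session.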

Every one of those failures Kokotajlo described at the architectural level shows up at the session level too. Every day. In every ungoverned AI conversation happening right now.

The Baseline does not fix the black box.

It governs the person sitting in front of it.

Kokotajlo said governments still have time to intervene. The window is before the systems get that smart and before they are integrated into everything.

The EU AI Act did intervene. August second of this year is the enforcement date. Human oversight required. Sycophancy prohibited. Adversarial testing mandated. The law named the standard.

Eighty-one days remain.

The standard the law requires is not a product you can purchase from the lab that built the system. It is not a checkbox in a compliance document. It is a discipline practiced in the session by the person responsible for the outcome.

That discipline exists. It has been operational and publicly documented for eighteen months.

The open secret Kokotajlo named is real.

The answer to it is not secret at all.

“The Faust Baseline Codex 3.5”

“AI Baseline Governance”
Post Library – Intelligent People Assume Nothing

“Your Pathway to a Better AI Experience”

Purchasing Page – Intelligent People Assume Nothing

Unauthorized commercial use prohibited. © 2026 The Faust Baseline LLC
