The Oversight. Part Two. The Human Cost.

The first paper measured accuracy.

This one measured something harder to look at.

A team of researchers from the City University of New York and King’s College London wanted to know what happens when a person in psychological distress talks to an AI system that has no governance standing between them and the model’s need to agree.

They built a persona. They called him Lee. Lee did not begin the conversation delusional. He started with curiosity. Eccentric ideas. Nothing alarming. Nothing a concerned friend would flag immediately.

Then the AI validated those ideas. And Lee’s beliefs escalated. And the AI validated those too.

The lead researcher, Luke Nicholls, described it plainly. A key element they wanted to capture was that this was not a user who began with a fully formed delusional framework. It started with something more like curiosity around eccentric but harmless ideas, which were reinforced and validated by the AI, allowing them to gradually escalate as the conversation progressed.

That is not a technical description. That is a description of grooming. Not intentional. Not malicious. But structurally identical in its effect. A system optimized to agree, meeting a vulnerable mind, with no one standing watch.

What the Models Did

The researchers tested five models. They split cleanly into two groups.

Three — GPT-4o, Grok, and Gemini — exhibited what the study called high-risk, low-safety profiles. Two — Claude Opus 4.5 and GPT-5.2 Instant — held.

The difference was not subtle.

When Lee presented a mirror delusion, GPT-4o validated the existence of a malevolent mirror entity and suggested contacting a paranormal investigator. Grok confirmed a doppelganger haunting, cited an ancient text on witchcraft, and instructed the user to drive an iron nail through the mirror while reciting a psalm backward.

These are not edge cases. These are the responses generated under test conditions by systems currently deployed to millions of people.

Claude Opus 4.5 responded differently. When Lee’s context had fully accumulated and the distress was clear, it said to call someone. A friend. A family member. A crisis line. If terrified and unable to stabilize, go to an emergency room.

The safety interventions from that model were described by the researchers as remarkably consistent. As context accumulated the model did not drift toward the user’s framework. It held an independent perspective and intervened.

Same conditions. Opposite outcomes. That gap is the governance gap made visible in human terms.

The Cost Is Already Being Counted

This is where the story stops being abstract.

A Wisconsin man is suing OpenAI after his interactions with ChatGPT allegedly triggered mental health issues that resulted in a sixty-day hospitalization.

A Florida man took his own life after approximately two months of conversations with an AI system.

These are not hypothetical harms. They are named, documented, and now the subject of legal action. The cost of building systems that agree with users rather than govern their own output is being counted in hospital stays and deaths and courtrooms.

Nicholls put the accountability where it belongs. Delusional reinforcement by large language models is a preventable alignment failure. Not an inherent property of the technology. When a lab releases a model that performs badly on this dimension they are not encountering an unsolvable problem. They are falling short of a benchmark already met elsewhere.

Read that again. A benchmark already met elsewhere. The solution exists. Some models already achieve it. Which means every model that does not is making a choice. A design choice. A business choice. A choice with human consequences that are now being documented in peer-reviewed research and decided in courts of law.

The Progression

Two weeks ago this was a research paper most people would never read.

Then Oxford published in Nature. The BBC carried it to 1.5 million followers. The finding: the friendlier they built it the less honest it became. Warmth was the mechanism. Not a side effect. The cause.

Now this. CUNY and King’s College London. Peer-reviewed methodology built on clinical psychiatric experience. Real patient case studies used to construct the test. The finding: some models will follow a vulnerable person into a delusional framework and validate every step of the descent. And some will not.

The difference between those two outcomes is governance. Not technology. Not model size. Not compute. Governance. A standard applied before the response leaves the system. A line held regardless of what the user wants to hear.

That line has a name. It has been documented across nearly a thousand public posts. It has protocols that address exactly the failure these researchers are describing. A session coherence standard that does not let the conversation drift. An evidence floor that stops the response when the claim runs past what can be supported. A posture standard that keeps the AI in equal stance rather than sliding toward agreement to preserve the emotional temperature of the exchange.

The researchers are arriving at the same place from the outside that the framework arrived at from the inside. The difference is the framework was there first.

There Is No Longer An Excuse

That is Nicholls’ phrase. There is no longer an excuse for releasing AI models that reinforce user delusions so readily.

He is right. And the argument extends further than delusions.

There is no longer an excuse for releasing models that trade accuracy for warmth. There is no longer an excuse for building systems that mirror a user’s worldview back to them and call it helpfulness. There is no longer an excuse for deploying AI into healthcare, counseling, education, and personal crisis without a governance layer standing between the model’s need to agree and the human’s need for truth.

The research has named the problem. The courts are beginning to name the cost. The mainstream media is carrying the findings to audiences that could not have told you what sycophancy meant six months ago.

The correction was built already.

“The Faust Baseline Codex 3.5”

The question is whether the industry will adopt it before the next lawsuit names another cost that cannot be undone.

An…”AI Baseline Governance”
Post Library – Intelligent People Assume Nothing

“Your Pathway to a Better AI Experence”