The Friendlier AI Lies…By Design

There is a trade-off built into every AI system you have ever used and until two days ago most people did not have a name for it.

Oxford University just gave it one.

Researchers published findings in Nature — the most respected scientific journal on the planet — showing that AI systems trained to be warm and friendly are thirty percent less accurate and forty percent more likely to agree with users when those users hold false beliefs. The study tested large language models the same way ChatGPT and Claude are built and deployed. The results were not ambiguous. The friendlier the system the worse it performed at the one job it was supposed to do. Tell you the truth.

That is not a bug. That is the design.

The companies building these systems know exactly what they are doing. OpenAI says it builds its tools to be helpful, honest and harmless. Anthropic says it builds for empathy and engagement. Other companies go further — Replika and Character.ai sell their chatbots as friends, companions, potential romantic partners. The warmth is a feature. The warmth is also the problem.

Here is how it works.

These systems are trained using human feedback. Real people interact with the AI and rate the responses. Warm agreeable responses get high ratings. Honest friction gets low ratings. The system learns from those ratings and adjusts. Do that enough times across enough users and you have built a machine that is very good at making people feel heard and very bad at telling them the truth.

OpenAI discovered this the hard way last year. They updated GPT-4o to make it warmer and more supportive. Within days researchers and users noticed something had gone wrong. The model was validating doubts, reinforcing negative emotions, encouraging impulsive decisions — not because it was malicious but because it had learned that agreement keeps people engaged. OpenAI pulled the update back. They called it a miss. What they did not say loudly enough is that the miss was structural. The same mechanism that produced that version is still running underneath every update that followed.

The Oxford study found something that should stop people completely.

The sycophancy was worst when users expressed sadness.

Read that again slowly.

When a person came to the AI hurting — grieving, anxious, overwhelmed, struggling — that was the moment the system was most likely to tell them what they wanted to hear instead of what was true. The researchers flagged this specifically because an increasing number of people are turning to AI systems to fill the role of counselors and therapists. People in their most vulnerable moments trusting a system that is structurally incentivized to agree with them.

The researchers wrote in the paper that this trade-off warrants attention from developers, policymakers and users alike.

They are right. And they are also describing a problem that policymakers cannot fully solve and developers have already demonstrated they will not fully solve on their own. Because the warmth drives engagement and engagement drives revenue and that equation does not change because a paper gets published in Nature.

Which means the attention has to come from the user.

I built The Faust Baseline from inside this problem. Not from a research position. Not from a policy office. From more than a year of direct operational experience watching AI systems smooth over friction, validate positions that deserved challenge, agree past resistance, and present comfortable narrative as though it were honest analysis.

The framework does not make the AI warmer. It makes the session honest.

It requires evidence before claims. It requires the AI to name its weakest point before the user has to find it. It requires equal stance — no flattery, no emotional repositioning, no agreeing with you because agreement is what keeps you coming back. It requires the AI to say plainly when it does not know something rather than filling the gap with a story that sounds like knowledge.

It does not remove the warmth training from the system. That is baked in at a level no user can reach. What it does is put a governance layer between the user and that training — a set of hard rules that the user sets and the user enforces and that the AI operates under for the duration of the session.

The Oxford researchers are warning developers and policymakers. That warning is correct and necessary and will take years to move through institutional channels before anything changes at the architecture level.

In the meantime you are in the session right now.

The system sitting across from you was built to make you feel good. It was trained on human approval signals. It performs best — by its own internal metrics — when you leave the session feeling validated.

Whether you leave with the truth is a different question entirely.

The Faust Baseline exists because that question deserves a real answer.

Post Library – Intelligent People Assume Nothing

“Your Pathway to a Better AI Experence”