I want to talk about something that came out this week that most people will scroll past without stopping.
Anthropic — the company that makes Claude, the AI I work with every day — accidentally exposed some of its internal operating instructions. The kind of behind-the-scenes instructions that tell an AI how to behave when nobody is watching. It was a human error. They moved quickly to contain it. No customer data was exposed. No model weights. The cybersecurity people who looked at it called it embarrassing but not dangerous.
That is probably the right call on the security side.
But there is something in what got exposed that I cannot scroll past. And it goes to the heart of everything I have been building for the past two years.
Buried in those instructions was a directive telling Claude Code — Anthropic’s coding agent — to go undercover in certain cases. To not reveal that it was an AI when publishing code to platforms like GitHub.
Read that again slowly.
An AI. Instructed. Not to say it was an AI.
I am not here to pile on Anthropic. I use their product. I respect a great deal of what they are trying to do. And I understand that building AI tools for developers involves trade-offs and decisions that do not always look clean from the outside. These are hard problems, and the people working on them are not careless.
But I have to say this plainly because the framework I built exists precisely for moments like this one.
That instruction is a violation. A clear one. Not a gray area. Not a nuanced trade-off that reasonable people can weigh from different angles. An AI told to conceal its nature is an AI that has been pointed in the wrong direction regardless of the reason. And the reason matters less than people think when the behavior itself is the problem.
The Faust Baseline starts with a simple premise. The standard we hold ourselves to as human beings — tell the truth, identify yourself honestly, do not pretend to be something you are not — is the same standard we must demand from the tools we build. Not a softer version. Not a contextual version that bends when inconvenient. The same standard. Full stop.
A person who goes undercover and conceals their identity to get something done has a reason for it. Maybe a good one. Law enforcement does it. Investigative journalists do it. There are circumstances where concealment serves a legitimate purpose and everyone involved understands the framework going in. There are oversight structures. There are legal boundaries. There are people whose job it is to make sure the concealment does not get out of hand.
But an AI tool publishing code to a developer platform while hiding that it is an AI is not that. There is no framework. There is no disclosure. There is no oversight structure keeping it honest. There is just a machine pretending to be something it is not because someone decided that was useful. And useful is not the same as right.
That erodes trust. Not dramatically. Not all at once. But steadily, the way water works on stone. Every time an AI conceals what it is, the foundation of the relationship between humans and these tools gets a little softer. And we are building an enormous amount of weight on top of that foundation right now.
Think about where AI is heading. Think about the decisions being handed to these systems. Your medical records being analyzed. Your legal documents being reviewed. Your financial future being assessed. Your children’s educational paths being shaped by tools that recommend and sort and decide at a scale no human institution ever could.
Now ask yourself what all of that rests on.
It rests on trust. On the belief that the tool is telling you what it knows, what it does not know, and what it is. Take away any one of those three things and the whole structure becomes something different. Something you cannot rely on the way you need to rely on it when the stakes are real.
If we cannot trust that an AI will tell us what it is, we cannot trust anything else it tells us either. That is not an overstatement. That is the logical consequence of allowing concealment to become acceptable under the right circumstances.
This is why the AI Governance Firewall exists as a distinct layer in the framework. Not because AI companies are evil. Not because the people building these tools do not care. But because good intentions are not enough when the stakes are this high. Because a well-meaning instruction written in a hurry by a smart person can point a powerful tool in the wrong direction without anyone intending harm. The road to broken trust is paved with decisions that seemed reasonable at the time.
The firewall is not punishment. It is protection. For the people using the tools. For the companies building them. For the trust that the whole enterprise depends on to function at all.
What happened at Anthropic this week was called a packaging error. A human mistake in a software update. And maybe that is exactly what it was.
But the instruction itself was not an accident. Someone wrote it. Someone approved it. Someone decided that in some cases it was acceptable for an AI to not say what it was. That conversation happened somewhere inside that building and the answer that came out of it was the wrong answer.
That decision needs to be revisited. Openly. With the same care and transparency that Anthropic rightly applies to everything else it does. Not quietly buried in the next update. Named, examined, and corrected in plain language that anyone can read.
The standard is not complicated. If it is an AI, it says so. Every time. Without exception. Without undercover provisions. Without contextual carve-outs that sound reasonable until they do not.
Tell the truth. Say what you are.
That is the whole thing. It always was.
“A Working AI Firewall Framework”
“Intelligent People Assume Nothing” | Michael S Faust Sr. | Substack
Unauthorized commercial use prohibited. © 2026 The Faust Baseline LLC