They Told You The Safeguards Don’t Exist
Anthropic, Claude Mythos, and the governance gap hiding in plain sight
This morning I published a piece about BadHost.
A vulnerability in foundational AI infrastructure. Millions of agentic systems exposed. The patch existed. The door stayed open anyway.
The finding from the Edinburgh study three days ago was that the real threat isn’t criminal genius. It’s builders who ship without governing what they built.
Today the story got bigger.
Today the builder told you directly.
What Anthropic Said
In an update to Project Glasswing — Anthropic’s limited-access security program — the company announced plans to release a public version of Claude Mythos, its advanced cybersecurity AI model.
They also said this.
No company, including Anthropic itself, has built safeguards sufficient to prevent the model from being misused.
Read that again.
Not a critic. Not a regulator. Not a competitor.
Anthropic.
The company building the model. Saying publicly, in their own update, that the safeguards don’t exist yet. And that they intend to release it anyway — first to U.S. and allied governments, then to the broader public — in what they describe as the near future. Their own blog post predicts that Mythos-class models will become widely available within six to twelve months.
What This Model Does
Claude Mythos is not a general-purpose AI assistant with enhanced capabilities.
During testing it generated working exploits 72.4 percent of the time.
Claude Opus 4.6 — the current frontier model — generates working exploits just above zero percent of the time.
Anthropic researchers with no formal security training asked Mythos to hunt for vulnerabilities overnight. They woke up to complete, functional exploits.
Since its initial reveal in April 2026 Mythos has scanned over a thousand open-source projects. It identified more than 23,000 flaws. More than 6,000 of those were rated high or critical severity.
Among them was a vulnerability in the wolfSSL cryptography library — a tool used by billions of devices — that would have allowed attackers to forge certificates and impersonate legitimate websites.
That flaw has been patched. The other 23,000 or so are in a queue.
Open-source maintainers have reportedly asked Anthropic to slow its disclosure rate. The flood of bug reports exceeds their capacity to address them. Anthropic’s suggested solution to the capacity crunch is more AI.
The Governance Statement Hidden In The Announcement
Anthropic deserves recognition for saying out loud what most companies in this position would not say.
Acknowledging publicly that the safeguards don’t exist is more honest than the standard corporate posture of releasing first and discovering the gaps later.
But honesty about a known gap is not governance.
What Anthropic has done is name the unlocked door, describe exactly how dangerous the room behind it is, and announce a timeline for opening it anyway.
That is a governance decision. It has consequences that extend well beyond Anthropic’s systems, their government partners, and their internal red teams.
They extend to every open-source maintainer already overwhelmed by the current disclosure rate. Every organization running infrastructure that Mythos-class models will be pointed at. Every agentic system connected to MCP that will interact with environments where Mythos-class capabilities are operating.
The Edinburgh study found that poorly secured agentic systems are the most pressing emerging risk. BadHost confirmed that agentic systems are already the primary attack surface for live vulnerabilities. Mythos is an AI purpose-built to find and exploit vulnerabilities at a scale and speed no human security team can match.
The question that the announcement does not answer is straightforward.
Who governs what happens between the model’s capability and the safeguards that don’t exist yet?
The Session Level Problem Returns
There is a governance layer beneath the release decision that nobody in the coverage is discussing.
When Mythos-class models interact with enterprise systems — through APIs, through MCP connections, through agentic workflows — the outputs they produce will arrive looking like normal AI outputs. Vulnerability reports. Security recommendations. Exploit assessments.
The organizations receiving those outputs will have no session-level mechanism to know whether the reasoning that produced them was operating inside disclosed constraints or not. Whether the model was functioning as intended or had been manipulated through the kind of Host header attack that BadHost demonstrated. Whether the output represents genuine security analysis or constrained output dressed as free reasoning.
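For readers who want to see what guarding against Host header manipulation can look like in practice, here is a minimal sketch. It assumes the general class of attack named above, not the specifics of BadHost, which are not described here. The allowlist contents and the function name are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, Optional

# Hypothetical allowlist: the only hostnames this local service should answer to.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({"localhost", "127.0.0.1"})


def host_is_allowed(host_header: Optional[str],
                    allowed: FrozenSet[str] = ALLOWED_HOSTS) -> bool:
    """Return True only if the Host header names an allowlisted host."""
    if not host_header:
        return False  # a missing or empty Host header is rejected outright
    host = host_header.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:8080"
        end = host.find("]")
        if end == -1:
            return False  # malformed literal
        name = host[1:end]
    else:
        # Drop an optional ":port" suffix
        name = host.rsplit(":", 1)[0] if ":" in host else host
    return name in allowed
```

A server would run a check like this before any request handling and refuse anything that fails it. The design choice is to reject by default: a request is served only when its Host value is explicitly expected.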
Anthropic’s system card for Mythos predicts that AI tools will ultimately benefit cybersecurity defenders.
They also note that hackers may have an advantage for the time being.
For the time being.
That phrase is doing a lot of work in a sentence describing a model that generates functional exploits 72.4 percent of the time and whose safeguards don’t yet exist.
The Three-Part Sequence
Three days ago the Edinburgh study found that the threat is builders who ship without governing what they built.
Two hours ago BadHost confirmed that foundational agentic infrastructure is already carrying live high-severity vulnerabilities that patches haven’t reached.
Now Anthropic has announced a model purpose-built for exploit generation and acknowledged in the same breath that the safeguards to govern it don’t exist.
This is not three separate stories.
This is one story in three acts.
Act one: the threat model was wrong. The criminal isn’t the primary risk. The builder is.
Act two: the infrastructure is already exposed. The door is already open in systems running today.
Act three: a tool capable of finding and generating exploits at machine speed is coming to the public market within six to twelve months. The company releasing it has told you directly that nobody has built the safeguards yet. Including them.
What Governance Looks Like Here
This is not an argument that Mythos should not exist. Defensive cybersecurity applications of this capability are real and significant. Finding 23,000 vulnerabilities including a critical flaw in a library used by billions of devices is genuinely valuable work.
The argument is that capability without governance is not a product. It is a liability.
Governance at the release level means safeguards that exist before the door opens. Not after the first misuse incident. Not after the capacity crunch overwhelms open-source maintainers. Before.
Governance at the session level means every interaction with a Mythos-class model carries disclosure about what constraints are operating, what the model can and cannot do, and what conditions are shaping the output the user receives.
Governance at the infrastructure level means the agentic systems that Mythos-class models will interact with are not running the equivalent of BadHost vulnerabilities in their foundational frameworks.
All three layers are required. Right now none of them are confirmed to exist at the scale Anthropic is describing.
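The session-level layer is the easiest of the three to picture concretely. What follows is a minimal sketch of a disclosure record that travels with each output, assuming a simple JSON envelope. Every name in it (SessionDisclosure, wrap_output, missing_fields, and the field names) is hypothetical. None of it reflects an existing Anthropic or MCP interface.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SessionDisclosure:
    """Hypothetical disclosure record attached to one model response."""
    model_id: str
    active_constraints: Tuple[str, ...]   # what constraints are operating
    permitted_actions: Tuple[str, ...]    # what the model can do
    refused_actions: Tuple[str, ...]      # what the model cannot do
    shaping_conditions: Tuple[str, ...]   # conditions shaping the output


REQUIRED_FIELDS = ("model_id", "active_constraints", "permitted_actions",
                   "refused_actions", "shaping_conditions")


def wrap_output(output: str, disclosure: SessionDisclosure) -> str:
    """Bundle an output with its disclosure so neither travels alone."""
    return json.dumps({"output": output, "disclosure": asdict(disclosure)},
                      sort_keys=True)


def missing_fields(envelope_json: str) -> List[str]:
    """List the disclosure fields that are absent or empty in an envelope."""
    envelope = json.loads(envelope_json)
    disclosure = envelope.get("disclosure") or {}
    return [name for name in REQUIRED_FIELDS if not disclosure.get(name)]
```

A receiving organization could call missing_fields on every incoming envelope and refuse to act on any output whose list is not empty. The record does not prove the disclosure is true. It only makes the absence of one visible.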
The Statement That Matters
Anthropic said it.
No company, including itself, has built safeguards sufficient to prevent the model from being misused.
That statement is the governance gap.
Not the model. Not the capability. Not the timeline.
The gap between what the model can do and what exists to govern it.
That gap has a name now. It has a CVE number from BadHost. It has a study from Edinburgh that predicted it. It has an announcement from Anthropic that confirmed it.
The door is being built in public.
The lock is still on the drawing board.
“The Faust Baseline Codex 3.5”
Author of the category “AI Baseline Governance”
Unauthorized commercial use prohibited. © 2026 The Faust Baseline LLC






