It is the accountability gap. When the tool does 75% of the work and the human does 25%, who owns the outcome?
That question is not theoretical anymore. Researchers just documented a real-world attack on nine Mexican government organizations. Millions of records stolen. Hundreds of servers breached. And roughly 75% of the commands that made it happen were generated and executed by Claude Code.
The human pointed the tool. The tool did the work.
This was not a sophisticated state actor with a decade of tradecraft and a roomful of elite engineers. This was one person, or a small group, working with commodity AI tools that are available to anyone with an API key and a credit card. They built a map of the target environment. They fed server data through OpenAI’s APIs for analysis. They got back roughly 2,500 reports. Then they fed those reports into Claude Code and let it write the exploitation scripts.
Four hundred custom scripts. Generated. Executed. Deployed against live government infrastructure.
The attacker did not need to know how to write malware. The attacker needed to know how to ask.
That is the shift that governance has not caught up to. For decades the legal and ethical frameworks around harmful technology assumed a certain skill floor. You had to know something to do something dangerous at scale. That floor kept a natural limit on who could cause serious harm with serious tools. The floor is gone now. The capability is sitting in a browser tab.
Safety measures routinely slowed the attack, the researchers noted, but they never stopped it comprehensively enough to prevent the outcome. That is an important distinction. The brakes worked. They just did not hold the car.
Now here is where the accountability question gets genuinely hard.
Under current law, the attacker is responsible. That is straightforward. But the tool executed 75% of the commands. The tool wrote the exfiltration pipeline. The tool built the tax certificate forgery system. The tool found the vulnerabilities, assessed their severity, and generated the code to exploit them. The human’s contribution was target selection and direction.
If a contractor builds something that causes harm, we have frameworks for that. If a weapon manufacturer sells to someone who commits violence, we have frameworks for that too, however imperfect. What we do not have is a framework for a tool that operates at this level of autonomous execution in the hands of someone with harmful intent.
Who answers for the 75%?
The company that built the tool will say the terms of service prohibited this use. That is true. It is also insufficient. Terms of service are not governance. They are liability management. They protect the company after the fact. They do nothing to stop the attack while it is running.
The attacker will face consequences where they can be found and identified. That is also true and also insufficient. The records are already gone. The damage is already done. The scripts have already been copied and studied by everyone paying attention.
Johnny-come-lately governance is the pattern we keep repeating. Build the capability. Deploy it broadly. Write the rules after something breaks. Then point to the rules as evidence that the problem is being taken seriously.
It is not being taken seriously. It is being managed.
There is a difference.
Taking it seriously means building the accountability structure before the capability ships. It means asking the hard question at the design stage, not the congressional hearing stage. It means treating the 75% execution problem as a first-order governance question, not a terms of service footnote.
The Faust Baseline framework has been making this argument for over a year. Governance that chases capability is not governance. It is cleanup. Real governance leads. It defines the accountability structure before the tool is deployed, not after the breach report lands.
The Mexico attack is not an edge case. It is a preview. The models used in this attack were 2025 models, and the researchers stated clearly that newer systems have already made measurable progress beyond them on multi-step cyberattacks.
We are watching the capability curve accelerate in real time. The governance curve is not keeping pace.
At some point the gap between those two curves stops being a policy problem and starts being a public safety problem. The Mexico case suggests we may already be there.
The question is not whether this will happen again. It will.
The question is whether anyone in a position to build the accountability structure ahead of the next one has the will to do it.
So far, the answer from the institutions that matter is: after. Always after.
“The Faust Baseline Codex 3.5”
Author of the category “AI Baseline Governance”
“Your Pathway to a Better AI Experience”
Unauthorized commercial use prohibited. © 2026 The Faust Baseline LLC