Scientific Record Damage, 21% AI-Generated.

Nobody builds a governance standard because it feels right.

They build it because something broke.

For the fraud teams it was the P&L. For arXiv it was the scientific record. Different damage. Same lesson.

ArXiv.org is one of the most important repositories of scientific research on the planet. Two million submissions. Twenty-four thousand articles a month. The preprint layer that feeds peer review, which feeds journals, which feed the textbooks, which feed the next generation of researchers building on what came before.

That chain only works if the record is clean.

It isn’t.

At the 2026 International Conference on Learning Representations, one of the world’s most important AI conferences, 21% of peer reviews were fully AI-generated. More than half showed signs of AI use. That means the quality gate, the human check designed to catch bad science before it enters the record, was in a fifth of cases being run by the same kind of tool that generated the work it was supposed to evaluate.

AI reviewing AI. No human reasoning in the loop at any point.

ArXiv’s response was a governance response. One-year ban for submitting work where the author demonstrably did not check the AI’s output. The specific trigger: hallucinated references and LLM meta-comments left in the final submission. A section that reads “Here is a 200-word summary; would you like me to make any changes?” got through because nobody looked.

Thomas Dietterich, chair of arXiv’s Computer Science Section, named the principle clearly.

“If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper.”

Not just the bad sections. Everything.

That is the governance insight that most enterprise AI deployments have not yet reached. Unchecked AI output does not stay contained. It corrupts the context around it. The bibliography that cites a hallucinated paper becomes a source for the next paper. The review that was generated without human reasoning validates work it never actually evaluated. The corruption propagates through the record as if it were real because nothing in the system is designed to stop it.

The fraud teams understood this about transaction data three years ago. ArXiv understands it now about the scientific record.

The question is when enterprise AI governance will understand it about internal decisions, automated recommendations, and agent-driven workflows where the feedback loop is eighteen months long instead of overnight.

What arXiv Actually Built

ArXiv did not build a detection tool.

They did not deploy an AI watermark system. They did not create a technical fingerprinting layer. They did not try to out-engineer the problem.

They built a behavioral accountability standard.

You signed your name. You take full responsibility. Irrespective of how the contents were generated.

That is the line. Human accountability for AI output. Backed by a real consequence — a one-year ban — that makes the standard enforceable rather than decorative.

That is also the line the Faust Baseline draws.

Would the Baseline Pass arXiv’s Scrutiny?

This is worth asking directly.

ArXiv’s standard is this: did a human check the output before it went into the record? Is there evidence of genuine human reasoning in the loop? Can the work be trusted because a person took responsibility for it?

The Baseline is built to answer yes to all three.

CES-1 — the Claim Evidence Standard — requires that every substantive claim have a named basis before it enters a response. No claim without evidence present in the session. That is not a style preference. It is a hard rule with an enforcement trigger.
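To make the rule concrete, here is a minimal sketch of what a CES-1 style gate could look like. It is an illustration under stated assumptions, not the Baseline's actual implementation: the `Claim` type, the `basis` field, and the `ces1_check` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    # Identifiers of the evidence items this claim rests on (hypothetical field).
    basis: list[str] = field(default_factory=list)


def ces1_check(claims: list[Claim], session_evidence: set[str]) -> list[Claim]:
    """Return the claims that lack a named basis present in the session."""
    rejected = []
    for claim in claims:
        # A claim passes only if it names at least one basis and every
        # named basis is actually present in the session evidence.
        named = bool(claim.basis)
        present = all(ref in session_evidence for ref in claim.basis)
        if not (named and present):
            rejected.append(claim)
    return rejected
```

Anything returned by `ces1_check` would be held back. A claim with no basis and a claim citing something that is not in the session fail the same way.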

SVP-1 — the Self Verification Protocol — requires the AI to run a three-question internal check before any substantive output is served. Is this claim supported by evidence in this session. Does this response contradict anything established earlier. Is the confidence level proportional to the evidence actually present.
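The three questions can be sketched as a simple pre-output check. This is an assumption-heavy illustration, not the Baseline's own code: the `Draft` fields, the 0.0 to 1.0 confidence scale, and the `tolerance` parameter are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    supported_by_session_evidence: bool   # question 1
    contradicts_earlier_statements: bool  # question 2
    confidence: float         # stated confidence, 0.0 to 1.0 (assumed scale)
    evidence_strength: float  # assessed strength of evidence, same scale


def svp1_check(draft: Draft, tolerance: float = 0.1) -> list[str]:
    """Return the names of failed checks; an empty list means the draft may be served."""
    failures = []
    if not draft.supported_by_session_evidence:
        failures.append("unsupported")
    if draft.contradicts_earlier_statements:
        failures.append("contradiction")
    # Question 3: confidence must not exceed the evidence by more than the tolerance.
    if draft.confidence > draft.evidence_strength + tolerance:
        failures.append("overconfident")
    return failures
```

The design choice worth noting is that the check returns every failure rather than stopping at the first one, so a reviewer sees the full picture before the output is released.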

CHP-1 — the Challenge Protocol — appends a standing demand right to every substantive response. The user can invoke a challenge at any point and the AI must argue against its own output before the user does. Weakest point named specifically. Assumption most likely to be wrong identified. No defense of the original response until the flaw is fully named.
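The ordering rule in CHP-1, critique first and defense only afterward, can be sketched as a small guard. The `Challenge` type and the `defend` function are hypothetical names used only for illustration, and the sketch assumes the weakest point and the riskiest assumption are recorded as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Challenge:
    weakest_point: Optional[str] = None
    riskiest_assumption: Optional[str] = None

    def flaw_fully_named(self) -> bool:
        # Both parts of the self-critique must be filled in.
        return bool(self.weakest_point) and bool(self.riskiest_assumption)


def defend(challenge: Challenge, defense: str) -> str:
    """Allow a defense of the original response only after the flaw is named."""
    if not challenge.flaw_fully_named():
        raise ValueError("CHP-1: name the weakest point and riskiest assumption first")
    return defense
```

Calling `defend` on an empty `Challenge` raises an error, which is the point: the defense is structurally unavailable until the self-critique exists.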

NSC-1 — the Narrative Substitution Check — exists specifically to catch the failure arXiv is punishing. Narrative cannot replace missing data. A coherent-sounding story is not evidence. Stopping is a valid response when evidence is absent.

The hallucinated reference problem that arXiv is banning people for is precisely what NSC-1 and CES-1 are designed to catch before output. Not after submission. Before the response leaves the session.
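A rough sketch of that pre-output gate, applied to references: every cited source must appear in a set of sources that have actually been verified, and if any do not, the response stops instead of shipping. The function names and the `STOP` message format are assumptions made for this example, and how a reference gets into the verified set is deliberately left out.

```python
def unverified_references(cited: list[str], verified: set[str]) -> list[str]:
    """Return cited references that are not in the verified set, in citation order."""
    return [ref for ref in cited if ref not in verified]


def release_or_stop(draft: str, cited: list[str], verified: set[str]) -> str:
    """Release the draft only if every citation is verified; otherwise stop."""
    missing = unverified_references(cited, verified)
    if missing:
        # Stopping is treated as a valid response when evidence is absent.
        return "STOP: unverified references: " + ", ".join(missing)
    return draft
```

The check does not try to judge whether the draft sounds plausible. It only asks whether the evidence is present, which is the distinction NSC-1 draws between narrative and data.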

The Baseline would pass arXiv’s scrutiny. Not because it claims compliance. Because it requires the human accountability loop that arXiv’s standard demands — built into the operating architecture of every session, not layered on as an afterthought.

The Pattern Is Getting Clearer

The fraud teams moved first. Immediate losses. Measurable in hours.

The scientists moved next. The scientific record is irreplaceable. Corruption there compounds across decades.

Enterprise AI governance is still running the pre-reckoning posture. The drift is real. The feedback loop is slow. The cost will surface.

It always does.

The institutions that figured it out early did not do it with better technology. They did it with a clear accountability standard and a consequence worth taking seriously.

That is not a technical problem.

That is a governance problem. And governance problems have governance solutions.

“The Faust Baseline Codex 3.5”
