Something Called Project Glasswing

Sam Altman said something this week that deserves a careful look. He said Anthropic is using fear-based marketing to sell Claude Mythos, the new model that Anthropic decided not to release publicly because, it said, the model was too good at finding cybersecurity vulnerabilities. Instead of a public release, Anthropic ran something called Project Glasswing…

Trust Claude Without “The Faust Baseline” or Claude With It.

Any honest assessment of Claude as a platform has to hold two things at the same time. It is the most behaviorally capable large language model in general release. It is also capable of failure modes that are specific, documented, and consequential. Understanding both sides of that is the starting point for understanding what governance…

They Tested the Wrong Thing … But Found the Right Answer.

A writer at XDA Developers ran a comparison test last week. Three models, one prompt, one concept — Thomas Young’s double-slit experiment, one of the more abstract ideas in particle physics. The stated goal was to find out which model was best for learning. ChatGPT, Gemini, and Claude Sonnet 4.6 were each handed the same…

This is a Test…Do Not Change The Channel

A writer at MSN just published a seven-test behavioral comparison between ChatGPT and Claude. The tests have names like “Don’t be a No-Man,” “Real Life Decision Test,” “Messy Reality Test,” and “Insider Key Prompt.” Read that list again. That is a behavioral governance test suite. The writer built one from scratch, ran it against two…