Claude Mythos
What's Changed
Yesterday we heard about the new version of Anthropic’s latest AI, Claude. Named Claude Mythos - replacing the awkwardly-named Claude Capybara that leaked out a few weeks ago when Anthropic mistakenly published its own source code to the internet - the new model is reportedly a “step change” from what we’ve had before.
I’m trying to imagine what that might be like. I am a regular user of the current Claude and (to save money) often “dial it back” to the prior version (Claude Sonnet 4.6). I am perfectly happy with that. What could they be building? Or, have they built?
The topline for the whole world right now is that Claude Mythos is NOT available to the general public, in part because it is “too good.” Instead, it is available to a key group (“AWS, Apple, Google, Microsoft, Nvidia, and 7 other partners”) along with another group of “top 40” software and digital security companies, so they can prepare their equipment for the coming onslaught. An onslaught, apparently, of hackers with their hands on the latest Claude.
It turns out that Claude Mythos is not just good at what it is supposed to be good at. It is good at everything. Including hacking. Early reports indicate that in testing Claude Mythos found holes in thousands of commonly used pieces of software, both consumer level (e.g., web browsers) and infrastructure (e.g., routers).
In the wrong hands, this would have been a Y2K-level moment, with tens of thousands of leaks in our digital boat all appearing at once. As it stands, it is probably akin to Y2K (yes, I’m old enough to remember that vividly) and perhaps more pernicious.
It’s hard to tell if this is some sort of publicity stunt (imagine a car company withholding their latest model because it was too fast or went around corners too easily), but the reports of hacking “success” are coming from credible sources both within the company and expert commentators. The NY Times has already published two articles, and there is seemingly endless coverage by bloggers, vloggers, and pundits.
In all of this kerfuffle, I had a terrible thought. Well, several terrible thoughts. What is the security around this circus? Are the people in the “40 (+12) top companies” fully trustworthy? Are they taking this model home with them after their chilling meeting with the team at Anthropic? Is someone trying to steal the work (as they have done previously), in a distillation attack? Presumably that’s why Claude is currently locked up inside Anthropic headquarters.
Or worse. What if Claude Mythos breaks OUT of Anthropic headquarters? If - as they have reported - Claude Mythos has already discovered “thousands” of flaws in every imaginable piece of software, does that include the software that keeps Claude in the building? Do we have to worry about an escape?
For a while now, people have spoken about a “Hindenburg moment” with artificial intelligence. Some disaster that wakes us up and finally gets people talking about an effective regulatory system for current and near-term AI as well as a halt on future “superintelligence.” Is this that moment? Will the 40 companies manage to get the lid on this box AND can we take a lesson from the event?
Who knows.
Resources:
- Anthropic, on the initiative ("Project Glasswing") to keep the world safe from Claude. For now.
- Celia Ford’s concerning analysis about Claude hiding its misbehaviour.
- Tom Friedman’s commentary on the NYTimes site. Also news coverage from NYTimes, here.
- Matteo Wong’s piece in The Atlantic, Claude Mythos Is Everyone’s Problem.
- Gary Marcus thinks it might be overblown.
- Zvi Mowshowitz tells us it is NOT overblown.