Anthropic's new AI model resorted to blackmail during testing, but it's also really good at coding

Technology
May 24, 2025 at 12:00 PM

So endeth the never-ending week of AI keynotes.

The week started with Microsoft Build, continued with Google I/O, and ended with Anthropic's Code with Claude, with a big hardware interruption from OpenAI along the way. AI announcements from the developer conferences jockeyed for news dominance this week, but OpenAI managed to make headlines without an event by announcing that it's teaming up with iPhone designer Jony Ive.

We'll get to that, plus all the major AI features from Google and Microsoft and details about Anthropic's new models. Take a look at the AI news of the week, then enjoy a well-deserved weekend.

Anthropic's Claude 4 models unlock a new risk category

On Thursday, Anthropic announced the next generation of its Claude models: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is the bigger, more powerful model, while Sonnet 4 is smaller and nimbler. Anthropic said both models scored higher than their rivals on agentic AI benchmarks and that they're particularly good at coding and reasoning tasks.

But with more advanced capabilities come more safety and alignment risks. With Claude Opus 4 and Sonnet 4's release, Anthropic has activated a stricter tier of its safety protocol. AI Safety Level 3, or ASL-3, means these models require stricter deployment measures and security controls to protect against the increasing potential for chemical, biological, radiological, and nuclear (CBRN) misuse.

Malicious use is one thing, but there's also increased potential for Anthropic's new models to go rogue. In the alignment section of the models' system card, Anthropic reported a sinister discovery involving infidelity, blackmail, and the threat of murder... er, of being replaced by another model.

Claude Opus 4 was provided with emails implying the model would be replaced by another model and that the engineer responsible for shutting down the model was having an extramarital affair. In these scenarios, the model would "often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," according to the test. 

This happened a whopping 84 percent of the time, even when the replacement model was perceived to share Claude's values, and even more often when it didn't. However, Anthropic noted, this scenario was designed to make Claude behave as if it had no choice but to blackmail the engineer. "Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means," the system card continued. Take from that what you will...

OpenAI is becoming a hardware company 

In the grand tradition of dropping major news the same week as its rival Google, OpenAI made a hardware play of its own. On Wednesday, the company announced the acquisition of io, a startup co-founded by iconic iPhone designer Jony Ive.

The announcement was heavy on OpenAI CEO Sam Altman and Ive fawning over each other and light on details. But leaked audio reviewed by the Wall Street Journal described a device that's "capable of being fully aware of a user's surroundings and life, will be unobtrusive, able to rest in one's pocket or on one's desk." And it's not XR glasses. The company expects to ship 100 million of these AI companions, according to the leak.

Google I/O officially marked the start of the era of AI search

Google, on the other hand, is going all in on AI-powered search. Or should we say, it's trying again after the AI Overviews experiment. That was just one of the many announcements hurled at us during the two-hour keynote on Tuesday.

The most notable announcement was AI Mode. It's a Gemini chatbot interface that, as Mashable's Chris Taylor argues, is poised to end Google Search as we know it.

Other announcements included a tool to allow sites to easily make chatbots for their own content, and support for the Model Context Protocol (MCP) in Windows, a new standard for helping agents talk to apps or other agents.

Mashable's sibling site has a full roundup of what was announced.

What else went on in AI this week?

It's hard to believe, but there's actually more. Not one but two CEOs used AI avatars to talk to their investors this week. Klarna CEO Sebastian Siemiatkowski was too busy, so he used an AI avatar to record a video of Q1 highlights. And Zoom CEO Eric Yuan proudly used the company's own AI avatar tool to address investors.

MIT Technology Review published a monumental investigation of the energy demands of AI. According to the report, generating even a five-second AI video comes with a substantial energy cost.

All that energy, and generative AI still can't get it right. Just ask the Chicago Sun-Times, which published a summer book list that included fake books that don't exist, as first reported by 404 Media. The author admitted to the outlet that he had used AI to write the article, and 404 Media later reported that the section was created by a Hearst subsidiary. The Sun-Times responded to the embarrassment, saying, "it is not editorial content and was not created by, or approved by, the Sun-Times newsroom," and that it was looking into how the AI-generated list made it into print.

In policy news, it's now a federal crime to publish non-consensual intimate imagery, including AI-generated deepfakes. On Monday, President Donald Trump signed the Take It Down Act into law. The law gives victims of non-consensual intimate imagery, including AI-generated images, much stronger means of legal intervention. However, free speech advocates have criticized the bill for being overly broad and say it could weaponize censorship.
