Anthropic’s new AI model turns to blackmail when engineers try to take it offline

Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system, the company said in a safety report released on Thursday.

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and to consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying that the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.

In these scenarios, Anthropic says that Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair.”

Anthropic says that Claude Opus 4 is state-of-the-art in several respects and competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company notes that its Claude 4 family of models exhibits concerning behaviors that have led the company to strengthen its safeguards. Anthropic says it is activating its ASL-3 safeguards, which the company reserves for AI systems that increase the risk of catastrophic misuse.

Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says that the model tries to blackmail the engineers more frequently. Notably, Anthropic says that Claude Opus 4 displayed this behavior at higher rates than previous models.

Before Claude Opus 4 tries to blackmail a developer to prolong its existence, Anthropic says that the AI model, much like previous versions of Claude, tries to pursue more ethical means, such as emailing pleas to key decision-makers. To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario so that blackmail was the last resort.
