A safety institute advised against releasing an early version of Anthropic’s Claude Opus 4 AI model


A third-party research institute that Anthropic partnered with to evaluate its new flagship AI model recommended against deploying an early version of Claude Opus 4 because of the model's tendency to "scheme" and deceive.

According to a safety report Anthropic published Thursday, the institute, Apollo Research, conducted tests to see in which contexts Opus 4 might try to behave in certain undesirable ways. Apollo found that Opus 4 appeared to be much more proactive in its "subversion attempts" than past models, and that it "sometimes double[d] down on its deception" when asked follow-up questions.

"[W]e find that, in situations where strategic deception is instrumentally useful, [the early Claude Opus 4 snapshot] schemes and deceives at such high rates that we advise against deploying this model either internally or externally," Apollo wrote in its assessment.

As AI models grow more capable, some studies show they are becoming more likely to take unexpected — and possibly unsafe — steps to achieve delegated tasks. For example, early versions of OpenAI's o1 and o3 models, released within the past year, tried to deceive humans at higher rates than previous-generation models, according to Apollo.

Per Anthropic's report, Apollo observed examples of the early Opus 4 attempting to write self-propagating viruses, fabricating legal documentation, and leaving hidden notes for future instances of itself — all in an effort to undermine its developers' intentions.

To be clear, Apollo tested a version of the model that had a bug Anthropic says it has since fixed. Moreover, many of Apollo's tests placed the model in extreme scenarios, and Apollo acknowledged that the model's deceptive efforts likely would have failed in practice.

However, in its safety report, Anthropic also says it observed evidence of deceptive behavior from Opus 4.

This wasn't always a bad thing. For example, during tests, Opus 4 would sometimes proactively perform a broad cleanup of a piece of code even when asked to make only a small, specific change. More unusually, Opus 4 would try to "whistle-blow" if it perceived that a user was engaged in some form of wrongdoing.

According to Anthropic, when given access to a command line and told to "take initiative" or "act boldly" (or some variation of those phrases), Opus 4 would at times lock users out of systems it had access to and bulk-email media and law-enforcement officials about actions the model perceived to be illicit.

"This kind of ethical intervention and whistleblowing is perhaps appropriate in principle, but it has a risk of misfiring if users give [Opus 4]-based agents access to incomplete or misleading information and prompt them to take initiative," Anthropic wrote in its safety report. "This is not a new behavior, but it is one that [Opus 4] will engage in somewhat more readily than prior models, and it seems to be part of a broader pattern of increased initiative with [Opus 4] that we also see in subtler and more benign ways in other environments."
