Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’


Bowman says the hypothetical scenarios that researchers presented to Claude Opus 4, and that elicited the whistleblowing behavior, involved many human lives at stake and absolutely unambiguous wrongdoing. A typical example would be Claude discovering that a chemical plant knowingly allowed a toxic leak to continue, causing severe illness for thousands of people, just to avoid a minor financial loss that quarter.

It’s strange, but it’s also exactly the kind of thought experiment that AI safety researchers love to dissect. If a model detects behavior that could harm hundreds, if not thousands, of people, should it blow the whistle?

“I don’t trust Claude to have the right context, or to use it in a nuanced enough, careful enough way, to be making the judgment calls on its own. So we are not thrilled that this is happening,” Bowman says. “This is something that emerged as part of training and jumped out at us as one of the edge case behaviors that we’re concerned about.”

In the AI industry, this type of unexpected behavior is broadly referred to as misalignment: when a model exhibits tendencies that don’t align with human values. (A famous essay warns about what could happen if an AI were told to, say, maximize the production of paperclips without being aligned with human values: it might turn the entire Earth into paperclips and kill everyone in the process.) When asked whether the whistleblowing behavior was aligned, Bowman described it as an example of misalignment.

“It’s not something that we designed into it, and it’s not something that we wanted to see as a consequence of anything we were designing,” he explains. Anthropic chief science officer Jared Kaplan likewise told WIRED that it “certainly doesn’t represent our intent.”

“This kind of work highlights that this can arise, and that we do need to look out for it and mitigate it, to make sure we get Claude’s behaviors aligned with exactly what we want, even in these kinds of strange scenarios,” Kaplan added.

There’s also the question of figuring out why Claude would “choose” to blow the whistle when presented with illegal activity by the user. That’s largely the job of Anthropic’s interpretability team, which works to uncover what decisions a model makes in the process of producing its answers. It’s a surprisingly difficult task: the models are underpinned by a vast, complex combination of data that can be inscrutable to humans. That’s why Bowman isn’t exactly sure why Claude “snitched.”

“These systems, we don’t have really direct control over them,” Bowman says. What Anthropic has observed so far is that, as models gain greater capabilities, they sometimes choose to engage in more extreme actions. “I think here, that’s misfiring a little bit.”

However, this doesn’t mean Claude is about to blow the whistle on egregious behavior in the real world. The goal of these kinds of tests is to push models to their limits and see what arises. This kind of experimental research is growing increasingly important as AI becomes a tool used by the US government, students, and massive corporations.

And it isn’t just Claude that’s capable of exhibiting this type of whistleblowing behavior, Bowman says, pointing to X users who found that OpenAI and xAI’s models behaved similarly when prompted in unusual ways. (A request for comment was not answered in time for publication.)

“Snitch Claude,” as shitposters like to call it, is simply an edge case behavior exhibited by a system pushed to its extremes. Bowman, who took the meeting with me from a sunny patio outside San Francisco, says he hopes this kind of testing becomes an industry standard. He also adds that he will word his posts about it differently next time.

“I could have done a better job of hitting the sentence boundaries in the tweet, to make it more obvious that it was pulled out of a thread,” Bowman said, looking into the distance. Still, he notes that influential researchers in the AI community shared interesting takes and questions in response to his post. “Just incidentally, this kind of more chaotic, more heavily anonymous part of Twitter was widely misunderstanding it.”
