Anthropic says some Claude models can now end ‘harmful or abusive’ conversations 


Anthropic has announced new capabilities that allow some of its newest, largest models to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Strikingly, Anthropic says it is doing this not to protect the human user, but the AI model itself.

To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”

However, the announcement points to a recent program Anthropic created to study what it calls “model welfare,” and says the company is essentially taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”

This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in “extreme edge cases,” such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”

While those kinds of requests could create legal or publicity problems for Anthropic itself (witness recent reporting on how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a “strong preference against” responding to these requests and a pattern of “apparent distress” when it did.

As for the new conversation-ending capability, the company says, “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”

Anthropic also says Claude has been instructed “not to use this ability in cases where users might be at imminent risk of harming themselves or others.”

When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.

“We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.
