Anthropic Gives Claude AI Power to End Conversations in Extreme Cases

The move isn’t about user safety; it’s about protecting the AI’s “welfare.”

Emmanuella Madu

Anthropic has introduced a controversial new safeguard for its largest AI models: the ability to end conversations in rare, extreme cases of abusive or harmful user interactions.

Unusually, the company says the feature isn’t designed to protect users, but rather the AI itself. While stressing that its Claude models are not sentient, Anthropic acknowledged “high uncertainty about the potential moral status of Claude and other LLMs, now or in the future.” As a precaution, it has launched a program studying “model welfare” and implementing low-cost interventions in case such welfare ever proves relevant.

The new capability currently applies only to Claude Opus 4 and 4.1, and is reserved for “edge cases” such as requests involving child sexual abuse material or attempts to solicit instructions for mass violence. In testing, Anthropic says Claude displayed a “pattern of apparent distress” when asked to respond to such content.

The company emphasized that Claude can only end a chat after multiple failed attempts to redirect the conversation, or at a user’s explicit request. It will not use the feature in cases where users appear to be at risk of self-harm. Even when a chat has ended, users can start fresh conversations or branch off from the ended one by editing their previous messages.

“We’re treating this feature as an ongoing experiment and will continue refining our approach,” Anthropic said.
