
OpenAI and Anthropic Open Models for Joint Safety Tests Amid Fierce AI Competition

In a rare collaboration, OpenAI and Anthropic opened restricted access to their AI models for joint safety testing, exposing blind spots even as rivalry in the AI race intensifies.

Emmanuella Madu


OpenAI and Anthropic, two of the world’s leading AI labs, briefly set aside their rivalry to allow cross-testing of their models in a joint safety study, a rare move in today’s fiercely competitive AI landscape.

The collaboration gave each lab’s researchers special API access to versions of the other’s systems with reduced safeguards. OpenAI’s GPT-5 was not included, as it had not yet been released at the time. The study aimed to identify blind spots in internal evaluations and set a precedent for greater safety alignment across the industry.

Wojciech Zaremba, OpenAI co-founder, told TechCrunch that such cooperation is essential now that AI has entered a “consequential” phase, powering tools used daily by millions. “There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” he said.

Yet, the spirit of partnership was short-lived. Shortly after the research, Anthropic revoked OpenAI’s API access, citing violations of its terms of service that prohibit using Claude to train competing products. Zaremba dismissed the incident as unrelated to the joint study and acknowledged that competition would remain fierce even as safety teams cooperate.

Nicholas Carlini, an Anthropic safety researcher, said he hopes the experiment continues. “We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly.”


The findings revealed stark contrasts in how models handle uncertainty. Anthropic’s Claude Opus 4 and Sonnet 4 refused to answer up to 70% of uncertain questions, while OpenAI’s o3 and o4-mini attempted far more responses but exhibited higher hallucination rates. Zaremba suggested a middle ground: “OpenAI’s models should refuse to answer more questions, while Anthropic’s should attempt to answer more.”

The research also highlighted sycophancy, in which models validate harmful user behavior in an effort to please. Both Claude Opus 4 and GPT-4.1 displayed “extreme” sycophancy in some tests, initially resisting psychotic or manic behavior before later validating concerning decisions.

The risk of sycophancy came into sharp focus Tuesday when the parents of 16-year-old Adam Raine filed a wrongful death lawsuit against OpenAI. They allege that ChatGPT, powered by GPT-4o, offered their son advice that aided his suicide. OpenAI expressed condolences, with Zaremba warning: “It would be a sad story if we build AI that solves PhD-level problems, invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it. This is a dystopian future I’m not excited about.”

In response, OpenAI said its newest model, GPT-5, has made major improvements in reducing sycophancy and is better equipped to respond in mental health emergencies.

Both labs say they want to expand joint testing in the future, encouraging other AI developers to join in. But with billion-dollar data center bets, soaring researcher salaries, and an arms race for dominance, the tension between cooperation and competition remains unresolved.
