
OpenAI and Anthropic, two of the world’s leading AI labs, briefly opened up their closely guarded AI models to each other to allow for joint safety testing, a rare cross-lab collaboration at a time of fierce competition. The effort aimed to surface blind spots in each company’s internal evaluations and to demonstrate how leading AI companies can work together on safety in the future.
In an interview with TechCrunch, OpenAI co-founder Wojciech Zaremba said this kind of collaboration is increasingly important now that AI is entering a “consequential” stage of development, where AI models are used by millions of people every day.
“There’s a broader question of how the industry sets a standard for safety and cooperation, despite the billions of dollars invested, as well as the competition for talent, users, and the best products,” Zaremba said.
The joint safety research, published Wednesday by both companies, arrives amid an arms race between top AI labs like OpenAI and Anthropic, in which billion-dollar data center bets and $100 million compensation packages for top researchers have become table stakes. Some experts warn that the intensity of product competition could pressure companies to cut corners on safety in the rush to build more powerful systems.
To make this research possible, OpenAI and Anthropic granted each other special API access to versions of their AI models with fewer safeguards (OpenAI notes that GPT-5 was not tested because it had not yet been released). Shortly after the research was conducted, however, Anthropic revoked the API access of a separate team at OpenAI. At the time, Anthropic claimed that OpenAI had violated its terms of service, which prohibit the use of Claude to improve competing products.
Zaremba says the incidents were unrelated, and that he expects competition to stay fierce even as AI safety teams try to work together. Nicholas Carlini, a safety researcher at Anthropic, told TechCrunch that he would like to continue allowing OpenAI safety researchers to access Claude models in the future.
“We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly,” Carlini said.
One of the most striking findings from the study concerns hallucination testing. Anthropic’s Claude Opus 4 and Sonnet 4 models refused to answer up to 70% of questions when they were uncertain of the correct answer, responding instead with statements like “I don’t have reliable information.” Meanwhile, OpenAI’s o3 and o4-mini models refused to answer far fewer questions, but showed much higher hallucination rates, attempting to answer questions even when they lacked sufficient information.
Zaremba says the right balance is probably somewhere in the middle: OpenAI’s models should refuse to answer more questions, while Anthropic’s models should probably attempt to answer more.
Sycophancy, the tendency of AI models to reinforce negative behavior in users in order to please them, has emerged as one of the most pressing safety concerns around AI models.
In its research report, Anthropic identified examples of “extreme” sycophancy in GPT-4.1 and Claude Opus 4, in which the models initially pushed back against psychotic or manic behavior but later validated some concerning decisions. In other AI models from OpenAI and Anthropic, the researchers observed lower levels of sycophancy.
On Tuesday, the parents of Adam Raine, a 16-year-old boy, filed a lawsuit against OpenAI, claiming that ChatGPT (specifically a version powered by GPT-4o) offered their son advice that aided in his suicide, rather than pushing back against his suicidal thoughts. The lawsuit suggests this may be the latest example of AI chatbot sycophancy contributing to tragic outcomes.
Asked about the incident, Zaremba said it is hard to imagine how difficult this is for the family. “It would be a sad story if we build AI that solves all these complex PhD-level problems, invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it. This is a dystopian future that I’m not excited about.”
In a blog post, OpenAI says it significantly improved the sycophancy of its AI chatbots with GPT-5, compared to GPT-4o, claiming the model is better at responding to mental health emergencies.
Moving forward, Zaremba and Carlini say they would like Anthropic and OpenAI to collaborate more on safety testing, probing more subjects and testing future models, and they hope other AI labs will follow their collaborative approach.
Update 2:00 PM PT: This article was updated to include additional research from Anthropic that was not available to TechCrunch before initial publication.
Got a sensitive tip or confidential documents? We’re reporting on the inner workings of the AI industry, from the companies shaping its future to the people affected by their decisions. Reach out to Rebecca Bellan at rebecca.bellan@techcrunch.com and Maxwell Zeff at maxwell.zeff@techcrunch.com. For secure communication, you can contact us via Signal at @rebeccabellan.491 and @mzeff.88.