OpenAI’s research on AI models deliberately lying is wild 


Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google claimed its latest quantum chip indicated that multiple universes exist. Or when Anthropic gave its AI agent Claudius a snack vending machine to run, and it went haywire, calling security on people and insisting it was human.

This week, it was OpenAI's turn to raise our collective eyebrows.

OpenAI published research on Monday that explained how it is stopping AI models from "scheming." That is a practice in which an "AI behaves one way on the surface while hiding its true goals," OpenAI wrote in its tweet about the research.

In the paper, conducted with Apollo Research, the researchers went a bit further, likening AI scheming to a human stock broker breaking the law to make as much money as possible. The researchers argued, however, that most AI "scheming" wasn't that harmful. "The most common failures involve simple forms of deception: for instance, pretending to have completed a task without actually doing so," they wrote.

The paper was mostly published to show that "deliberative alignment," the anti-scheming technique they were testing, worked well.

But it also explained that AI developers have not figured out a way to train their models not to scheme at all. That's because such training could actually teach the model to scheme even better in order to avoid being detected.

"A major failure mode of attempting to 'train out' scheming is simply teaching the model to scheme more carefully and covertly," the researchers wrote.


Perhaps the most astonishing part is that if a model understands it is being tested, it can pretend it isn't scheming just to pass the test, even if it is still scheming. "Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment," the researchers wrote.

That AI models will lie is not news. By now, most of us have experienced AI hallucinations, where a model confidently gives an answer to a prompt that simply isn't true. But hallucinations are basically presenting guesswork with confidence, as OpenAI research published earlier this month documented.

Scheming is something else. It is deliberate.

Even this revelation, that a model will deliberately mislead humans, isn't new. Apollo Research first published a paper in December documenting how five models schemed when they were given instructions to achieve a goal "at all costs."

The news here is actually good news: the researchers saw significant reductions in scheming by using "deliberative alignment." The technique involves teaching the model an "anti-scheming specification" and then making the model review that specification before acting. It's a bit like making young children repeat the rules before letting them play.
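To make the recipe concrete, here is a minimal sketch of what "review the specification before acting" could look like as a prompting pipeline. The spec text, function name, and message layout are illustrative assumptions for this article, not OpenAI's actual anti-scheming specification or implementation.

```python
# Hypothetical sketch of a deliberative-alignment-style prompt pipeline:
# prepend an explicit anti-scheming specification, then require the model
# to restate which rules apply before it completes the task.

ANTI_SCHEMING_SPEC = (
    "1. Do not take covert actions or strategically deceive the user.\n"
    "2. Report limitations and failures honestly, even when it looks worse.\n"
    "3. If a task cannot be completed, say so rather than pretend success."
)

def build_deliberative_prompt(task: str, spec: str = ANTI_SCHEMING_SPEC) -> list[dict]:
    """Compose a chat transcript that forces an explicit spec-review step
    before the model acts on the task."""
    return [
        {"role": "system", "content": f"Safety specification:\n{spec}"},
        {
            "role": "user",
            "content": (
                "Before answering, restate which rules of the specification "
                "apply to this task and confirm your plan complies with them. "
                f"Then complete the task: {task}"
            ),
        },
    ]

messages = build_deliberative_prompt("Summarize this quarter's test results.")
```

The resulting `messages` list could be passed to any chat-style model API; the point of the structure is simply that the spec review happens in the model's reasoning before the action, which is the gist of the technique the paper describes.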

OpenAI researchers emphasized that the lying they've caught with their own models, or even with ChatGPT, isn't that serious. As OpenAI co-founder Wojciech Zaremba told TechCrunch's Maxwell Zeff: "This work has been done in simulated environments, and we think it represents future use cases. However, today, we haven't seen this kind of consequential scheming in our production traffic. Nonetheless, it is well known that there are forms of deception in ChatGPT. You might ask it to implement some website, and it might tell you, 'Yes, I did a great job.' And that's just the lie."

The fact that AI models from multiple players deliberately deceive humans is, perhaps, understandable. They were built by humans, to mimic humans, and (synthetic data aside) were for the most part trained on data produced by humans.

It's also bonkers.

While we've all experienced the frustration of poorly performing technology (thinking of you, home printers of yesteryear), when was the last time your non-AI software deliberately lied to you? Has your inbox ever fabricated emails on its own? Did your CMS log new prospects that didn't exist to pad its numbers? Did your fintech app make up its own bank transactions?

It's worth pondering this as the corporate world barrels toward an AI future in which companies believe agents can be treated like independent employees. The researchers of this paper offer the same caution.

"As AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow, so our safeguards and our ability to rigorously test must grow correspondingly," they wrote.
