OpenAI’s agent tool may be nearing release

OpenAI may be close to releasing an AI tool that can take control of your PC and do the work for you.

Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have uncovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agentic” system capable of autonomously handling tasks such as writing code and booking travel.

According to The Information, OpenAI is targeting January as Operator’s release month. Code uncovered by Blaho this weekend adds credence to that report.

OpenAI’s ChatGPT client for macOS has gained options, hidden for now, to define shortcuts for “Toggle Operator” and “Force Quit Operator,” according to Blaho. OpenAI has also added references to Operator on its website, Blaho said, although the references are not yet publicly visible.

OpenAI website already mentions Operator/OpenAI CUA (Computer Use Agent) – “Operator System Card Table”, “Operator Research Eval Table” and “Operator Refusal Rate Table”.

Including comparisons to Claude 3.5 Sonnet Computer Use, Google Mariner, etc.

(Preview of tables… pic.twitter.com/OOBgC3ddkU

— Tibor Blaho (@btibor91) January 20, 2025

According to Blaho, OpenAI’s site also contains yet-to-be-published tables comparing Operator’s performance with that of other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator is not 100% reliable, depending on the task.

On OSWorld, a benchmark that tries to simulate a real computer environment, the “OpenAI Computer Use Agent (CUA)” (presumably the AI model powering Operator) scored 38.1%, ahead of Anthropic’s computer-controlling model but well short of the 72.4% that humans score. OpenAI CUA outperforms humans on WebVoyager, which evaluates an AI’s ability to navigate and interact with websites. But the model falls short of human-level scores on WebArena, another web-based benchmark, according to the leaked figures.

Operator also struggles with tasks that a human could easily perform, if the leak is to be believed. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, it succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, Operator succeeded only 10% of the time.

OpenAI’s upcoming entry into the AI agent space comes as rivals, including the aforementioned Anthropic, Google, and others, make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets & Markets, the market for AI agents could reach $47.1 billion by 2030.

Agents today are rather primitive. But some experts have raised concerns about their safety should the technology improve quickly.

One of the leaked charts shows Operator performing well on selected safety evaluations, including tests that try to get the system to perform “illegal activities” and search for “sensitive personal data.” Safety testing is reportedly one of the reasons for Operator’s long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent that he claims lacks safety mitigations.

“I can only imagine the negative reaction if OpenAI makes a similar release,” Zaremba wrote.

It is worth noting that OpenAI itself has been criticized by AI researchers, including former staff, for allegedly de-emphasizing safety work in favor of quickly productizing its technology.
