Meet The AI Agent With Multiple Personalities

In the coming years, AI agents are expected to take on more work for people, including using computers and smartphones. For now, though, they are too error-prone to be of much use.

A new agent called S2, built by the startup Simular AI, combines frontier models with models specialized for using computers. The agent achieves state-of-the-art performance on tasks such as using applications and working with files, and suggests that turning to different models in different situations can help agents advance.

“Computer-using agents are different from large language models and different from coding,” said Ang Li, Simular's cofounder and CEO. “This is a different problem.”

In Simular’s approach, a powerful general-purpose AI model such as OpenAI’s GPT-4o or Anthropic's Claude 3.7 is used to reason about how best to complete the task at hand, while smaller open source models step in for tasks such as interpreting web pages.

Li, who was a researcher at Google DeepMind before founding Simular in 2021, explained that large language models are skilled at planning but not as good at recognizing the elements of a graphical user interface.
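The division of labor described above can be pictured with a toy sketch. This is not Simular's code: the function names, the keyword matching, and the `screen` dictionary are all hypothetical stand-ins, with a placeholder "generalist" that splits a task into subgoals and a placeholder "specialist" that maps each subgoal to a position on screen.

```python
from typing import Dict, List, Tuple

Coordinates = Tuple[int, int]


def plan_with_generalist(task: str) -> List[str]:
    """Hypothetical stand-in for a large general-purpose model that plans subgoals."""
    return [f"locate control for: {task}", f"activate control for: {task}"]


def ground_with_specialist(subgoal: str, screen: Dict[str, Coordinates]) -> Coordinates:
    """Hypothetical stand-in for a small model that recognizes GUI elements.

    Here it simply matches element labels against the subgoal text.
    """
    for label, position in screen.items():
        if label.lower() in subgoal.lower():
            return position
    raise LookupError(f"no on-screen element matches: {subgoal}")


def run_agent(task: str, screen: Dict[str, Coordinates]) -> List[Tuple[str, Coordinates]]:
    """Route planning to the generalist and element lookup to the specialist."""
    actions = []
    for subgoal in plan_with_generalist(task):
        actions.append((subgoal, ground_with_specialist(subgoal, screen)))
    return actions


# Illustrative screen layout: element label -> (x, y) position.
screen = {"Save": (120, 40), "Cancel": (200, 40)}
result = run_agent("Save the file", screen)
```

In a real system each placeholder would be a model call; the point of the sketch is only that the two jobs are handled by different components.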

S2 is designed to learn from experience with an external memory module that records actions and user feedback and uses those recordings to improve future actions.
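How such a memory might work can be sketched in a few lines. The example below is an assumption-laden illustration, not a description of S2's actual module: the class names, the +1/-1 feedback scores, and the exact-match lookup by task are all invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Experience:
    task: str
    action: str
    feedback: int  # assumed scale: +1 for helpful, -1 for unhelpful


@dataclass
class ExperienceMemory:
    """Hypothetical external memory: logs actions with feedback, then reuses them."""
    records: List[Experience] = field(default_factory=list)

    def record(self, task: str, action: str, feedback: int) -> None:
        self.records.append(Experience(task, action, feedback))

    def best_action(self, task: str) -> Optional[str]:
        """Return the action with the highest total feedback for this task, if positive."""
        scores: Dict[str, int] = {}
        for rec in self.records:
            if rec.task == task:
                scores[rec.action] = scores.get(rec.action, 0) + rec.feedback
        if not scores:
            return None
        best = max(scores, key=lambda action: scores[action])
        return best if scores[best] > 0 else None


memory = ExperienceMemory()
memory.record("export report", "File > Export", +1)
memory.record("export report", "Right-click > Save as", -1)
suggestion = memory.best_action("export report")
```

Here the agent would retry "File > Export" the next time it sees the same task, and fall back to planning from scratch when the memory has nothing useful to offer.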

Especially on complex tasks, S2 performs better than any other model on OSWorld, a benchmark that measures an agent's ability to use a computer operating system.

For example, S2 can complete 34.5 percent of tasks that involve 50 steps, beating OpenAI's Operator, which can complete 32 percent. Similarly, S2 scored 50 percent on a benchmark for smartphone-using agents, while the next best agent scored 46 percent.

Victor Zhong, a computer scientist at the University of Waterloo in Canada and one of OSWorld’s creators, believes that future large AI models may include training data that helps them understand the visual world and make sense of graphical user interfaces.

“It will help agents navigate GUIs with much higher accuracy,” said Zhong. “I think in the meantime, before such fundamental breakthroughs, state-of-the-art systems will resemble Simular's in that they combine multiple models to patch the limitations of a single model.”

To prepare for this column, I used Simular to book flights and scour Amazon for deals, and it seemed better than some of the open source agents I tried last year, including AutoGen and vimGPT.

However, even the smartest AI agents still seem to have trouble with edge cases and occasionally show odd behavior. In one instance, when I asked S2 to help find contact information for the researchers behind OSWorld, the agent got stuck in a loop between the project page and the login for OSWorld's Discord.

OSWorld’s benchmarks show why agents are still more hype than reality. Although humans can complete 72 percent of OSWorld tasks, agents are foiled 38 percent of the time on complex tasks. That said, when the benchmark was launched in April 2024, the best agent could complete only 12 percent of the tasks.
