AI Agents Are Terrible Freelance Workers


Even the best artificial intelligence agents are fairly hopeless at online freelance work, according to an experiment that challenges the idea of replacing office workers with AI.

The Remote Labor Index, a new benchmark created by data annotation company Scale AI and researchers at the Center for AI Safety (CAIS), a nonprofit organization, measures the ability of frontier AI models to automate economically valuable tasks.

The researchers gave several leading AI agents a range of simulated freelance tasks and found that even the best could complete less than 3 percent of them, earning $1,810 out of a possible $143,991. Of the tools they tested, the most capable was Manus, from a Chinese startup of the same name, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI and Gemini from Google.

“I should hope that this gives a more accurate picture of what’s going on with AI capabilities,” says Dan Hendrycks, director of CAIS. He adds that although some agents have improved significantly over the past year, that doesn’t mean progress will continue at the same rate.

Spectacular AI advances have led to speculation that AI will soon surpass human intelligence and replace large numbers of workers. In March, Anthropic CEO Dario Amodei suggested that 90 percent of coding work would be automated within a few months.

Previous waves of AI have inspired inaccurate predictions about job displacement, such as the supposedly imminent replacement of radiologists by AI algorithms.

The researchers created a range of freelance jobs with the help of verified Upwork workers. The jobs span graphic design, video editing, game development and administrative tasks such as data scraping. Each job description was paired with a directory of the files needed to complete the job and an example of a finished project produced by a human.

Hendrycks says that although AI models have got better at coding, mathematics and logical reasoning in recent years, they still struggle to use different tools and to carry out complex tasks involving many steps. “They don’t have long-term memory storage and can’t learn continuously from experience. They can’t pick up skills on the job like humans do,” he says.

The findings compare unfavourably with GDPval, a measure of economically valuable work introduced by OpenAI in September. According to GDPval, frontier AI models such as GPT-5 are approaching human ability on 220 tasks spanning a range of office jobs. OpenAI did not provide a comment.
