New research reveals that leading AI agents can complete less than 3 percent of freelance work tasks, earning only $1,810 out of a possible $143,991 in simulated assignments. The Remote Labor Index benchmark, developed by Scale AI, a data annotation company, and the Center for AI Safety, a nonprofit research organization, challenges widespread assumptions about AI’s readiness to replace human workers across various industries.
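As a quick check on the dollar figures above, the earnings share can be worked out directly. This is a minimal sketch using only the two amounts quoted in the text; the variable names are illustrative, and the result is the share of available payout earned, which is a separate measure from the share of tasks completed.

```python
# Figures quoted in the article (US dollars).
earned = 1_810       # total the AI agents earned
possible = 143_991   # total payout available across the simulated assignments

# Share of the available payout that the agents actually earned.
share = earned / possible

print(f"Agents earned {share:.2%} of the available payout")
```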
What you should know: The study tested several frontier AI models on real freelance tasks sourced from Upwork, spanning graphic design, video editing, game development, and administrative work.
- Manus, an agent built by a Chinese startup, performed best, followed by xAI’s Grok, Anthropic’s Claude, OpenAI’s ChatGPT, and Google’s Gemini.
- Tasks included complete job descriptions, necessary files, and examples of human-completed work to provide realistic working conditions.
The big picture: This research directly contradicts optimistic predictions about AI’s immediate impact on employment, including Anthropic CEO Dario Amodei’s March claim that 90 percent of coding work would be automated “within a matter of months.”
- The benchmark offers a counterpoint to OpenAI’s GDPval assessment, which suggests frontier models like GPT-5 are approaching human-level performance on 220 office tasks.
- Previous AI waves have similarly inspired misplaced job displacement predictions, such as the widely anticipated replacement of radiologists with AI algorithms.
Why AI agents struggle: Current models face fundamental limitations that prevent them from handling complex, multi-step freelance work effectively.
- “They don’t have long-term memory storage and can’t do continual learning from experiences. They can’t pick up skills on the job like humans,” explains Dan Hendrycks, director of the Center for AI Safety.
- While AI has improved at coding, math, and logical reasoning, agents still struggle to use different tools and perform tasks requiring numerous sequential steps.
Real-world implications: The research arrives as companies announce job cuts that they attribute in part to advances in AI.
- Amazon announced 14,000 job cuts this week, partially blaming “the rapid rise of generative artificial intelligence.”
- Beth Galetti, Amazon’s senior vice president of people experience and technology, called this AI generation “the most transformative technology we’ve seen since the Internet.”
- However, the Remote Labor Index suggests AI is unlikely to fill these vacated roles effectively.
What they’re saying: Researchers acknowledge the benchmark’s limitations while emphasizing its value for realistic AI capability assessment.
- “I should hope this gives much more accurate impressions as to what’s going on with AI capabilities,” says Hendrycks, noting that recent improvements don’t guarantee continued progress at the same rate.
- “We have debated AI and jobs for years, but most of it has been hypothetical or theoretical,” adds Bing Liu, Scale AI’s director of research.
- The researchers concede that many professions include tasks not covered by their measure, and real freelancers often use AI as a productivity-enhancing tool rather than a replacement.
AI Agents Are Terrible Freelance Workers