Researchers from Zhejiang University and OPPO AI Center have published the most comprehensive survey to date of “OS Agents”—AI systems that can autonomously control computers, mobile phones, and web browsers by directly interacting with their interfaces. The 30-page academic review, accepted for publication at the Association for Computational Linguistics conference, comes as major tech companies including OpenAI, Anthropic, Apple, and Google race to deploy AI agents capable of performing complex digital tasks, while highlighting significant security vulnerabilities that most organizations aren’t prepared to address.
The big picture: This technology represents a fundamental shift toward AI systems that can genuinely understand and manipulate the digital world like humans do, moving beyond simple chatbots to agents that can execute multi-step workflows across different applications.
- Over 60 foundation models and 50 agent frameworks have been developed specifically for computer control, with publication rates accelerating dramatically since 2023.
- Current systems work by taking screenshots of computer screens, using computer vision to understand what’s displayed, then executing precise actions like clicking buttons, filling forms, and navigating between applications.
- The most sophisticated systems can handle complex workflows that span different applications—booking a restaurant reservation, adding it to your calendar, then setting a traffic reminder.
Major players racing to market: Tech giants have moved with unprecedented speed to transform academic research into consumer-ready products.
- OpenAI recently launched “Operator,” while Anthropic released “Computer Use” capabilities.
- Apple introduced enhanced AI capabilities in “Apple Intelligence,” and Google unveiled “Project Mariner.”
- All these systems are designed to automate computer interactions by observing screens and executing actions across mobile, desktop, and web platforms.
Security nightmare scenario: The researchers document alarming attack vectors that could compromise enterprise systems in ways traditional security models aren’t designed to handle.
- “Web Indirect Prompt Injection” allows malicious actors to embed hidden instructions in web pages that can hijack an AI agent’s behavior.
- “Environmental injection attacks” use seemingly innocuous web content to trick agents into stealing user data or performing unauthorized actions.
- An AI agent with access to corporate email, financial systems, and customer databases could be manipulated by a carefully crafted web page to exfiltrate sensitive information.
- “Studies on defenses specific to OS Agents remain limited,” creating an immediate challenge for organizations considering deployment.
Performance reality check: Despite the hype, current systems show significant limitations that temper expectations for immediate widespread adoption.
- Success rates vary dramatically across different tasks and platforms, with some commercial systems achieving above 50% success on certain benchmarks while struggling with others.
- Systems excel at simple, well-defined tasks but falter with complex, context-dependent workflows that define much of modern knowledge work.
- They can reliably click buttons or fill standard forms but struggle with tasks requiring sustained reasoning or adaptation to unexpected interface changes.
The personalization challenge: Future OS agents will need to learn from user interactions and adapt to individual preferences over time, presenting both enormous opportunities and privacy risks.
- “A personal assistant is expected to continuously adapt and provide enhanced experiences based on individual user preferences,” the researchers write.
- This capability could create AI agents that learn your email writing style, understand your calendar preferences, and make increasingly sophisticated decisions on your behalf.
- The technical challenges include developing multimodal memory systems that can handle text, images, and voice while avoiding comprehensive surveillance of users’ digital lives.
What they’re saying: The researchers emphasize both the transformative potential and the urgent need for better security frameworks.
- “The dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations,” they write. “With the evolution of (multimodal) large language models ((M)LLMs), this dream is closer to reality.”
- “OS Agents can complete tasks autonomously and have the potential to significantly enhance the lives of billions of users worldwide. Imagine a world where tasks such as online shopping, travel arrangements booking, and other daily activities could be seamlessly performed by these agents.”
- The researchers acknowledge that “OS Agents are still in their early stages of development” with “rapid advancements that continue to introduce novel methodologies and applications.”
Recent Stories
DOE fusion roadmap targets 2030s commercial deployment as AI drives $9B investment
The Department of Energy has released a new roadmap targeting commercial-scale fusion power deployment by the mid-2030s, though the plan lacks specific funding commitments and relies on scientific breakthroughs that have eluded researchers for decades. The strategy emphasizes public-private partnerships and positions AI as both a research tool and motivation for developing fusion energy to meet data centers' growing electricity demands. The big picture: The DOE's roadmap aims to "deliver the public infrastructure that supports the fusion private sector scale up in the 2030s," but acknowledges it cannot commit to specific funding levels and remains subject to Congressional appropriations. Why...
Oct 17, 2025Tying it all together: Credo’s purple cables power the $4B AI data center boom
Credo, a Silicon Valley semiconductor company specializing in data center cables and chips, has seen its stock price more than double this year to $143.61, following a 245% surge in 2024. The company's signature purple cables, which cost between $300-$500 each, have become essential infrastructure for AI data centers, positioning Credo to capitalize on the trillion-dollar AI infrastructure expansion as hyperscalers like Amazon, Microsoft, and Elon Musk's xAI rapidly build out massive computing facilities. What you should know: Credo's active electrical cables (AECs) are becoming indispensable for connecting the massive GPU clusters required for AI training and inference. The company...
Oct 17, 2025Vatican launches Latin American AI network for human development
The Vatican hosted a two-day conference bringing together 50 global experts to explore how artificial intelligence can advance peace, social justice, and human development. The event launched the Latin American AI Network for Integral Human Development and established principles for ethical AI governance that prioritize human dignity over technological advancement. What you should know: The Pontifical Academy of Social Sciences, the Vatican's research body for social issues, organized the "Digital Rerum Novarum" conference on October 16-17, combining academic research with practical AI applications. Participants included leading experts from MIT, Microsoft, Columbia University, the UN, and major European institutions. The conference...