Inside Scaled Cognition’s APT-1 AI Agent-Building Platform
With benchmark-leading performance, $21M in funding from Khosla Ventures, and a novel approach to AI agent development, this Berkeley professor-led startup might have cracked the code for practical enterprise AI
While awaiting hands-on access to Scaled Cognition’s platform, our research reveals what may be one of this year’s most significant enterprise AI developments. Led by UC Berkeley AI professor and CTO Dan Klein, this startup backs its bold claims with impressive benchmark results and an efficient development approach.
Their newly announced APT-1 system leads major agentic benchmarks, including Tau-Bench and ComplexFuncBench. These benchmarks test an AI’s ability to handle complex API sequences and comply with business policies—crucial capabilities for real-world enterprise applications. Most remarkably, a US-based team achieved this for under $11 million, a fraction of typical AI development costs.
In an industry driven by funding headlines, Scaled Cognition’s backing is telling. Khosla Ventures led their $21 million seed round in 2023, with Vinod Khosla joining the board. In the often-hyped AI startup world, the involvement of one of Silicon Valley’s most discerning investors signals strong technological potential.
Klein’s platform introduction emphasizes practical AI implementation: “It’s focused on actions not tokens so it can obey your business logic better and it’s a specialist, fast and compact.” This statement reveals their distinctive approach. While competitors chase larger language models and better token prediction, Scaled Cognition pursues business utility.
According to the company, APT-1’s architecture departs from conventional AI approaches in three ways. First, it is optimized for actions rather than tokens, so training targets business operations instead of language prediction. Second, it relies on a fully synthetic agentic data pipeline that requires no human-labeled data. Third, it uses reinforcement learning based on agent-to-agent self-play, similar to the techniques that mastered Chess and Go.

Through their Agent Builder platform and GenAPI technology, companies can build, test, and deploy specialized AI agents within an hour—without integrating with real APIs during development. This dramatically reduces implementation risk and complexity. The platform functions as a safe “flight simulator” for AI systems, letting businesses validate implementations before touching real customer data or transactions.
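To make the “flight simulator” idea concrete, here is a minimal sketch of testing agent behavior against a simulated API instead of a live one. Scaled Cognition has not published the GenAPI interface, so every name below (`SimulatedRefundAPI`, `scripted_agent`, the order IDs) is a hypothetical stand-in of ours, not the platform’s API.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedRefundAPI:
    """Hypothetical in-memory stand-in for a real refunds API."""
    # Order totals keyed by order ID (made-up illustration data).
    orders: dict = field(default_factory=lambda: {"A100": 40.0, "A200": 250.0})
    # Log of every call, so a test can inspect what the agent attempted.
    calls: list = field(default_factory=list)

    def refund(self, order_id: str, amount: float) -> dict:
        self.calls.append(("refund", order_id, amount))
        if order_id not in self.orders:
            return {"ok": False, "error": "unknown_order"}
        if amount > self.orders[order_id]:
            return {"ok": False, "error": "amount_exceeds_order_total"}
        return {"ok": True, "refunded": amount}


def scripted_agent(api: SimulatedRefundAPI, order_id: str, amount: float) -> dict:
    """Placeholder for an agent: it simply issues one refund call."""
    return api.refund(order_id, amount)


api = SimulatedRefundAPI()
ok_result = scripted_agent(api, "A100", 40.0)   # valid refund
bad_result = scripted_agent(api, "A100", 99.0)  # exceeds the order total
print(ok_result, bad_result, api.calls)
```

Because the simulated API records each call, a test can check both the response and the exact actions attempted, and no production system is involved at any point.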
Their synthetic training data approach solves a persistent AI development challenge. Instead of using web-scraped or enterprise data, which often lack connections between conversations and actions, they’ve built a data pipeline that generates precisely the grounded data needed for agent training. This eliminates a major bottleneck: the scarcity of high-quality training data combining conversational elements with associated actions.
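As an illustration of what “grounded” data might look like, the sketch below generates templated dialogues in which each agent reply is paired with the action it corresponds to. This is our own toy example under assumed names (`GroundedTurn`, `synthesize_dialogue`, the `create_return` action); the company has not described its pipeline at this level of detail.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundedTurn:
    speaker: str                   # "user" or "agent"
    text: str                      # what was said
    action: Optional[dict] = None  # the API call tied to this turn, if any


def synthesize_dialogue(rng: random.Random) -> list:
    """Generate one templated dialogue whose agent turn carries an action."""
    order_id = f"A{rng.randint(100, 999)}"
    reason = rng.choice(["damaged", "wrong size", "arrived late"])
    return [
        GroundedTurn("user", f"I want to return order {order_id}. Reason: {reason}."),
        GroundedTurn(
            "agent",
            f"I have opened a return for order {order_id}.",
            # The action shares its arguments with the conversation text,
            # which is the conversation-to-action link described above.
            action={"name": "create_return",
                    "args": {"order_id": order_id, "reason": reason}},
        ),
    ]


rng = random.Random(0)  # fixed seed so the generated set is reproducible
dataset = [synthesize_dialogue(rng) for _ in range(3)]
for dialogue in dataset:
    print(dialogue[0].text, "->", dialogue[1].action)
```

The point of the structure is that the order ID and reason appear both in the text and in the action arguments, a pairing that web-scraped conversations rarely provide.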
The business implications are significant. Financial services companies could create loan-processing AI agents that maintain strict compliance. Healthcare providers could deploy agents managing appointments and follow-up care within HIPAA guidelines. Retail businesses could implement AI for complex returns while following company policies—all with reduced development time and risk.
Their capital efficiency is remarkable. While frontier AI development often requires hundreds of millions of dollars in investment, reaching benchmark-leading performance for under $11 million suggests a fundamentally more efficient approach.
Their self-play reinforcement learning system marks another advance. Self-play has so far been proven mainly in games with clear win/loss conditions. Scaled Cognition says it has adapted the technique for business applications, using simulated agent-to-agent interactions to teach systems to execute actions correctly while respecting policies. This could transform how businesses automate complex processes while maintaining compliance.
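The sketch below shows the general shape of learning from simulated agent-to-agent interactions under a policy. It is a deliberately tiny stand-in and not Scaled Cognition’s method: a scripted customer agent requests refunds, a service agent learns by tabular epsilon-greedy value updates, and a policy checker supplies the reward. The refund cap, the action names, and the reward values are all assumptions chosen for illustration.

```python
import random

REFUND_CAP = 100  # hypothetical policy: refunds above this must be escalated
ACTIONS = ["approve", "escalate"]


def customer_agent(rng: random.Random) -> int:
    """Simulated customer: requests a refund of a random amount."""
    return rng.randint(1, 200)


def policy_reward(amount: int, action: str) -> float:
    """Score the service agent's action against the policy."""
    over_cap = amount > REFUND_CAP
    if action == "approve":
        return -1.0 if over_cap else 1.0  # approving over the cap violates policy
    return 1.0 if over_cap else 0.0       # escalating small refunds is compliant but unhelpful


def train(episodes: int = 2000, epsilon: float = 0.2,
          alpha: float = 0.1, seed: int = 0) -> dict:
    rng = random.Random(seed)
    # One value estimate per (request is over cap?, action) pair.
    q = {(over, a): 0.0 for over in (False, True) for a in ACTIONS}
    for _ in range(episodes):
        amount = customer_agent(rng)
        state = amount > REFUND_CAP
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        reward = policy_reward(amount, action)
        # Move the estimate a step toward the observed reward.
        q[(state, action)] += alpha * (reward - q[(state, action)])
    return q


def greedy(q: dict, amount: int) -> str:
    """Return the action the trained service agent prefers for this amount."""
    state = amount > REFUND_CAP
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_values = train()
print(greedy(q_values, 50), greedy(q_values, 150))
```

Unlike Chess or Go, there is no built-in win condition here, so the reward has to come from an explicit policy check. That is the adaptation the article describes, reduced to a toy setting.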
For developers, the platform promises significant advances in AI implementation. If agents can interpret code examples immediately, as the company’s materials suggest, that would be groundbreaking. Testing implementations without touching production systems could also substantially reduce development time and risk.
As we await hands-on testing, we’re keen to see how APT-1 handles real-world edge cases and complex business logic. Key questions remain: Will synthetic training data translate to real-world scenarios? How will agent-to-agent self-play learning apply to complex business processes?
For business leaders monitoring the AI space, Scaled Cognition’s approach offers a promising direction. If successful, their platform could fundamentally change how businesses adopt AI—making it more practical, less risky, and better aligned with business needs.
We’ll provide a detailed hands-on review upon accessing the platform. Meanwhile, with benchmark-leading performance, innovative technology, and strong financial backing, Scaled Cognition stands out in the crowded AI landscape.