
Inside Scaled Cognition’s APT-1 AI agent-building platform

With benchmark-leading performance, $21M in funding from Khosla Ventures, and a novel approach to AI agent development, this startup led by a Berkeley professor might have cracked the code for practical enterprise AI.

We are still awaiting hands-on access to Scaled Cognition’s platform, but our research points to what may be one of this year’s most significant enterprise AI developments. Led by UC Berkeley AI professor and CTO Dan Klein, the startup backs its bold claims with strong benchmark results and an efficient development approach.

Their newly announced APT-1 system leads major agentic benchmarks, including Tau-Bench and ComplexFuncBench. These benchmarks test an AI’s ability to handle complex API sequences and comply with business policies, both crucial capabilities for real-world enterprise applications. Most remarkably, a US-based team achieved this for under $11 million, a fraction of typical AI development costs.
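To make those two criteria concrete, here is a minimal sketch of how a test harness might score a single agent run: did the agent call the expected API sequence, and did any call break a policy rule? Everything in it (`ApiCall`, `score_trajectory`, the refund limit) is a hypothetical illustration and is not taken from Tau-Bench, ComplexFuncBench, or Scaled Cognition.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ApiCall:
    """One step in an agent's trajectory (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


# A policy rule returns an error message for a violating call, or None.
PolicyRule = Callable[[ApiCall], Optional[str]]


def refund_limit(call: ApiCall) -> Optional[str]:
    """Illustrative policy: refunds above 100 must be escalated."""
    if call.name == "issue_refund" and call.args.get("amount", 0) > 100:
        return "refund above 100 requires escalation"
    return None


def score_trajectory(calls, expected_names, rules):
    """Check the call order and collect every policy violation."""
    sequence_ok = [c.name for c in calls] == list(expected_names)
    violations = []
    for call in calls:
        for rule in rules:
            message = rule(call)
            if message is not None:
                violations.append(message)
    return {"sequence_ok": sequence_ok, "violations": violations}


trajectory = [
    ApiCall("lookup_order", {"order_id": "A1"}),
    ApiCall("issue_refund", {"order_id": "A1", "amount": 250}),
]
result = score_trajectory(
    trajectory, ["lookup_order", "issue_refund"], [refund_limit]
)
print(result)
```

In this toy run the agent gets the call order right but still fails on policy, which is the kind of gap such benchmarks are meant to expose.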

In an industry driven by funding headlines, Scaled Cognition’s backing is telling. Khosla Ventures led their $21 million seed round in 2023, with Vinod Khosla joining the board. The involvement of one of Silicon Valley’s most discerning investors signals strong technological potential.

Klein’s platform introduction emphasizes practical AI implementation: “It’s focused on actions not tokens so it can obey your business logic better and it’s a specialist, fast and compact.” This statement reveals their distinctive approach. While competitors chase larger language models and better token prediction, Scaled Cognition pursues business utility.

APT-1’s technical architecture breaks from conventional AI approaches in three ways: it is optimized for actions rather than tokens, targeting business operations instead of language prediction; it is trained on a fully synthetic agentic data pipeline that requires no human-labeled data; and it uses a reinforcement learning approach based on agent-to-agent self-play, similar to the techniques that mastered chess and Go.

Through their Agent Builder platform and GenAPI technology, companies can build, test, and deploy specialized AI agents within an hour, without integrating with real APIs during development. This reduces implementation risk and complexity: the platform functions as a safe “flight simulator” for AI systems, letting businesses validate implementations before touching real customer data or transactions.
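The “flight simulator” idea can be sketched in a few lines: the agent talks to an in-memory stand-in that records every call, so its behavior can be checked without a live backend. This is an illustration only. `SimulatedOrdersApi` and `cancel_agent` are hypothetical names, and nothing here reflects GenAPI’s actual interface, which we have not seen.

```python
class SimulatedOrdersApi:
    """In-memory stand-in for a real orders API (hypothetical)."""

    def __init__(self, orders):
        # Copy the seed data so tests never share state.
        self._orders = {key: dict(value) for key, value in orders.items()}
        self.call_log = []

    def get_order(self, order_id):
        self.call_log.append(("get_order", order_id))
        return dict(self._orders[order_id])

    def cancel_order(self, order_id):
        self.call_log.append(("cancel_order", order_id))
        order = self._orders[order_id]
        if order["status"] == "shipped":
            raise ValueError("cannot cancel a shipped order")
        order["status"] = "cancelled"
        return dict(order)


def cancel_agent(api, order_id):
    """Toy scripted 'agent': check the order status first, then act."""
    order = api.get_order(order_id)
    if order["status"] == "shipped":
        return "refused: order already shipped"
    api.cancel_order(order_id)
    return "cancelled"


sim = SimulatedOrdersApi(
    {"A1": {"status": "pending"}, "B2": {"status": "shipped"}}
)
outcome_pending = cancel_agent(sim, "A1")
outcome_shipped = cancel_agent(sim, "B2")
print(outcome_pending, "|", outcome_shipped)
print(sim.call_log)
```

Because the simulator logs each call, a test can assert not only on the final answer but also that the agent never attempted the forbidden cancellation.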

Their synthetic training data approach solves a persistent AI development challenge. Instead of using web-scraped or enterprise data, which often lack connections between conversations and actions, they’ve built a data pipeline that generates precisely the grounded data needed for agent training. This eliminates a major bottleneck: the scarcity of high-quality training data combining conversational elements with associated actions.
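As a rough illustration of what “grounded” data means here, the sketch below generates records in which every utterance is paired with the action it should trigger. The templates, the `GroundedExample` record, and the JSON Lines output are assumptions made for this example; Scaled Cognition has not published its pipeline, and a real one would be far richer.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass
class GroundedExample:
    """An utterance paired with the action it should produce."""
    user_utterance: str
    action: str
    action_args: dict


# Hypothetical templates: (utterance pattern, matching action name).
TEMPLATES = [
    ("Please cancel order {order_id}.", "cancel_order"),
    ("Where is order {order_id}?", "track_order"),
]


def generate_examples(n, seed=0):
    """Produce n synthetic examples, reproducible for a given seed."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        template, action = rng.choice(TEMPLATES)
        order_id = f"ORD-{rng.randint(1000, 9999)}"
        examples.append(
            GroundedExample(
                user_utterance=template.format(order_id=order_id),
                action=action,
                action_args={"order_id": order_id},
            )
        )
    return examples


dataset = generate_examples(3, seed=42)
# One JSON object per line, a common format for training data.
jsonl_lines = [json.dumps(asdict(example)) for example in dataset]
for line in jsonl_lines:
    print(line)
```

The point of the structure is that text and action are created together, so the link between them never has to be recovered after the fact, which is the gap the article attributes to web-scraped data.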

The business implications are significant. Financial services companies could create loan-processing AI agents that maintain strict compliance. Healthcare providers could deploy agents managing appointments and follow-up care within HIPAA guidelines. Retail businesses could implement AI for complex returns while following company policies—all with reduced development time and risk.

Their capital efficiency is remarkable. While large-scale AI development typically requires hundreds of millions in investment, their benchmark-leading performance for under $11 million suggests a fundamentally more efficient approach.

Their self-play reinforcement learning system marks another advance. The technique is proven in games with clear win/loss conditions, and Scaled Cognition has adapted it for business applications, using simulated agent-to-agent interactions to teach systems proper action execution while respecting policies. This could transform how businesses automate complex processes while maintaining compliance.
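The toy below shows the general shape of learning from simulated interactions with a policy-based reward. It is deliberately much simpler than real self-play: the customer is scripted rather than a learning copy of the agent, and the learner is a basic epsilon-greedy value table. All names, the refund limit, and the reward values are assumptions for illustration, not Scaled Cognition’s method.

```python
import random

ACTIONS = ["approve_refund", "escalate"]
AMOUNTS = [20, 50, 150, 300]
POLICY_LIMIT = 100  # hypothetical rule: larger refunds must be escalated


def simulated_customer(rng):
    """Scripted counterpart: asks for a refund of some amount."""
    return rng.choice(AMOUNTS)


def reward(amount, action):
    """+1 for the policy-compliant action, -1 otherwise."""
    compliant = "escalate" if amount > POLICY_LIMIT else "approve_refund"
    return 1.0 if action == compliant else -1.0


def train(episodes=2000, epsilon=0.1, learning_rate=0.1, seed=0):
    """Learn action values per refund amount from simulated requests."""
    rng = random.Random(seed)
    q = {(amount, action): 0.0 for amount in AMOUNTS for action in ACTIONS}
    for _ in range(episodes):
        amount = simulated_customer(rng)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(amount, a)])  # exploit
        # Move the estimate toward the observed reward.
        error = reward(amount, action) - q[(amount, action)]
        q[(amount, action)] += learning_rate * error
    return q


def greedy_policy(q):
    """Pick the highest-valued action for each refund amount."""
    return {
        amount: max(ACTIONS, key=lambda a: q[(amount, a)])
        for amount in AMOUNTS
    }


learned = greedy_policy(train())
print(learned)
```

The agent is never told the limit; it infers which action is acceptable for each amount purely from the reward. Real business processes lack such a clean signal, which is why the open questions later in this article matter.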

For developers, the platform promises significant advances in AI implementation. The ability to interpret code examples immediately would be groundbreaking if it holds up, and testing implementations without touching production systems could substantially reduce development time and risk.

As we await hands-on testing, we’re keen to see how APT-1 handles real-world edge cases and complex business logic. Key questions remain: Will synthetic training data translate to real-world scenarios? How will agent-to-agent self-play learning apply to complex business processes?

For business leaders monitoring the AI space, Scaled Cognition’s approach offers a promising direction. If successful, their platform could fundamentally change how businesses adopt AI—making it more practical, less risky, and better aligned with business needs.

We’ll provide a detailed hands-on review upon accessing the platform. Meanwhile, with benchmark-leading performance, innovative technology, and strong financial backing, Scaled Cognition stands out in the crowded AI landscape.
