
Inside Scaled Cognition’s APT-1 AI agent-building platform

With benchmark-leading performance, $21M in funding from Khosla Ventures, and a novel approach to AI agent development, this Berkeley professor-led startup might have cracked the code for practical enterprise AI


While awaiting hands-on access to Scaled Cognition’s platform, our research reveals what may be one of this year’s most significant enterprise AI developments. Led by UC Berkeley AI professor and CTO Dan Klein, this startup backs its bold claims with impressive benchmark results and an efficient development approach.

Their newly announced APT-1 system leads major agentic benchmarks, including Tau-Bench and ComplexFuncBench. These benchmarks test an AI’s ability to handle complex API sequences and comply with business policies—crucial capabilities for real-world enterprise applications. Most remarkably, a US-based team achieved this for under $11 million, a fraction of typical AI development costs.

In an industry driven by funding headlines, Scaled Cognition’s backing is telling. Khosla Ventures led their $21 million seed round in 2023, with Vinod Khosla joining the board. In the often-hyped AI startup world, the involvement of one of Silicon Valley’s most discerning investors signals strong technological potential.

Klein’s platform introduction emphasizes practical AI implementation: “It’s focused on actions not tokens so it can obey your business logic better and it’s a specialist, fast and compact.” This statement reveals their distinctive approach. While competitors chase larger language models and better token prediction, Scaled Cognition pursues business utility.

The technical architecture of APT-1 breaks from conventional AI approaches through three innovations: optimization for actions rather than tokens, which focuses the system on business operations instead of language prediction; a fully synthetic agentic data pipeline that requires no human-labeled data; and a reinforcement learning approach built on agent-to-agent self-play, similar to the techniques that mastered chess and Go.

Through their Agent Builder platform and GenAPI technology, companies can build, test, and deploy specialized AI agents within an hour—without integrating with real APIs during development. This dramatically reduces implementation risk and complexity. The platform functions as a safe “flight simulator” for AI systems, letting businesses validate implementations before touching real customer data or transactions.
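To make the “flight simulator” idea concrete, here is a minimal Python sketch of our own. It is not Scaled Cognition’s Agent Builder or GenAPI code, which we have not seen; `SimulatedRefundAPI`, `toy_agent`, and the $100 refund cap are hypothetical names and values chosen purely for illustration. The point is only that an agent can be exercised against a stand-in that enforces a policy and logs every attempted action, with no real system involved.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedRefundAPI:
    """Hypothetical stand-in for a real refunds API (illustration only)."""
    max_refund: float = 100.0                   # assumed business policy limit
    calls: list = field(default_factory=list)   # log of every attempted action

    def refund(self, order_id: str, amount: float) -> dict:
        # Record the attempt, then enforce the assumed policy.
        self.calls.append(("refund", order_id, amount))
        if amount > self.max_refund:
            return {"ok": False, "reason": "exceeds policy limit"}
        return {"ok": True, "refunded": amount}


def toy_agent(api: SimulatedRefundAPI, order_id: str, requested: float) -> dict:
    """Toy agent: try the requested refund, fall back to the policy cap."""
    result = api.refund(order_id, requested)
    if not result["ok"]:
        result = api.refund(order_id, api.max_refund)
    return result


sim = SimulatedRefundAPI()
outcome = toy_agent(sim, "order-42", 250.0)
print(outcome)
print(sim.calls)   # inspect what the agent tried, without touching production
```

Because every call is logged, a test can check both the final outcome and the sequence of actions the agent attempted along the way.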

Their synthetic training data approach solves a persistent AI development challenge. Instead of using web-scraped or enterprise data, which often lack connections between conversations and actions, they’ve built a data pipeline that generates precisely the grounded data needed for agent training. This eliminates a major bottleneck: the scarcity of high-quality training data combining conversational elements with associated actions.
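What “grounded” data might look like is easiest to see in a toy example. The sketch below is our own illustration, not Scaled Cognition’s pipeline: the intent names, templates, and `generate_example` helper are all assumptions. It generates records in which each utterance is paired with the exact action and arguments it should trigger, which is the conversation-to-action link that web-scraped text tends to lack.

```python
import random

# Hypothetical intents and utterance templates (assumptions for illustration).
INTENTS = {
    "cancel_order": "Please cancel order {order_id}.",
    "check_status": "Where is order {order_id}?",
}


def generate_example(rng: random.Random) -> dict:
    """Produce one synthetic record pairing an utterance with its action."""
    intent = rng.choice(sorted(INTENTS))
    order_id = f"A{rng.randint(1000, 9999)}"
    return {
        "utterance": INTENTS[intent].format(order_id=order_id),
        "action": {"name": intent, "args": {"order_id": order_id}},
    }


rng = random.Random(0)   # seeded so the generated set is reproducible
dataset = [generate_example(rng) for _ in range(5)]
for record in dataset:
    print(record)
```

Since the utterance and the action are generated from the same values, the label is correct by construction and no human annotation is needed.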

The business implications are significant. Financial services companies could create loan-processing AI agents that maintain strict compliance. Healthcare providers could deploy agents managing appointments and follow-up care within HIPAA guidelines. Retail businesses could implement AI for complex returns while following company policies—all with reduced development time and risk.

Their capital efficiency is remarkable. While large-scale AI development often requires hundreds of millions of dollars in investment, their benchmark-leading performance for under $11 million suggests a fundamentally more efficient approach.

Their self-play reinforcement learning system marks another advance. Self-play has been proven in games with clear win/loss conditions; Scaled Cognition has adapted it for business applications, using simulated agent-to-agent interactions to teach systems to execute actions properly while respecting policies. This could transform how businesses automate complex processes while maintaining compliance.
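We do not know how Scaled Cognition’s training loop works, so the following is a deliberately simplified stand-in rather than their method: an epsilon-greedy value update inside a simulated two-agent interaction. A scripted “customer” always asks for more than an assumed $100 cap, and a “service” agent learns from a compliance reward which of two hypothetical actions to prefer. Every name and number here is an assumption.

```python
import random

ACTIONS = ["refund_full", "refund_capped"]   # hypothetical agent actions
CAP = 100.0                                   # assumed policy limit


def customer_turn(rng: random.Random) -> float:
    """Simulated customer agent: always requests more than the cap."""
    return rng.uniform(150.0, 300.0)


def reward(action: str, requested: float) -> float:
    """+1 if the granted amount respects the policy, -1 otherwise."""
    granted = requested if action == "refund_full" else min(requested, CAP)
    return 1.0 if granted <= CAP else -1.0


def train(episodes: int = 2000, epsilon: float = 0.1,
          lr: float = 0.1, seed: int = 0) -> dict:
    rng = random.Random(seed)
    values = {a: 0.0 for a in ACTIONS}        # running value estimates
    for _ in range(episodes):
        requested = customer_turn(rng)
        if rng.random() < epsilon:            # explore occasionally
            action = rng.choice(ACTIONS)
        else:                                 # otherwise act greedily
            action = max(ACTIONS, key=values.get)
        # Move the estimate toward the observed reward.
        values[action] += lr * (reward(action, requested) - values[action])
    return values


values = train()
print(max(values, key=values.get))
```

The reward signal comes entirely from the simulated policy check, so no human labels or real transactions are needed. Real business processes involve far larger action spaces and multi-step dialogues than this two-action toy.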

For developers, the platform promises significant advances in AI implementation. If it can interpret code examples on the spot, that would be groundbreaking. Testing implementations without touching production systems could substantially reduce development time and risk.

As we await hands-on testing, we’re keen to see how APT-1 handles real-world edge cases and complex business logic. Key questions remain: Will synthetic training data translate to real-world scenarios? How will agent-to-agent self-play learning apply to complex business processes?

For business leaders monitoring the AI space, Scaled Cognition’s approach offers a promising direction. If successful, their platform could fundamentally change how businesses adopt AI—making it more practical, less risky, and better aligned with business needs.

We’ll provide a detailed hands-on review upon accessing the platform. Meanwhile, with benchmark-leading performance, innovative technology, and strong financial backing, Scaled Cognition stands out in the crowded AI landscape.
