CO/AI
Friday · June 19, 2026 · Issue No. 900
Video

AI Dev 25 | Aman Khan: Beyond Vibe Checks—Rethinking How We Evaluate AI Agent Performance

A great demo is just the starting point; getting AI agents to perform reliably in production is the real challenge. In his AI Dev 25 talk, Aman Khan, Director of Product at Arize, shared how his team moved beyond simple accuracy checks to build more robust evaluation frameworks for generative AI systems. Drawing on real-world experience, he outlined how to:

- Use LLMs as judges for nuanced evaluation
- Build automated pipelines to catch issues early
- Establish feedback loops and workflows that support rapid iteration without compromising quality

Whether you’re just getting started with agents or scaling them in production, this session offers practical techniques for evaluating and improving agent performance.

More videos

Claude Fable 5: When Capability Meets Economics

Anthropic released Claude Fable 5 with a paradox built in: safeguards sophisticated enough to let a mythos-class model...

Run Agentic AI Entirely on Your Mac—No Cloud, No Latency, No Privacy Tradeoffs

Apple’s MLX framework is mature enough now that you can run serious agentic AI workflows locally on Silicon...

Hermes Agent Master Class

Welcome to the Hermes Agent Master Class — an 11-episode series taking you from zero to fully leveraging...
