CO/AI
Wednesday · June 17, 2026 · Issue No. 899
AI Engineering with the Google Gemini 2.5 Model Family – Philipp Schmid, Google DeepMind

Gemini 2.5: breaking AI engineering barriers

Google's Gemini 2.5 marks a significant step forward in how developers can build with multimodal AI models. In his presentation, Philipp Schmid from Google DeepMind explains how Gemini 2.5's architecture relaxes earlier constraints around context windows and input processing, pairing greater flexibility with a simpler development workflow.

The video covers Google's latest Gemini model family and what it changes for developers building AI applications. Schmid, clearly enthusiastic about these developments, walks through the architectural improvements that address persistent challenges in working with large language models, and shows practical applications that he presents as genuine capability leaps rather than incremental improvements.

  • Gemini 2.5 introduces a "windowless" architecture that effectively eliminates traditional context window constraints, allowing processing of inputs up to 2 million tokens without degradation in performance
  • The model family features true multimodality, handling text, images, audio, and video with equal proficiency through a unified architecture, rather than treating different input types as separate processes
  • Google has simplified the developer experience with consistent APIs across all model sizes (Flash-Lite, Flash, Pro), enabling easier scaling and deployment while maintaining strong alignment between model capabilities and outputs
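The "consistent APIs across all model sizes" point can be illustrated with a minimal sketch. It only builds requests and sends nothing. The endpoint shape and the model IDs `gemini-2.5-flash` and `gemini-2.5-pro` are assumptions taken from the public Gemini API rather than from the talk, and `build_request` is a hypothetical helper name.

```python
import json

# Assumption: REST endpoint shape of the public Gemini API (not stated in the talk).
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model: str, prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for a text-only generateContent call.

    Hypothetical helper: it builds the request but never sends it.
    """
    url = f"{BASE_URL}/{model}:generateContent"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body


# Switching model size changes only the model ID in the URL;
# the request body is identical.
flash_url, flash_body = build_request("gemini-2.5-flash", "Summarize this report.")
pro_url, pro_body = build_request("gemini-2.5-pro", "Summarize this report.")
```

Under these assumptions, moving a workload from a smaller to a larger model is a one-string change, which is the scaling convenience the bullet describes.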

The end of context windows changes everything

The most striking aspect of Gemini 2.5 is how it rethinks the concept of context windows. Schmid presents this as more than a technical improvement: it changes how AI systems take in information. Traditional LLMs have always been constrained by fixed context windows, forcing developers to implement complex chunking strategies and retrieval augmentation techniques. According to the presentation, Gemini 2.5's architecture effectively eliminates this limitation.
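To make the chunking workaround concrete, here is a minimal sketch of the kind of code a fixed context window forces on developers, next to the single-request path a large enough window allows. The function names, the list-of-tokens representation, and the limits used are illustrative assumptions, not anything shown in the talk.

```python
def chunk_tokens(tokens, window, overlap=0):
    """Split a token list into fixed-size windows that overlap by `overlap` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def plan_requests(tokens, context_limit, overlap=0):
    """Return the token lists to send: one if the input fits, several otherwise."""
    if len(tokens) <= context_limit:
        return [tokens]  # whole input fits in a single request
    return chunk_tokens(tokens, context_limit, overlap)


# Hypothetical document of 10 tokens (integers stand in for real tokens).
doc = list(range(10))
small_window_plan = plan_requests(doc, context_limit=4, overlap=1)
large_window_plan = plan_requests(doc, context_limit=2_000_000)
```

With the small limit the document is split into three overlapping pieces whose answers must later be merged. With the large limit it goes out as one request and the chunking branch never runs.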

This matters because it removes what has been perhaps the most significant engineering bottleneck in building practical AI applications. When systems can process massive amounts of information at once, such as entire codebases, lengthy legal documents, or comprehensive medical histories, without information loss at window boundaries, applications can become dramatically more capable while requiring less engineering overhead. The demonstrations showing consistent performance at 10K, 1M, and even 2M tokens suggest that the common practice of retrieval-augmented generation (RAG) might become unnecessary for many use cases.
