NVIDIA has released Granary, an open-source multilingual speech dataset containing approximately one million hours of audio, alongside two new AI models designed for transcription and translation across 25 European languages. The release addresses a critical gap in speech AI development, as only a tiny fraction of the world’s 7,000 languages are currently supported by AI language models, with particular focus on underrepresented European languages like Croatian, Estonian, and Maltese.

What you should know: The Granary dataset substantially expands the multilingual speech training data available to developers, giving them ready-to-use resources for production-scale applications.

  • The dataset includes nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation.
  • It covers nearly all of the European Union’s 24 official languages, plus Russian and Ukrainian.
  • The dataset was developed through collaboration between NVIDIA’s speech AI team, Carnegie Mellon University researchers, and Fondazione Bruno Kessler, an Italian research institute.

The big picture: NVIDIA’s approach tackles data scarcity through an innovative processing pipeline that transforms unlabeled audio into structured, high-quality training data without requiring resource-intensive human annotation.

  • The team demonstrated that reaching a target accuracy level takes around half as much Granary training data as it does with other popular datasets.
  • The processing pipeline is available open source on GitHub through the NVIDIA NeMo Speech Data Processor toolkit.
  • This methodology can be adapted by developers for other automatic speech recognition (ASR) or automatic speech translation (AST) models.

In plain English: Traditional speech AI development requires humans to manually label hours of audio recordings—a time-consuming and expensive process. NVIDIA’s new approach uses AI to automatically process raw audio files and turn them into clean, structured training data that other AI models can learn from, much like having a smart assistant organize messy files into neat folders.
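
To make the idea concrete, here is a minimal sketch of one filtering step in a pseudo-labeling workflow of this general kind. It is not NVIDIA's pipeline and does not use the NeMo Speech Data Processor API. The `Segment` fields `confidence` and `detected_lang`, the thresholds, and the sample clips are all hypothetical, chosen only for illustration. The output keys `audio_filepath`, `duration`, and `text` follow the JSON-lines manifest layout commonly used with NeMo, and the extra `lang` key is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    """One machine-transcribed audio clip (all fields are illustrative)."""
    audio_filepath: str
    duration: float      # clip length in seconds
    text: str            # machine-generated transcript (the pseudo-label)
    confidence: float    # hypothetical model confidence in [0, 1]
    detected_lang: str   # hypothetical language-ID result, e.g. "hr"


def keep(seg, expected_lang, min_conf=0.8, max_duration=40.0):
    """Return True if the pseudo-labeled clip passes every quality check."""
    return (
        seg.detected_lang == expected_lang      # drop clips in the wrong language
        and seg.confidence >= min_conf          # drop low-confidence transcripts
        and 0.0 < seg.duration <= max_duration  # drop empty or overlong clips
        and bool(seg.text.strip())              # drop blank transcripts
    )


def to_manifest_lines(segments, expected_lang, **thresholds):
    """Serialize the clips that pass the filter as JSON lines."""
    return [
        json.dumps(
            {
                "audio_filepath": s.audio_filepath,
                "duration": s.duration,
                "text": s.text.strip(),
                "lang": s.detected_lang,
            },
            ensure_ascii=False,
        )
        for s in segments
        if keep(s, expected_lang, **thresholds)
    ]


# Made-up sample data: four Croatian-corpus clips of varying quality.
SAMPLE = [
    Segment("clip_001.wav", 4.2, "Dobar dan, kako ste?", 0.93, "hr"),
    Segment("clip_002.wav", 3.1, "Hello there", 0.95, "en"),
    Segment("clip_003.wav", 5.0, "Hvala lijepa", 0.41, "hr"),
    Segment("clip_004.wav", 2.0, "   ", 0.99, "hr"),
]

if __name__ == "__main__":
    for line in to_manifest_lines(SAMPLE, expected_lang="hr"):
        print(line)
```

With the default thresholds, only the first sample clip survives: the others fail the language, confidence, or blank-transcript check. A real pipeline would add further stages, such as audio segmentation and text normalization, that this sketch leaves out.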

Key models released: NVIDIA introduced two complementary AI models trained on the Granary dataset, each optimized for different use cases.

  • NVIDIA Canary-1b-v2: A billion-parameter model optimized for high-quality transcription and translation between English and two dozen supported languages. It offers quality comparable to models three times larger while running inference up to 10 times faster.
  • NVIDIA Parakeet-tdt-0.6b-v3: A streamlined 600-million-parameter model designed for real-time transcription, capable of processing 24-minute audio segments in a single inference pass with automatic language detection.

How it works: Both models leverage NVIDIA’s NeMo software suite to deliver enhanced functionality for enterprise applications.

  • NeMo Curator filtered out synthetic examples from source data to ensure only high-quality samples were used for training.
  • Both models provide accurate punctuation, capitalization, and word-level timestamps in their outputs.
  • The models support multilingual chatbots, customer service voice agents, and near-real-time translation services.

Why this matters: The release democratizes access to high-quality multilingual speech AI technology, particularly benefiting languages with limited available training data.

  • European languages underrepresented in human-annotated datasets now have access to critical resources for developing more inclusive speech technologies.
  • The permissive licensing of Canary-1b-v2 enables widespread adoption and customization by developers.
  • The methodology can accelerate speech AI innovation by providing a replicable framework for other languages and applications.

What’s next: The research behind Granary will be presented at Interspeech, a language processing conference taking place in the Netherlands from August 17-21. All resources are now available on Hugging Face for immediate developer access.
