Whisper is an automatic speech recognition (ASR) system developed by OpenAI that approaches human-level robustness and accuracy in English speech recognition. It is trained on 680,000 hours of multilingual and multitask supervised data collected from the web, and the size and diversity of that dataset account for much of its robustness.

Key features:
- Multilingual support for transcribing speech in multiple languages and translating into English
- Robustness to diverse accents and background noise for improved real-world accuracy
- Recognition of technical terms and jargon for enhanced performance in specialized domains
- Strong zero-shot performance across various datasets without fine-tuning
- Open-sourced models and inference code for easy developer integration

How it works:
Whisper uses an end-to-end approach implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into log-Mel spectrograms, and passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and speech translation to English.
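
The chunking step above can be sketched in plain Python. This is a simplified illustration, not Whisper's actual preprocessing code: the function name `split_into_chunks` is hypothetical, and the constants (16 kHz sample rate, 10 ms frame stride, 80 Mel channels) are assumptions taken from the original model description rather than from this page. The released inference code also moves its 30-second window based on predicted timestamps instead of cutting at fixed boundaries.

```python
SAMPLE_RATE = 16_000    # Hz; assumed input rate for the original models
CHUNK_SECONDS = 30      # chunk length described above
HOP_LENGTH = 160        # samples between spectrogram frames (10 ms at 16 kHz)
N_MELS = 80             # Mel channels assumed for the original models

CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS      # 480,000 samples per chunk
FRAMES_PER_CHUNK = CHUNK_SAMPLES // HOP_LENGTH   # 3,000 frames per chunk


def split_into_chunks(samples):
    """Split a sequence of audio samples into fixed 30-second chunks.

    The final chunk is zero-padded so every chunk has the same length,
    which gives the encoder a fixed-size spectrogram for each chunk.
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = list(samples[start:start + CHUNK_SAMPLES])
        chunk += [0.0] * (CHUNK_SAMPLES - len(chunk))  # pad the last chunk
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    audio = [0.0] * (SAMPLE_RATE * 70)   # 70 seconds of silence
    chunks = split_into_chunks(audio)
    # Each chunk would become a spectrogram of shape (N_MELS, FRAMES_PER_CHUNK).
    print(len(chunks), len(chunks[-1]), (N_MELS, FRAMES_PER_CHUNK))
```

Under these assumptions, 70 seconds of audio yields three chunks, the last of which holds 10 seconds of signal followed by 20 seconds of padding.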

As an open-source system, Whisper can be integrated by developers into their own applications.

Use of AI:
Whisper is a generative model: its Transformer decoder produces the transcription token by token, conditioned on the encoder's representation of the audio.
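
To illustrate how one decoder can serve several tasks, the sketch below builds the sequence of special tokens that starts a decoding run. The token strings follow the format described for the original models; the helper name `build_decoder_prompt` is hypothetical, and the real tokenizer maps these strings to integer IDs rather than passing them around as text.

```python
def build_decoder_prompt(language="en", task="transcribe", timestamps=False):
    """Return the special tokens that tell the decoder what to do.

    language: language code of the speech (e.g. "en", "fr")
    task: "transcribe" keeps the spoken language, "translate" outputs English
    timestamps: when False, a token is added that disables timestamp prediction
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    # French speech translated to English, without timestamps.
    print(build_decoder_prompt(language="fr", task="translate"))
```

The decoder then continues this sequence with ordinary text tokens, so switching from transcription to translation only changes the prefix, not the model.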

AI foundation model:
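Whisper is built on a custom encoder-decoder Transformer trained specifically for speech. It is a sequence-to-sequence speech model rather than a text-only large language model (LLM), although its decoder generates text autoregressively in the same way an LLM does.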

How to access:
The Whisper models and inference code are open-sourced and publicly available, allowing developers and researchers to use the system in their own speech recognition and natural language processing applications. Whisper was launched on September 21, 2022 by OpenAI, an artificial intelligence research company founded in 2015.
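
A minimal sketch of using the open-source package from Python follows. It assumes the `openai-whisper` package is installed (`pip install openai-whisper`) and that `ffmpeg` is available on the system. The helpers `format_timestamp`, `format_segments`, and `transcribe_file` are illustrative names, not part of the library, and `"audio.mp3"` is a placeholder path.

```python
def format_timestamp(seconds):
    """Format a time in seconds as MM:SS.mmm."""
    millis = round(seconds * 1000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segments(segments):
    """Turn transcription segments into readable, timestamped lines.

    Each segment is expected to be a dict with "start", "end", and "text"
    keys, as found in the "segments" list of a transcription result.
    """
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


def transcribe_file(path, model_name="base"):
    """Transcribe an audio file and return (full_text, timestamped_lines)."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)   # downloads weights on first use
    result = model.transcribe(path)
    return result["text"], format_segments(result["segments"])


# Example (needs the package, ffmpeg, and a real audio file):
# text, lines = transcribe_file("audio.mp3")
# print(text)
# print("\n".join(lines))
```

Smaller models such as `"base"` run faster, while larger ones are generally more accurate, so the model name is worth exposing as a parameter.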

