OpenAI expands developer offerings with real-time voice API: The company’s annual developer day introduced several new features, with the centerpiece being a real-time application programming interface (API) for voice interactions, albeit at a premium price point.
Real-time voice capabilities and pricing structure: OpenAI’s new API enables developers to create applications with fluid, real-time conversations between users and language models.
- The real-time API is based on the GPT-4o large language model, which costs $2.50 per million input tokens and $10 per million output tokens for text-only interactions.
- For real-time voice applications, the pricing is at least double, with input and output tokens costing $5 and $20 per million tokens, respectively.
- Voice tokens come at an even higher premium: $100 per million audio input tokens and $200 per million audio output tokens.
- OpenAI estimates that this pricing translates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output for standard voice conversations.
Potential applications and cost-saving measures: The company showcased various use cases for real-time voice interactions while also introducing methods to reduce costs for developers.
- Example applications include automated health coaches and language tutors that can engage in real-time conversations with users.
- To help offset the higher costs, OpenAI introduced prompt caching, which reuses tokens from previously submitted inputs, cutting the price of GPT-4o input text tokens in half.
LLM distillation and fine-tuning enhancements: OpenAI also unveiled new tools to help developers create more efficient and specialized models.
- The LLM distillation service allows developers to use data from larger models to train smaller ones, streamlining a previously complex process.
- Developers can now fine-tune models with image data, enabling more specific applications in various domains.
- Food delivery service Grab demonstrated the practical applications of image fine-tuning, improving their mapping operations for delivery routes.
Pricing for new services: OpenAI provided detailed pricing information for its new offerings, maintaining a premium pricing structure.
- Image fine-tuning is priced at $3.75 per million input tokens and $15 per million output tokens, matching standard fine-tuning rates.
- Training image models comes at a higher cost of $25 per million tokens.
Broader implications for AI development: OpenAI’s new features represent significant advancements in AI accessibility and customization for developers, but the premium pricing may impact widespread adoption.
- The introduction of real-time voice capabilities could lead to more natural and engaging AI interactions across various industries.
- However, the high costs associated with these new features may limit their use to larger companies or well-funded projects, potentially creating a divide in AI application development.
- The emphasis on fine-tuning and distillation services suggests a trend towards more specialized and efficient AI models, which could lead to a wider range of targeted AI applications in the future.
OpenAI lets developers build real-time voice apps - at a substantial premium