AWS's new prompt caching feature cuts AI costs by up to 90%

As enterprise AI adoption grows, managing and optimizing infrastructure costs is becoming increasingly important, prompting major cloud providers to introduce new features aimed at reducing expenses.

Latest AWS Bedrock features: Amazon Web Services has unveiled two key capabilities – Intelligent Prompt Routing and Prompt Caching – to help customers reduce AI model usage costs.

  • Intelligent Prompt Routing automatically directs queries to appropriately sized models within a chosen model family, potentially reducing costs by up to 30% without sacrificing accuracy (see the invocation sketch after this list)
  • The system ensures simple queries are handled by smaller models while complex questions are routed to more sophisticated ones
  • AWS customer Argo Labs demonstrates this by using smaller models for basic yes/no questions and larger models for nuanced inquiries about menu options
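Mechanically, a router is invoked like any other Bedrock model: per AWS's documentation, you pass the prompt router's ARN in place of a model ID when calling the Converse API. A minimal sketch in Python with boto3 (the region, account number, and router ARN below are placeholders, not real identifiers):

```python
import boto3

# Bedrock runtime client; the region here is a placeholder.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# A prompt router is invoked like a model: its ARN goes in the modelId slot.
# This ARN is illustrative only; real router ARNs are listed in the
# Bedrock console.
ROUTER_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/anthropic.claude:1"
)

response = client.converse(
    modelId=ROUTER_ARN,
    messages=[
        {"role": "user", "content": [{"text": "Do you have vegan options?"}]}
    ],
)

print(response["output"]["message"]["content"][0]["text"])
# The response trace reports which model in the family actually served the
# request (exact field names may vary by SDK version).
print(response.get("trace", {}))
```

Because routing happens server-side, application code doesn't need to change as AWS adjusts which models sit behind the router.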

Prompt Caching implementation: AWS has introduced caching that stores the frequently reused portions of prompts so supported models don't reprocess those tokens on every call (a minimal sketch follows the list below).

  • The feature can reduce costs by up to 90% and latency by up to 85% for supported models
  • This addition brings AWS in line with competitors like Anthropic and OpenAI, which already offer prompt caching through their APIs
  • Rather than reusing whole responses, the system caches the model's processing of a repeated prompt prefix, so those input tokens don't have to be recomputed and billed at full price on each call
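In practice, Bedrock exposes this through cache checkpoints in the Converse API: you mark the end of a stable prompt prefix (a long system prompt, shared documents, tool definitions) with a cachePoint block, and later calls that repeat that prefix reuse the cached computation. A minimal sketch, assuming a model that supports caching (the model ID here is a placeholder):

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder model ID; prompt caching only works on supported models.
MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

# A long, stable prefix worth caching: reference docs, instructions, etc.
LONG_CONTEXT = "...several thousand tokens of shared reference material..."

response = client.converse(
    modelId=MODEL_ID,
    # Everything before the cachePoint marker is eligible for caching;
    # subsequent calls that repeat this exact prefix hit the cache.
    system=[
        {"text": LONG_CONTEXT},
        {"cachePoint": {"type": "default"}},
    ],
    messages=[
        {"role": "user", "content": [{"text": "Summarize section 2."}]}
    ],
)

# Usage metadata shows how many input tokens were read from (or written to)
# the cache, which is how the discount shows up on the bill.
print(response["usage"])
```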

Cost considerations: The expense of running AI applications remains a significant barrier to widespread enterprise adoption.

  • Beyond model training costs, operational expenses for regular model usage can be substantial
  • The introduction of agentic use cases adds another layer of cost complexity due to frequent model interactions (illustrated in the sketch after this list)
  • Industry leaders like OpenAI have suggested that AI costs may decrease as adoption increases and technology matures
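A back-of-envelope calculation shows why agentic workloads, which resend a large shared prompt on every step, benefit disproportionately from prefix caching. All prices below are hypothetical, chosen only to illustrate the mechanics, and the sketch ignores any cache-write premium:

```python
# Hypothetical per-token prices, for illustration only (not actual Bedrock rates).
PRICE_PER_1K_INPUT = 0.003      # USD per 1K uncached input tokens
PRICE_PER_1K_CACHED = 0.0003    # USD per 1K cached input tokens (90% discount)

PREFIX_TOKENS = 20_000          # shared system prompt + tool definitions
QUERY_TOKENS = 200              # new tokens added at each agent step
STEPS = 50                      # model calls in one agent run

# Without caching, the full prefix is billed at full price on every step.
without_cache = STEPS * (PREFIX_TOKENS + QUERY_TOKENS) / 1000 * PRICE_PER_1K_INPUT

# With caching, the first call pays full price (and populates the cache);
# the remaining steps pay the discounted rate for the prefix.
with_cache = (PREFIX_TOKENS + QUERY_TOKENS) / 1000 * PRICE_PER_1K_INPUT
with_cache += (STEPS - 1) * (
    PREFIX_TOKENS / 1000 * PRICE_PER_1K_CACHED
    + QUERY_TOKENS / 1000 * PRICE_PER_1K_INPUT
)

print(f"without caching: ${without_cache:.2f}")  # $3.03
print(f"with caching:    ${with_cache:.2f}")     # $0.38, roughly 87% less
```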

Ecosystem expansion: AWS continues to broaden its model marketplace on Bedrock with new partnerships and offerings.

  • Recent additions include models from Poolside, Stability AI’s Stable Diffusion 3.5, and Luma’s Ray 2
  • Luma has chosen AWS as its first cloud provider partner, utilizing Amazon’s SageMaker HyperPod for model development
  • The collaboration between AWS and Luma illustrates AWS's strategy of supporting AI innovation through close technical partnerships

Future implications: The introduction of cost-optimization features by major cloud providers signals a shift toward making AI more economically viable for widespread enterprise deployment, though questions remain about how quickly costs will decrease and whether these optimizations will be enough to drive broader adoption.
