AWS's new prompt caching feature cuts AI costs by up to 90%

As enterprise AI adoption grows, managing and optimizing infrastructure costs is becoming increasingly important, prompting major cloud providers to introduce new features aimed at reducing expenses.

Latest AWS Bedrock features: Amazon Web Services has unveiled two key capabilities – Intelligent Prompt Routing and Prompt Caching – to help customers reduce AI model usage costs.

  • Intelligent Prompt Routing automatically directs queries to appropriately sized models within a chosen model family, potentially reducing costs by up to 30% without sacrificing accuracy (see the invocation sketch after this list)
  • The system ensures simple queries are handled by smaller models while complex questions are routed to more sophisticated ones
  • AWS customer Argo Labs demonstrates this by using smaller models for basic yes/no questions and larger models for nuanced inquiries about menu options
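Mechanically, a router is invoked like any other Bedrock model: per AWS's documentation, you pass the prompt router's ARN in place of a model ID when calling the Converse API. A minimal sketch in Python with boto3 (the region, account number, and router ARN below are placeholders, not real identifiers):

```python
import boto3

# Bedrock runtime client; the region here is a placeholder.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# A prompt router is invoked like a model: its ARN goes in the modelId slot.
# This ARN is illustrative only; real router ARNs are listed in the
# Bedrock console.
ROUTER_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/anthropic.claude:1"
)

response = client.converse(
    modelId=ROUTER_ARN,
    messages=[
        {"role": "user", "content": [{"text": "Do you have vegan options?"}]}
    ],
)

print(response["output"]["message"]["content"][0]["text"])
# The response trace reports which model in the family actually served the
# request (exact field names may vary by SDK version).
print(response.get("trace", {}))
```

Because routing happens server-side, application code doesn't need to change as AWS adjusts which models sit behind the router.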

Prompt Caching implementation: AWS has introduced caching that stores the frequently reused portions of prompts so supported models don't reprocess those tokens on every call (a minimal sketch follows the list below).

  • The feature can reduce costs by up to 90% and latency by up to 85% for supported models
  • This addition brings AWS in line with competitors like Anthropic and OpenAI, which already offer prompt caching through their APIs
  • Rather than reusing whole responses, the system caches the model's processing of a repeated prompt prefix, so those input tokens don't have to be recomputed and billed at full price on each call
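In practice, Bedrock exposes this through cache checkpoints in the Converse API: you mark the end of a stable prompt prefix (a long system prompt, shared documents, tool definitions) with a cachePoint block, and later calls that repeat that prefix reuse the cached computation. A minimal sketch, assuming a model that supports caching (the model ID here is a placeholder):

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder model ID; prompt caching only works on supported models.
MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

# A long, stable prefix worth caching: reference docs, instructions, etc.
LONG_CONTEXT = "...several thousand tokens of shared reference material..."

response = client.converse(
    modelId=MODEL_ID,
    # Everything before the cachePoint marker is eligible for caching;
    # subsequent calls that repeat this exact prefix hit the cache.
    system=[
        {"text": LONG_CONTEXT},
        {"cachePoint": {"type": "default"}},
    ],
    messages=[
        {"role": "user", "content": [{"text": "Summarize section 2."}]}
    ],
)

# Usage metadata shows how many input tokens were read from (or written to)
# the cache, which is how the discount shows up on the bill.
print(response["usage"])
```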

Cost considerations: The expense of running AI applications remains a significant barrier to widespread enterprise adoption.

  • Beyond model training costs, operational expenses for regular model usage can be substantial
  • The introduction of agentic use cases adds another layer of cost complexity due to frequent model interactions (illustrated in the sketch after this list)
  • Industry leaders like OpenAI have suggested that AI costs may decrease as adoption increases and technology matures
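A back-of-envelope calculation shows why agentic workloads, which resend a large shared prompt on every step, benefit disproportionately from prefix caching. All prices below are hypothetical, chosen only to illustrate the mechanics, and the sketch ignores any cache-write premium:

```python
# Hypothetical per-token prices, for illustration only (not actual Bedrock rates).
PRICE_PER_1K_INPUT = 0.003      # USD per 1K uncached input tokens
PRICE_PER_1K_CACHED = 0.0003    # USD per 1K cached input tokens (90% discount)

PREFIX_TOKENS = 20_000          # shared system prompt + tool definitions
QUERY_TOKENS = 200              # new tokens added at each agent step
STEPS = 50                      # model calls in one agent run

# Without caching, the full prefix is billed at full price on every step.
without_cache = STEPS * (PREFIX_TOKENS + QUERY_TOKENS) / 1000 * PRICE_PER_1K_INPUT

# With caching, the first call pays full price (and populates the cache);
# the remaining steps pay the discounted rate for the prefix.
with_cache = (PREFIX_TOKENS + QUERY_TOKENS) / 1000 * PRICE_PER_1K_INPUT
with_cache += (STEPS - 1) * (
    PREFIX_TOKENS / 1000 * PRICE_PER_1K_CACHED
    + QUERY_TOKENS / 1000 * PRICE_PER_1K_INPUT
)

print(f"without caching: ${without_cache:.2f}")  # $3.03
print(f"with caching:    ${with_cache:.2f}")     # $0.38, roughly 87% less
```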

Ecosystem expansion: AWS continues to broaden its model marketplace on Bedrock with new partnerships and offerings.

  • Recent additions include models from Poolside, Stability AI’s Stable Diffusion 3.5, and Luma’s Ray 2
  • Luma has chosen AWS as its first cloud provider partner, utilizing Amazon’s SageMaker HyperPod for model development
  • The collaboration between AWS and Luma illustrates AWS's strategy of supporting AI innovation through close technical partnerships

Future implications: The introduction of cost-optimization features by major cloud providers signals a shift toward making AI more economically viable for widespread enterprise deployment, though questions remain about how quickly costs will decrease and whether these optimizations will be enough to drive broader adoption.
