
AI inference gets a boost on Google Cloud Run: Google Cloud has enhanced its serverless platform, Cloud Run, by adding support for NVIDIA L4 GPUs, significantly improving its capabilities for handling complex AI workloads.

Key features of the upgrade:

  • The integration of NVIDIA L4 GPUs extends Cloud Run’s capabilities to support real-time AI inference
  • This enhancement is particularly beneficial for deploying lightweight generative AI models and small language models (see the sketch after this list)
  • According to NVIDIA, the L4 GPU delivers up to 120 times the AI video performance of CPU-based solutions and 2.7 times the generative AI performance of the previous GPU generation
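
For a sense of what such a workload looks like, below is a minimal sketch of loading a small open-weights language model onto an attached GPU with Hugging Face Transformers and generating text. The model ID, prompt, and generation settings are illustrative assumptions, not details from Google's announcement.

```python
# Minimal sketch: small-language-model inference on an attached GPU (e.g. an NVIDIA L4).
# The model ID below is illustrative; any small open-weights model works.
# Requires: pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-2b-it"  # illustrative choice, not from the announcement
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "Summarize serverless GPU inference in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```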

Deployment process and flexibility:

  • Developers can build container images containing the necessary dependencies, including NVIDIA GPU drivers and the AI models themselves (a minimal serving sketch follows this list)
  • NVIDIA L4 GPUs are also available through other Google Cloud services, including Google Kubernetes Engine and Google Compute Engine
  • This flexibility allows developers to choose their preferred level of abstraction for building and deploying AI-enabled applications
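
To make the container-image step concrete, here is a hedged sketch of the kind of minimal inference service such an image might run: a Flask app that listens on the PORT environment variable Cloud Run injects and reports which device it found. The /generate route, request shape, and placeholder response are assumptions for illustration only.

```python
# Sketch of an inference service packaged into a Cloud Run container image.
# Cloud Run routes requests to the port named in the PORT environment variable.
# The /generate route and response format are illustrative assumptions.
# Requires: pip install flask torch
import os

import torch
from flask import Flask, jsonify, request

app = Flask(__name__)
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# In a real image, load model weights here (baked into the image or fetched at
# startup) and move them to DEVICE so they are ready before traffic arrives.


@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.get_json(force=True).get("prompt", "")
    # Placeholder response; replace with a call into the loaded model.
    return jsonify({"device": DEVICE, "prompt": prompt, "completion": "..."})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))
```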

Expanded capabilities beyond AI inference:

  • Cloud Run services backed by NVIDIA L4 GPUs can also handle other compute-intensive tasks
  • These include on-demand image recognition, video transcoding and streaming, and 3D rendering (an image-recognition sketch follows this list)
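
As one concrete example, the sketch below classifies an image with a pretrained torchvision ResNet-50, using the GPU when one is available. The model choice and the local image path are illustrative assumptions.

```python
# Sketch: on-demand image recognition on a GPU-backed service.
# The pretrained ResNet-50 and the image path are illustrative choices.
# Requires: pip install torch torchvision pillow
import torch
from PIL import Image
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).to(device).eval()
preprocess = weights.transforms()  # resize/crop/normalize as the model expects

image = Image.open("example.jpg").convert("RGB")  # illustrative input file
batch = preprocess(image).unsqueeze(0).to(device)

with torch.no_grad():
    probs = model(batch).softmax(dim=1)

top_prob, top_idx = probs[0].max(dim=0)
print(f"{weights.meta['categories'][int(top_idx)]}: {top_prob.item():.2%}")
```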

NVIDIA-Google Cloud partnership:

  • The partnership aims to provide advanced AI capabilities across various layers of the AI stack
  • It includes the provision of Google Cloud A3 VMs powered by NVIDIA H100 GPUs
  • NVIDIA DGX Cloud, an AI supercomputing and software service, is available to customers directly through a web browser
  • NVIDIA AI Enterprise is available on Google Cloud Marketplace, providing a secure, cloud-native platform for enterprise-ready AI applications

Real-world applications:

  • L’Oréal is using this technology to power its real-time AI inference applications
  • Writer, an AI writing platform, has seen substantial improvements in model inference performance while reducing hosting costs by 15%

Impact on cloud-based AI development:

  • The addition of NVIDIA L4 GPU support to Google Cloud Run represents a major milestone in serverless AI inference
  • By combining Cloud Run’s ease of use and scalability with NVIDIA GPUs’ powerful performance, Google Cloud is providing developers and businesses with essential tools for building, deploying, and scaling AI applications

Looking ahead: As AI continues to evolve and become more integral to business operations, the integration of powerful GPU capabilities into serverless platforms like Cloud Run is likely to accelerate the development and deployment of AI-powered applications across various industries.

Source: Google Brings Serverless Inference To Cloud Run Based On Nvidia GPU
