NVIDIA’s NeMo Retriever Boosts Accuracy and Throughput for Enterprise LLMs

NVIDIA introduces NeMo Retriever NIM microservices to improve the accuracy and throughput of large language models (LLMs) by grounding enterprise applications in their own proprietary data.

Key components of NeMo Retriever: The microservices consist of embedding and reranking models that work together to efficiently retrieve the most relevant data for generating accurate responses:

  • Embedding models transform diverse data into numerical vectors, capturing meaning and nuance, and are computationally efficient.
  • Reranking models score retrieved passages on their relevance to the query, improving accuracy at a higher computational cost.
  • NeMo Retriever combines both model types in a retrieve-then-rerank pipeline (sketched below) to surface the most helpful, accurate results for enterprises.
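
The division of labor between the two model types is easiest to see in code. Below is a minimal, self-contained sketch of the retrieve-then-rerank pattern; embed() and rerank_score() are toy stand-ins (bag-of-words similarity and term overlap), not NVIDIA's models or APIs.

```python
# Toy sketch of the two-stage retrieve-then-rerank pattern NeMo Retriever
# implements. embed() and rerank_score() are simple stand-ins for real
# embedding and reranking models.
import math
from collections import Counter

DOCS = [
    "NeMo Retriever pairs embedding and reranking models for RAG.",
    "Embedding models map text to vectors that capture meaning.",
    "Reranking models score candidate passages against a query.",
]

def embed(text: str) -> Counter:
    # Stand-in embedding: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Stage 1: cheap vector similarity narrows the corpus to k candidates.
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank_score(query: str, doc: str) -> float:
    # Stand-in reranker: fraction of query terms found in the document.
    # A real reranking model reads query and passage jointly, which costs
    # more compute but judges relevance far more accurately.
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) / len(q_terms)

def retrieve_and_rerank(query: str) -> str:
    candidates = retrieve(query)
    # Stage 2: the costlier reranker orders only the shortlist.
    return max(candidates, key=lambda d: rerank_score(query, d))

print(retrieve_and_rerank("how do reranking models score a query"))
```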

Integration with industry platforms: NeMo Retriever microservices are being integrated into various data platforms to enhance their AI capabilities:

  • Cohesity’s Gaia AI product will use NeMo Retriever to power insightful, generative AI applications through retrieval-augmented generation (RAG).
  • NetApp is collaborating with NVIDIA to connect NeMo Retriever to its intelligent data infrastructure, allowing customers to access proprietary insights securely.
  • DataStax has integrated NeMo Retriever into its Astra DB and Hyper-Converged platforms, enabling faster time-to-market for RAG capabilities.

Versatile use cases: NeMo Retriever powers a wide range of AI applications across industries:

  • Building intelligent chatbots that provide accurate, context-aware responses (a grounding sketch follows this list).
  • Analyzing vast amounts of data to identify security vulnerabilities.
  • Extracting insights from complex supply chain information.
  • Enhancing AI-enabled retail shopping advisors for personalized experiences.
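
To make the chatbot case concrete, the sketch below shows how retrieval-augmented generation grounds an answer: retrieved passages are pasted into the prompt as explicit context. Here retrieve() and generate() are stubs standing in for the retrieval pipeline above and an LLM endpoint; neither is an NVIDIA API.

```python
# Sketch of how a RAG chatbot grounds its answers in retrieved passages.
# retrieve() and generate() are stubs for the retrieval pipeline above
# and an LLM endpoint, respectively.
def retrieve(question: str) -> list[str]:
    # Stub: a real app would call the retrieve-then-rerank pipeline here.
    return ["NeMo Retriever pairs embedding and reranking models for RAG."]

def generate(prompt: str) -> str:
    # Stub: a real app would send the prompt to an LLM here.
    return f"[LLM answer grounded in {prompt.count('- ')} passage(s)]"

def build_grounded_prompt(question: str, passages: list[str]) -> str:
    # Retrieved passages become explicit context, so the model answers
    # from enterprise data rather than from its training set alone.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def chatbot_answer(question: str) -> str:
    return generate(build_grounded_prompt(question, retrieve(question)))

print(chatbot_answer("What does NeMo Retriever combine?"))
```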

Compatibility and flexibility: NeMo Retriever microservices can be used in conjunction with other NVIDIA NIM microservices and integrated with various models:

  • Riva NIM microservices can be used alongside NeMo Retriever to supercharge speech AI applications.
  • The microservices can be integrated with community models, NVIDIA models, or users’ custom models.
  • They can be deployed in the cloud, on-premises, or in hybrid environments, giving developers flexibility (see the client sketch below).
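
One way this flexibility shows up in practice: NIM microservices expose OpenAI-compatible HTTP APIs, so a client can switch between a hosted endpoint and an on-premises container by changing a base URL. Below is a minimal sketch using the openai Python client; the URLs, environment variables, and model name are illustrative placeholders, not verified values.

```python
# Sketch of deployment flexibility: because NIM microservices speak an
# OpenAI-compatible API, cloud vs. on-prem is a configuration change.
# The base URL, env vars, and model name below are placeholders.
import os
from openai import OpenAI  # pip install openai

# Point at a cloud-hosted endpoint or a container on your own hardware.
BASE_URL = os.environ.get("NIM_BASE_URL", "http://localhost:8000/v1")

client = OpenAI(base_url=BASE_URL,
                api_key=os.environ.get("NIM_API_KEY", "none"))

# Request embeddings exactly as with any OpenAI-compatible service; only
# the base URL distinguishes cloud, on-prem, and hybrid deployments.
response = client.embeddings.create(
    model="nvidia/nv-embedqa-e5-v5",  # placeholder embedding model name
    input=["What is retrieval-augmented generation?"],
)
print(len(response.data[0].embedding), "dimensions")
```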

Broader implications: NVIDIA’s NeMo Retriever NIM microservices demonstrate the company’s continued efforts to democratize AI and empower enterprises with cutting-edge tools for building accurate, data-driven applications. By leveraging proprietary data and state-of-the-art models, NeMo Retriever has the potential to accelerate the adoption of generative AI across industries, enabling businesses to unlock valuable insights and create more intelligent, responsive solutions. However, as with any powerful AI technology, it will be crucial for enterprises to consider the ethical implications and ensure responsible deployment to maximize benefits while mitigating potential risks.

