Mistral launches new AI moderation API for content filtering
New content moderation API from Mistral AI: Mistral AI has launched a moderation service to detect undesirable text content across various policy dimensions, aiming to enhance AI safety and protect downstream deployments.

  • The API, which powers the moderation service in Le Chat, is now available for users to tailor to their specific applications and safety standards.
  • This release responds to the growing industry interest in LLM-based moderation systems that can improve scalability and robustness across applications.

Key features of the moderation model: The content moderation classifier employs an LLM trained to categorize text inputs into nine distinct categories, offering a comprehensive approach to content filtering.

  • Two endpoints are available: one for raw text and another for conversational content, with the latter focusing on classifying the last message within a conversational context.
  • The model is natively multilingual, trained on 11 languages: Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
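As a rough sketch of how the two endpoints might be addressed over plain HTTP: the endpoint paths, field names, and model identifier below are assumptions based on Mistral's public REST conventions, not details stated in this article, so verify them against the official documentation before use.

```python
import json

# Assumed base URL, endpoint paths, and model name; confirm against
# Mistral's API reference before relying on them.
API_BASE = "https://api.mistral.ai/v1"
MODEL = "mistral-moderation-latest"

def build_text_request(texts):
    """Request for the raw-text endpoint: each string is classified on its own."""
    return f"{API_BASE}/moderations", {"model": MODEL, "input": texts}

def build_chat_request(messages):
    """Request for the conversational endpoint: the last message is
    classified within the context of the preceding turns."""
    return f"{API_BASE}/chat/moderations", {"model": MODEL, "input": messages}

url, payload = build_chat_request([
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "Go to settings and choose 'Reset'."},
])
print(url)
print(json.dumps(payload, indent=2))
```

The payloads would then be POSTed with an `Authorization: Bearer <API key>` header using any HTTP client.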

Policy categories and pragmatic approach: Mistral AI’s moderation system introduces a practical method for LLM safety by addressing model-generated harms and covering relevant policy categories.

  • The classifier includes categories for unqualified advice and personally identifiable information (PII), addressing concerns specific to AI-generated content.
  • A full list of policies is provided in the technical documentation, allowing users to understand the scope of the moderation system.
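Moderation classifiers of this kind typically return a per-category score, which downstream applications compare against their own thresholds. A minimal post-processing sketch follows; the category names mirror those in Mistral's documentation but should be treated as assumptions here, and the thresholds are purely illustrative.

```python
# Illustrative per-category thresholds for a hypothetical moderation
# response. Category names are assumed from Mistral's policy list;
# thresholds should be tuned to each application's safety standards.
DEFAULT_THRESHOLDS = {
    "sexual": 0.5,
    "hate_and_discrimination": 0.5,
    "violence_and_threats": 0.5,
    "dangerous_and_criminal_content": 0.5,
    "selfharm": 0.3,   # stricter cutoff for a higher-stakes category
    "health": 0.7,     # unqualified-advice categories can be more permissive
    "financial": 0.7,
    "law": 0.7,
    "pii": 0.5,
}

def flagged_categories(scores, thresholds=DEFAULT_THRESHOLDS):
    """Return, sorted, the categories whose score meets or exceeds its threshold."""
    return sorted(
        cat for cat, score in scores.items()
        if score >= thresholds.get(cat, 0.5)
    )

example = {"pii": 0.92, "financial": 0.41, "sexual": 0.02}
print(flagged_categories(example))  # → ['pii']
```

Per-category thresholds are what make the API "tailorable": a medical app might lower the `health` cutoff while a customer-support bot tightens `pii`.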

Performance metrics: Mistral AI has shared area under the precision–recall curve (AUC PR) scores across policies on their internal test set, demonstrating the model’s effectiveness.

  • While specific scores are not provided in this summary, users can refer to the technical documentation for detailed performance metrics.
  • The company is actively working with customers to refine and improve the moderation tooling based on real-world applications.
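AUC PR summarizes a classifier’s precision/recall trade-off across all score thresholds. One common way to compute it is average precision, sketched below on toy data (unrelated to Mistral’s internal test set):

```python
def average_precision(labels, scores):
    """Area under the precision-recall curve, computed as average
    precision: precision at each positive, weighted by the recall step."""
    n_pos = sum(labels)
    # Rank examples by score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = 0
    ap = 0.0
    for i, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            ap += (tp / i) * (1 / n_pos)  # precision * recall increment
    return ap

# Toy example: a good classifier ranks positives (1) above negatives (0).
print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # ≈ 0.833
```

A perfect ranking yields 1.0; higher per-policy AUC PR means the model separates violating from benign content more cleanly for that policy.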

Broader implications and future developments: Mistral AI’s release of this moderation API signifies a step towards more transparent and customizable AI safety measures in the industry.

  • The company’s engagement with the research community suggests ongoing efforts to contribute to safety advancements in the broader AI field.
  • As AI applications continue to proliferate, the availability of such moderation tools may become increasingly crucial for responsible AI deployment and use across various sectors.
Source: “Introducing the Mistral Moderation API” (Mistral AI)
