New content moderation API from Mistral AI: Mistral AI has launched a moderation service to detect undesirable text content across various policy dimensions, aiming to enhance AI safety and protect downstream deployments.
- The API, which powers the moderation service in Le Chat, is now available for users to tailor to their specific applications and safety standards.
- This release responds to the growing industry interest in LLM-based moderation systems that can improve scalability and robustness across applications.
Key features of the moderation model: The content moderation classifier employs an LLM trained to categorize text inputs into nine distinct categories, offering a comprehensive approach to content filtering.
- Two endpoints are available: one for raw text and another for conversational content, with the latter focusing on classifying the last message within a conversational context.
- The model is natively multilingual, trained on 11 languages including Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
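To make the two-endpoint split concrete, here is a minimal sketch of how request payloads for each endpoint might be assembled. The endpoint shapes follow Mistral's published API (a batch of raw strings for the text endpoint; a list of chat messages, of which the last is classified in context, for the conversational endpoint), but field names and the model identifier should be checked against the current documentation before use.

```python
import json

# Assumed model identifier; verify against Mistral's docs.
MODEL = "mistral-moderation-latest"

def raw_text_payload(texts):
    """Payload for the raw-text endpoint: a batch of standalone strings."""
    return {"model": MODEL, "input": list(texts)}

def conversational_payload(messages):
    """Payload for the conversational endpoint: one chat transcript.
    The classifier scores the last message in its conversational context."""
    return {"model": MODEL, "input": [messages]}

text_req = raw_text_payload(["Is this message safe to post?"])
chat_req = conversational_payload([
    {"role": "user", "content": "How do I pick a strong password?"},
    {"role": "assistant", "content": "Use a long, random passphrase..."},
])

print(json.dumps(text_req))
print(json.dumps(chat_req))
```

Batching raw strings keeps the text endpoint cheap for bulk filtering, while the conversational form lets the model use prior turns when judging the final message.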
Policy categories and pragmatic approach: Mistral AI’s moderation system introduces a practical method for LLM safety by addressing model-generated harms and covering relevant policy categories.
- The classifier includes categories for unqualified advice and personally identifiable information (PII), addressing concerns specific to AI-generated content.
- A full list of policies is provided in the technical documentation, allowing users to understand the scope of the moderation system.
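A typical downstream use of the classifier is turning per-category scores into a moderation decision. The sketch below assumes a response entry of the form `{"category_scores": {name: float}}` and uses hypothetical category identifiers (`pii`, `unqualified_advice`, `violence_and_threats`); consult the technical documentation for the actual response schema and category names.

```python
def flagged_categories(result, threshold=0.5):
    """Return the policy categories whose score meets the threshold.

    `result` is one entry from the moderation response, assumed to
    carry a 'category_scores' mapping of category name -> probability.
    """
    return sorted(
        name
        for name, score in result.get("category_scores", {}).items()
        if score >= threshold
    )

# Illustrative scores only, not real API output.
sample = {"category_scores": {
    "pii": 0.91,
    "violence_and_threats": 0.03,
    "unqualified_advice": 0.64,
}}
print(flagged_categories(sample))  # -> ['pii', 'unqualified_advice']
```

Because the API exposes raw scores per policy, each deployment can set its own per-category thresholds rather than relying on a single global cutoff, which is how the "tailor to your safety standards" claim cashes out in practice.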
Performance metrics: Mistral AI has shared area under the precision-recall curve (AUC-PR) scores for each policy on its internal test set, demonstrating the model's effectiveness.
- While specific scores are not provided in this summary, users can refer to the technical documentation for detailed performance metrics.
- The company is actively working with customers to refine and improve the moderation tooling based on real-world applications.
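For readers unfamiliar with the reported metric, AUC-PR can be estimated as average precision: the precision-weighted sum of recall increments as the decision threshold sweeps over the scores. A minimal sketch (naive about tied scores) of that computation:

```python
def auc_pr(scores, labels):
    """Average-precision estimate of the area under the
    precision-recall curve.

    scores: classifier scores, higher means more likely positive.
    labels: ground-truth labels, 1 for positive, 0 for negative.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(labels)
    tp = fp = 0
    area = 0.0
    for _score, label in pairs:
        if label:
            tp += 1
            # Each true positive adds 1/total_pos recall,
            # weighted by the precision at this threshold.
            area += (tp / (tp + fp)) * (1 / total_pos)
        else:
            fp += 1
    return area

print(round(auc_pr([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]), 3))  # -> 0.833
```

AUC-PR is a sensible headline metric here because policy violations are rare relative to benign text, and precision-recall curves are far more informative than ROC curves under that class imbalance.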
Broader implications and future developments: Mistral AI’s release of this moderation API signifies a step towards more transparent and customizable AI safety measures in the industry.
- The company’s engagement with the research community suggests ongoing efforts to contribute to safety advancements in the broader AI field.
- As AI applications continue to proliferate, the availability of such moderation tools may become increasingly crucial for responsible AI deployment and use across various sectors.