AI privacy innovation: Mostly AI has launched a synthetic text functionality that generates privacy-protected data for training enterprise AI models, addressing concerns about using real customer data containing personally identifiable information.

  • The new tool automates the process of creating synthetic data while preserving the patterns of original datasets, allowing businesses to leverage customer insights without risking privacy.
  • Synthetic data can also be used to rebalance datasets, remove bias, and generate mock data for software testing.

How the technology works: Mostly AI’s platform allows companies to upload proprietary datasets and fine-tune generators to create privacy-protected, synthesized versions of their data.

  • Users can upload data from local devices or external sources and select from various language models, including pre-trained options from HuggingFace.
  • The resulting synthetic data preserves original statistical patterns while complying with privacy protection regulations such as GDPR and CCPA.
  • Mostly AI claims its synthetic text can deliver performance improvements of up to 35% compared to text generated by prompting GPT-4o-mini with few or no real-world examples.

Industry context and potential impact: The launch of Mostly AI’s synthetic text functionality comes as more companies invest in generative AI for specific use cases and products, increasing the importance of proprietary data for training large language models.

  • Unlike public models such as ChatGPT, which are trained on vast amounts of scraped internet data, enterprise AI often requires specialized training on a business’s own customer data.
  • Synthetic data offers a way around the privacy risks of training on real customer records that contain personally identifiable information.
  • The technology also has the potential to address the growing concern that AI models are exhausting public data sources and yielding diminishing returns.

Challenges and considerations: While synthetic data shows promise in addressing privacy concerns and expanding AI training capabilities, its implementation is not without challenges.

  • A Gartner report from April noted that synthetic data has unrealized potential in software engineering but must be deployed carefully.
  • Creating synthetic data can be resource-intensive, requiring specific testing stages for each use case.
  • There are concerns about model collapse, in which models degrade after being trained on too much synthetic data. However, Mostly AI claims to avoid this issue by generating synthetic data once and applying it directly to downstream tasks.

Industry perspectives: The launch of Mostly AI’s synthetic text functionality has sparked discussions about the future of AI training and data privacy.

  • Mostly AI CEO Tobias Hann argues that leveraging both structured and unstructured synthetic data is crucial for safely training and deploying future generative AI solutions.
  • The company positions its technology as a solution to the perceived plateau in AI training due to the exhaustion of public data sources.
  • Even major players such as Meta have used a combination of human-generated and synthetic data to train advanced models like Llama 3.1 405B.

Looking ahead: As the AI industry continues to grapple with data privacy concerns and the need for high-quality training data, synthetic data generation may play an increasingly important role.

  • How effective synthetic data proves in AI training, and how widely it is adopted, remains to be seen, particularly given concerns about model collapse at larger scale.
  • As more enterprises explore the use of synthetic data, its impact on AI development, privacy protection, and model performance will likely become clearer in the coming years.
