Data - CO/AI


Jul 3, 2024

Meta Faces AI Training Ban in Brazil Over Data Misuse Concerns

Brazilian regulators have ordered Meta, the social media giant, to stop training its AI models on personal data from Brazilian users, a significant setback for the company's AI ambitions in one of its largest markets. Key developments: The Brazilian data protection authority (ANPD) has banned Meta from using public Facebook, Messenger, and Instagram data from Brazil for AI training. The decision follows Meta's privacy policy update in May, which granted the company permission to use Brazilian users' posts, images, and captions to train its AI models. ANPD cited "risks of serious damage and difficulty to users"...

Jun 26, 2024

Tech Giants Quietly Claim Rights to Train AI on Your Data

The New York Times recently highlighted how major tech companies have quietly updated their terms of service over the past year to permit training AI models on customer data. Key changes in terms of service: Tech giants such as Google, Microsoft, and others have modified their terms to allow customer data to be used in developing AI systems. Google's terms now state that the company can use content "to develop, test, improve, and maintain our products and services, including for the development and training of artificial intelligence models and systems." Microsoft made similar changes, adding language about using customer content "to...

Jun 26, 2024

Reddit Defies AI Bots, Igniting Battle Over Internet Data Control

Upending the web's long-standing data-sharing model, Reddit is taking a stand against AI bots that scrape its content, signaling a larger battle over who controls and profits from the internet's data. Key Takeaways: Reddit is escalating its fight against unauthorized AI bots by updating its robots.txt file, a core web component that tells web crawlers which parts of a site they may access. The move aims to block most automated bots from accessing Reddit's public data without a licensing agreement, a policy that was technically already in place but is now being actively enforced. Reddit's chief legal officer, Ben Lee, emphasizes that the change sends...
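To illustrate how a robots.txt policy of this kind is read by a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names `LicensedBot` and `ExampleBot`, and the `example.com` URL are all hypothetical and are not taken from Reddit's actual file. Note that robots.txt is advisory: it only affects crawlers that choose to check it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (not Reddit's actual file):
# one named crawler is allowed everywhere, every other crawler is blocked.
ROBOTS_TXT = """\
User-agent: LicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/some-page"  # placeholder URL
    for bot in ("LicensedBot", "ExampleBot"):
        print(bot, is_allowed(ROBOTS_TXT, bot, page))
```

Under these assumptions, the named crawler is permitted and every other user agent falls through to the catch-all `Disallow: /` rule.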

Jun 24, 2024

AI Data Quality: The Key to Effective, Reliable, and Ethical AI Systems

In the age of artificial intelligence, data quality is crucial for building effective and reliable AI systems. The consequences of training AI models on poor-quality data, as when Reddit's content partnership with Google led to bizarre search results such as a recommendation to put glue on pizza, highlight the importance of high-quality data in AI development. Defining high-quality data: Data quality is not just about accuracy or quantity; high-quality data is fit for its intended purpose and is evaluated against specific use cases. Relevance is critical, as the data must be directly applicable and meaningful to the problem the AI model aims...
