Bluesky blocks AI training on user posts, but can it stop others' attempts?

The battle between social media platforms and AI data scrapers continues to escalate as Bluesky grapples with how to keep user posts out of unauthorized AI training datasets.

Recent incident sparks privacy concerns: A Hugging Face employee scraped one million public Bluesky posts and published them as a dataset on Hugging Face, the AI model and dataset repository, highlighting how exposed public social media data is.

The dataset gained significant attention on the platform, trending throughout the day before being removed
The employee has since apologized for the unauthorized data collection and removed the scraped content
The incident exemplifies the ease with which public API data can be harvested for AI training purposes

Platform’s response and limitations: Bluesky is exploring mechanisms to allow users to control how their data is used for AI training, though the effectiveness of such measures remains uncertain.

The platform is developing features that would allow users to specify their consent preferences for AI training
Bluesky acknowledges that enforcement will ultimately depend on external developers’ willingness to respect these settings
The platform faces technical and practical challenges in preventing unauthorized data collection through its API

Broader implications for user privacy: This incident underscores the growing tension between open social platforms and the increasing demand for AI training data.

Social media platforms struggle to balance API accessibility with user privacy protection
The inability to fully control data usage after it becomes public presents an ongoing challenge for platforms and users alike
Technical solutions for preventing unauthorized data scraping may conflict with the open nature of social platforms

Looking ahead: While platforms like Bluesky can implement consent mechanisms, the fundamental challenge of enforcing these preferences in an open ecosystem remains unresolved. That points to a need for industry-wide standards and technical safeguards to keep user data out of unauthorized AI training.

Source: The Verge, “Bluesky won’t use your posts for AI training, but can it stop anyone else?”