Bluesky doesn't train AI on user posts, but can it stop others from doing so?

The battle between social media platforms and AI data scrapers continues to escalate as Bluesky grapples with keeping user content out of unauthorized AI training datasets.

Recent incident sparks privacy concerns: A Hugging Face employee scraped one million public Bluesky posts and published them as a dataset on Hugging Face, the AI model and dataset repository, highlighting how exposed public social media data is.

  • The dataset drew significant attention on Hugging Face, trending throughout the day before being removed
  • The employee has since apologized for collecting the posts without users' consent
  • The incident exemplifies the ease with which public API data can be harvested for AI training purposes

Platform’s response and limitations: Bluesky is exploring mechanisms to allow users to control how their data is used for AI training, though the effectiveness of such measures remains uncertain.

  • The platform is developing features that would allow users to specify their consent preferences for AI training
  • Bluesky acknowledges that enforcement will ultimately depend on external developers’ willingness to respect these settings
  • The platform faces technical and practical challenges in preventing unauthorized data collection through its API
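To make the enforcement gap concrete, here is a minimal Python sketch of what honoring such a consent preference might look like on a data collector's side. The `Post` type, the `ai_training_allowed` field, and `filter_for_training` are hypothetical illustrations, not part of any announced Bluesky or AT Protocol API. The point is that the check runs inside the collector's own code, so a scraper that simply skips it faces no technical barrier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    author: str
    text: str
    # Hypothetical consent flag; None means the author set no preference.
    ai_training_allowed: Optional[bool] = None


def filter_for_training(posts: List[Post], default_allowed: bool = False) -> List[Post]:
    """Keep only posts whose authors permit AI training (hypothetical flag).

    Nothing forces a collector to call this function: honoring the
    preference is voluntary, which is exactly the enforcement problem.
    """
    kept = []
    for post in posts:
        allowed = post.ai_training_allowed
        if allowed is None:
            # Conservative default: exclude posts with no stated preference.
            allowed = default_allowed
        if allowed:
            kept.append(post)
    return kept


posts = [
    Post("alice.example", "hello", ai_training_allowed=True),
    Post("bob.example", "hi", ai_training_allowed=False),
    Post("carol.example", "hey"),
]
print([p.author for p in filter_for_training(posts)])
```

Choosing `default_allowed=False` treats silence as a refusal; a collector could just as easily flip that default, or never filter at all, which is why platform-side preferences alone cannot guarantee compliance.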

Broader implications for user privacy: This incident underscores the growing tension between open social platforms and the increasing demand for AI training data.

  • Social media platforms struggle to balance API accessibility with user privacy protection
  • The inability to fully control data usage after it becomes public presents an ongoing challenge for platforms and users alike
  • Technical solutions for preventing unauthorized data scraping may conflict with the open nature of social platforms

Looking ahead: While platforms like Bluesky can implement consent mechanisms, the fundamental challenge of enforcing these preferences in an open ecosystem remains unresolved, suggesting a need for industry-wide standards and technical safeguards that keep user data out of unauthorized AI training.

Bluesky won’t use your posts for AI training, but can it stop anyone else?
