AI data collection faces pushback: Apple’s introduction of Applebot-Extended, a user-agent token that lets websites opt out of having their content used to train Apple’s AI models, has drawn a strong response from major online platforms and publishers.

  • Prominent sites including Facebook, Instagram, Craigslist, Tumblr, and several leading news outlets have chosen to block Applebot-Extended, signaling a shift in attitudes towards web crawlers and their role in AI development.
  • Website owners can block Applebot-Extended by adding a rule to their robots.txt file, the standard mechanism for controlling web crawler access (a minimal example follows this list).
  • Recent analyses indicate that 6-7% of high-traffic websites are blocking Applebot-Extended, with news and media outlets leading the charge.
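For site owners, the opt-out itself is a short robots.txt rule. A minimal sketch, assuming the goal is to exclude the entire site (the user-agent token is the one Apple documents; the Disallow path is illustrative):

    User-agent: Applebot-Extended
    Disallow: /

As Apple describes it, Applebot-Extended governs only whether content crawled by the regular Applebot may be used for AI training, so leaving the ordinary Applebot rules untouched keeps a site available to Apple search features such as Siri and Spotlight suggestions.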

News industry takes a stand: The media sector appears to be at the forefront of the pushback against AI data collection, with a quarter of news websites blocking Applebot-Extended.

  • By comparison, 53% of news sites block OpenAI’s GPTBot and 43% block Google-Extended, suggesting a broader pattern of resistance to AI training data collection across the news industry.
  • Major publishers including The New York Times, the Financial Times, The Atlantic, Vox Media, the USA Today network, and Condé Nast have all opted to block Applebot-Extended.
  • This widespread action reflects growing concerns among publishers about the use of their content for AI training without explicit permission or compensation.

Strategic considerations: The decision to block AI crawlers is not solely about protecting content; it also serves as a strategic move for many publishers.

  • Some media organizations are using this as leverage to potentially negotiate partnerships or licensing deals with AI companies.
  • Publishers are taking varied approaches: some block every AI bot belonging to companies they have no partnership with, while others decide case by case.
  • The New York Times has taken a strong stance, arguing that scraping content for commercial purposes should require explicit permission, not just an opt-out option.

Broader implications for AI and publishing: The widespread adoption of AI bot blocking tools highlights the evolving relationship between content creators and AI companies.

  • Which AI crawlers to allow or block via robots.txt has become a pressing question for media executives and publishers as AI takes a larger role in the digital landscape.
  • Widespread blocking could reduce the quality and diversity of data available for AI training, affecting the development of future AI models and applications.
  • The situation underscores the need for clearer guidelines and potentially new legal frameworks governing the use of online content for AI training purposes.

Industry-wide impact: The response to Applebot-Extended is indicative of a larger shift in how web content is viewed and valued in the age of AI.

  • As AI technologies become more advanced and integrated into various products and services, the data used to train these systems is increasingly seen as a valuable resource.
  • This development may lead to more formal arrangements between content creators and AI companies, potentially reshaping the economics of online publishing and AI development.
  • The actions of major publishers could inspire smaller websites and content creators to also take steps to protect their data from AI scraping.

Looking ahead: Balancing innovation and rights: The current situation highlights the complex balance between fostering AI innovation and protecting the rights and interests of content creators.

  • As AI continues to advance, finding a mutually beneficial arrangement between AI companies and content creators will be crucial for sustainable development in both sectors.
  • This may lead to new business models, such as licensing agreements for AI training data, or the development of more sophisticated content protection technologies.
  • The outcome of this ongoing debate could have far-reaching consequences for the future of AI development, online publishing, and digital content creation as a whole.
