Anthropic’s ClaudeBot crawler hit iFixit’s website nearly a million times in 24 hours, ignoring the company’s anti-scraping policies. The incident raises questions about AI companies’ data scraping practices and the limited options websites have to protect their content.

Key details of the incident: iFixit CEO Kyle Wiens revealed that Anthropic’s ClaudeBot web crawler accessed the website’s servers nearly a million times within a 24-hour period, seemingly violating iFixit’s Terms of Use:

  • iFixit’s Terms of Use explicitly prohibit reproducing, copying, or distributing any content from the website without express written permission, including using the content for training machine learning or AI models.
  • When questioned by Wiens, Anthropic’s AI assistant Claude acknowledged that iFixit’s content was off-limits according to the website’s terms.

Anthropic’s response and web crawling practices: Anthropic’s stance on web scraping and the options available for website owners to opt out of data collection have come under scrutiny:

  • In response to the incident, Anthropic pointed to an FAQ page stating that its crawler can only be blocked using a robots.txt file.
  • iFixit has since added a crawl-delay directive to its robots.txt file to throttle further scraping by ClaudeBot (a sketch of both options follows below).
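
For reference, here is a minimal robots.txt sketch covering both options. The ClaudeBot user-agent string matches the crawler named in this incident, but the 30-second delay is an illustrative value, and crawl-delay is a non-standard directive that crawlers honor only voluntarily:

    # Throttle Anthropic's crawler sitewide
    User-agent: ClaudeBot
    Crawl-delay: 30

    # To block it outright instead:
    # User-agent: ClaudeBot
    # Disallow: /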

Broader implications for website owners and AI companies: The incident highlights the ongoing challenges and debates surrounding AI companies’ data scraping practices and the limited options available for website owners to protect their content:

  • Other websites, such as Read the Docs, Freelancer.com, and the Linux Mint web forum, have also reported aggressive scraping by Anthropic’s crawler, with some experiencing site outages due to the increased strain.
  • Many AI companies, including OpenAI, rely on the robots.txt file as the primary method for website owners to opt out of data scraping, but this approach offers limited flexibility in specifying what scraping is and isn’t permitted.
  • Some AI companies, like Perplexity, have been known to ignore robots.txt exclusions entirely. The file is purely advisory: compliance is a check the crawler itself chooses to run (see the sketch after this list), leaving website owners with no technical enforcement.
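
To make the advisory nature of robots.txt concrete, below is a minimal Python sketch of the check a well-behaved crawler runs before fetching a page. Nothing server-side enforces the file, so a crawler that skips this step scrapes regardless; the target URL here is illustrative:

    from urllib import robotparser

    # Fetch and parse the site's robots.txt
    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.ifixit.com/robots.txt")
    rp.read()

    user_agent = "ClaudeBot"
    url = "https://www.ifixit.com/Guide"  # illustrative page

    if rp.can_fetch(user_agent, url):
        # crawl_delay() returns the site's Crawl-delay for this agent, or None
        print("Allowed; suggested delay:", rp.crawl_delay(user_agent))
    else:
        print("robots.txt disallows", user_agent, "from fetching", url)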

The need for a more comprehensive approach: As AI companies continue to rely on web-scraped data for training their models, the incident underscores the need for a more balanced and comprehensive approach to data collection that respects website owners’ rights and preferences:

  • The current opt-out methods, such as robots.txt files, provide limited granularity and control for website owners, leaving them vulnerable to aggressive or unwanted scraping.
  • AI companies must work towards more transparent and collaborative data collection practices that prioritize obtaining proper permissions and adhering to websites’ terms of use.
  • Regulators and industry stakeholders should explore the development of standardized protocols and guidelines for responsible web scraping in the context of AI training, ensuring a fair balance between the needs of AI companies and the rights of website owners.
