Anthropic’s Web Crawler Is Apparently Ignoring Companies’ Anti-Scraping Policies

Anthropic’s ClaudeBot crawler hit iFixit’s website nearly a million times in 24 hours, apparently ignoring the company’s anti-scraping policies. The incident raises questions about AI companies’ data scraping practices and the limited options available for websites to protect their content.

Key details of the incident: iFixit CEO Kyle Wiens revealed that Anthropic’s ClaudeBot web crawler accessed the website’s servers nearly a million times within a 24-hour period, seemingly violating iFixit’s Terms of Use:

  • iFixit’s Terms of Use explicitly prohibit reproducing, copying, or distributing any content from the website without express written permission, including using the content for training machine learning or AI models.
  • When questioned by Wiens, Anthropic’s AI assistant Claude acknowledged that iFixit’s content was off-limits according to the website’s terms.

Anthropic’s response and web crawling practices: Anthropic’s stance on web scraping and the options available for website owners to opt out of data collection have come under scrutiny:

  • In response to the incident, Anthropic referred to an FAQ page stating that its crawler can only be blocked using directives in a robots.txt file.
  • iFixit has since added a crawl-delay directive to its robots.txt file to prevent further aggressive scraping by ClaudeBot.
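For illustration, a robots.txt file along the lines iFixit describes might look like the sketch below. ClaudeBot is the user-agent name the article reports for Anthropic’s crawler; the delay value and the blanket Disallow rule are hypothetical choices a site owner could make, not iFixit’s actual configuration:

```
# Hypothetical robots.txt entries (illustrative values)
User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /
```

Note that robots.txt is purely advisory: it only works if the crawler chooses to read and honor it, which is the crux of the dispute described here.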

Broader implications for website owners and AI companies: The incident highlights the ongoing challenges and debates surrounding AI companies’ data scraping practices and the limited options available for website owners to protect their content:

  • Other websites, such as Read the Docs, Freelancer.com, and the Linux Mint web forum, have also reported aggressive scraping by Anthropic’s crawler, with some experiencing site outages due to the increased strain.
  • Many AI companies, including OpenAI, rely on the robots.txt file as the primary method for website owners to opt out of data scraping, but this approach offers limited flexibility in specifying what scraping is and isn’t permitted.
  • Some AI companies, like Perplexity, have been known to ignore robots.txt exclusions entirely, further complicating the issue for website owners seeking to protect their content.
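To make concrete what “respecting robots.txt” means in practice, the sketch below uses Python’s standard-library `urllib.robotparser` to apply the same matching rules a compliant crawler would. The rules and URLs are hypothetical placeholders; ClaudeBot is the user-agent string the article names:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking only Anthropic's crawler
robots_txt = """
User-agent: ClaudeBot
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(robots_txt)

# A compliant ClaudeBot would skip this URL; other bots are unaffected
print(rp.can_fetch("ClaudeBot", "https://example.com/guide"))      # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/guide"))   # True
```

As the example shows, the check is entirely voluntary: nothing stops a crawler from simply never running it, which is why robots.txt alone offers site owners so little protection.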

The need for a more comprehensive approach: As AI companies continue to rely on web-scraped data for training their models, the incident underscores the need for a more balanced and comprehensive approach to data collection that respects website owners’ rights and preferences:

  • The current opt-out methods, such as robots.txt files, provide limited granularity and control for website owners, leaving them vulnerable to aggressive or unwanted scraping.
  • AI companies must work towards more transparent and collaborative data collection practices that prioritize obtaining proper permissions and adhering to websites’ terms of use.
  • Regulators and industry stakeholders should explore the development of standardized protocols and guidelines for responsible web scraping in the context of AI training, ensuring a fair balance between the needs of AI companies and the rights of website owners.
Anthropic’s crawler is ignoring websites’ anti-AI scraping policies
