Cloudflare has accused Perplexity AI of acting like “North Korean hackers” after discovering the AI search company’s bots repeatedly circumventing anti-scraping measures to crawl websites without permission. This escalation in the ongoing battle over AI data collection could significantly undermine Perplexity’s ability to index content, as Cloudflare, an internet infrastructure provider, has now delisted the company as a “verified bot” and implemented hard blocks against its web crawlers.
What happened: Cloudflare CEO Matthew Prince publicly called out Perplexity AI on Monday for invasive web crawling practices that violate website protection measures.
- An investigation revealed Perplexity was “repeatedly modifying” its web-crawling bots to evade data-scraping blocks on third-party websites.
- The company used stealth tactics when blocked, including impersonating a Google Chrome browser on macOS and rotating through multiple IP addresses outside its official range.
- Cloudflare verified these claims by creating test domains deliberately hidden from search engines, which Perplexity still managed to crawl.
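To see why a generic browser identity combined with unlisted IP addresses is hard to attribute, consider the kind of simple check a site operator might run. This is a minimal sketch, not Cloudflare's actual detection method: the bot name `ExampleAIBot`, the function `classify_request`, and the IP range (taken from the reserved documentation ranges) are all hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration only. The network below is a reserved
# documentation range, not any real crawler's published range.
DECLARED_UA_TOKEN = "ExampleAIBot"
PUBLISHED_RANGES = [ip_network("192.0.2.0/24")]


def classify_request(user_agent: str, source_ip: str) -> str:
    """Label a request using only its user-agent string and source IP."""
    addr = ip_address(source_ip)
    in_range = any(addr in net for net in PUBLISHED_RANGES)
    declares_bot = DECLARED_UA_TOKEN.lower() in user_agent.lower()

    if declares_bot and in_range:
        return "declared-crawler"             # honest, verifiable crawler
    if declares_bot:
        return "unverified-claim"             # claims the bot name from an unlisted IP
    if in_range:
        return "undeclared-from-known-range"  # known IP, but no bot name
    return "unattributed"                     # looks like any ordinary browser


chrome_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
print(classify_request("ExampleAIBot/1.0", "192.0.2.10"))
print(classify_request(chrome_ua, "198.51.100.7"))
```

A request that sends a Chrome-style user-agent from an address outside the published range falls into the last category. Neither of these two checks can tie it to a specific crawler, so identifying it requires signals beyond the user-agent string and the source IP.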
The technical details: Perplexity’s sophisticated evasion methods operated at massive scale across the internet.
- “We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” Cloudflare found.
- The activity spanned “tens of thousands of domains and millions of requests per day,” according to the investigation.
- When stealth crawling was successfully blocked, Perplexity would fall back to other data sources, though these produced “less specific” answers that “lacked details from the original content.”
In plain English: Web crawlers are programs that automatically visit websites to collect information, much like a librarian systematically cataloging books. Websites can tell these crawlers to stay out using a “robots.txt” file, essentially a “Do Not Enter” sign that only works if crawlers identify themselves honestly and choose to comply. Perplexity allegedly ignored these signs and disguised its crawlers to look like regular human users browsing with Chrome, making them far harder for websites to identify and block.
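As a rough illustration of how a robots.txt rule is evaluated, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt contents, the bot name `ExampleAIBot`, and the helper `is_allowed` are assumptions made up for this example; they are not taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit this user-agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


chrome_like_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

# The declared crawler is told to stay out.
print(is_allowed("ExampleAIBot", "https://example.com/article"))
# The same request sent under a browser-style identity matches only the "*" rule.
print(is_allowed(chrome_like_ua, "https://example.com/article"))
```

The rules are matched against whatever name the client reports, and nothing in the file enforces them. A crawler that presents itself as an ordinary browser is simply not covered by a rule written for its declared name.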
Why this matters: The crackdown highlights the growing tension between AI companies’ data hunger and website owners’ rights to control their content.
- Over 2.5 million websites have chosen to block AI training through Cloudflare’s managed robots.txt feature or AI crawler blocking rules.
- Media companies have already sued Perplexity and other AI providers like OpenAI for alleged copyright infringement over unauthorized content scraping.
- The incident underscores how some AI companies are willing to use deceptive practices to gather training data, despite growing legal and ethical concerns.
Competitive contrast: Cloudflare noted that other major AI companies are respecting anti-scraping measures.
- The company specifically mentioned that OpenAI has been complying with data scraping protections.
- This compliance difference could give AI companies that honor blocking rules a competitive advantage as website owners increasingly implement blocking measures.
What’s next: Cloudflare expects an ongoing cat-and-mouse game as AI companies develop new evasion tactics.
- The company anticipates Perplexity will update its web crawler to circumvent the new blocking measures.
- Cloudflare’s delisting of Perplexity as a “verified bot” lumps its crawlers with other untrusted activity, potentially making content indexing significantly more difficult.