For the first time, web-browsing bots account for the majority of internet traffic, with AI company crawlers such as OpenAI's ChatGPT-User and Anthropic's ClaudeBot representing 6% and 13% of all web traffic, respectively. Content creators are fighting back with “AI poisoning” tools that corrupt training data, but these same techniques could be weaponized to spread misinformation at scale.
The big picture: The battle between AI companies scraping data and content creators protecting their work has escalated beyond legal disputes into a technological arms race that could reshape how information flows across the internet.
Key details: Major AI companies argue that data scraping falls under the fair use doctrine and is essential for model development.
- OpenAI previously stated it would be “impossible” to develop AI models without using copyrighted work.
- Disney and Universal sued Midjourney, an AI image generator, in June, alleging it plagiarizes characters from franchises like Star Wars and Despicable Me.
- When the US Copyright Office analyzed public comments about AI and copyright, 91% expressed negative sentiment about AI.
How the poisoning works: Researchers have developed tools that alter content in ways imperceptible to humans but confusing to the AI models trained on it; a conceptual sketch of the underlying technique follows this list.
- Glaze applies “style cloaks” that cause AI to misinterpret artistic styles, making watercolor paintings appear as oil paintings to AI systems.
- Nightshade goes further by poisoning data to create false associations, potentially making AI models link “cat” with images of dogs.
- Both tools, developed at the University of Chicago, have been downloaded over 10 million times.
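For intuition, here is a minimal, hypothetical sketch of the general idea these tools build on: a targeted adversarial perturbation that nudges an image's machine-readable features toward a different concept while leaving the pixels nearly unchanged. This is not the published Glaze or Nightshade algorithm; the encoder choice (an off-the-shelf ResNet), the file paths, and the hyperparameters are all illustrative.

```python
# Conceptual sketch only: a targeted feature-space perturbation in PyTorch.
# NOT the actual Glaze/Nightshade code; encoder and settings are illustrative.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Off-the-shelf ImageNet encoder standing in for "the model the scraper trains".
encoder = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()   # keep penultimate features as the embedding
encoder.eval()

to_tensor = T.Compose([T.Resize((224, 224)), T.ToTensor()])
normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

def embed(x: torch.Tensor) -> torch.Tensor:
    return encoder(normalize(x))

def poison(image_path: str, target_path: str,
           eps: float = 8 / 255, steps: int = 100, lr: float = 1 / 255):
    """Perturb the image so its embedding drifts toward the target's."""
    x = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0)
    t = to_tensor(Image.open(target_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        target_feat = embed(t)                     # e.g. features of a dog photo

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(
            embed(torch.clamp(x + delta, 0, 1)), target_feat)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()        # step toward the target concept
            delta.clamp_(-eps, eps)                # cap per-pixel change: invisible
            delta.grad.zero_()
    return torch.clamp(x + delta, 0, 1)            # still looks like a cat to us
```

To a human the output is indistinguishable from the original, but a model trained on many such mismatched image-text pairs can learn the wrong association, which is the effect Nightshade's authors describe.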
Corporate countermeasures: Tech companies are deploying their own defensive strategies against data scraping.
- Cloudflare’s AI Labyrinth lures misbehaving bots into mazes of nonsense AI-generated pages, wasting the resources behind the 50 billion AI crawler requests the company sees each day (a minimal sketch of the trap pattern follows this list).
- The company also released a tool that lets websites charge AI companies for access and block those that refuse to pay.
- Testing shows Glaze remains 85% effective against countermeasures, suggesting AI companies may find dealing with poisoned data more trouble than it’s worth.
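Cloudflare has not published AI Labyrinth's internals, but the trap pattern is simple to picture. The sketch below is a hypothetical, minimal Flask version: filler pages that link only to more generated pages, served solely to clients whose User-Agent matches a hand-picked crawler list. The list, routes, and word-salad filler are all assumptions; Cloudflare says its real pages are AI-generated and never shown to human visitors.

```python
# Minimal, hypothetical "labyrinth"-style crawler trap (not Cloudflare's code).
# Bots that ignore robots.txt follow a hidden link into /maze/ and then descend
# through an endless chain of procedurally generated pages.
from flask import Flask, request, abort
import hashlib
import random

app = Flask(__name__)

# Illustrative list; real deployments would match many more crawler signatures.
AI_CRAWLER_TOKENS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "CCBot")

def is_ai_crawler(user_agent: str) -> bool:
    return any(token in user_agent for token in AI_CRAWLER_TOKENS)

@app.route("/maze/<token>")
def maze(token: str):
    if not is_ai_crawler(request.headers.get("User-Agent", "")):
        abort(404)  # humans and well-behaved bots never land here

    # Seed from the URL so every maze page is stable across revisits.
    rng = random.Random(hashlib.sha256(token.encode()).hexdigest())
    words = ["data", "archive", "report", "index", "summary", "record"]
    filler = " ".join(rng.choices(words, k=200))

    # Each page links five levels deeper; the maze never terminates.
    links = "".join(
        f'<a href="/maze/{hashlib.sha256((token + str(i)).encode()).hexdigest()[:12]}">more</a> '
        for i in range(5)
    )
    return f"<html><body><p>{filler}</p>{links}</body></html>"

if __name__ == "__main__":
    app.run()
```

A robots.txt Disallow rule on /maze/ would keep compliant crawlers out, so only bots that break the rules ever enter and burn their crawl budget inside it.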
The darker implications: Nation-states may be exploiting similar poisoning techniques to manipulate AI-generated information.
- The Atlantic Council, a US-based think tank, alleges Russia’s Pravda network posted millions of fake news pages designed to trick AI crawlers into promoting Kremlin narratives about Ukraine.
- Analysis by NewsGuard, a technology firm that tracks misinformation, found that 10 major AI chatbots produced text aligned with Pravda’s narratives in one-third of test cases.
- The volume of poisoned content could lead AI systems to over-emphasize certain narratives when responding to users.
What they’re saying: Experts see both the potential and peril of these defensive tools.
- “These are trillion-dollar market-cap companies, literally the biggest companies in the world, taking by force what they want,” said Ben Zhao, the University of Chicago researcher behind Glaze and Nightshade.
- Jacob Hoffman-Andrews at the Electronic Frontier Foundation, a digital rights nonprofit, noted these tools offer “a neat method of action that doesn’t rely on changing regulations, which can take a while.”
- “At the root of all this is money,” Zhao added, pointing to AI companies’ reluctance to pay licensing fees for legitimate content.
Why this matters: The proliferation of AI poisoning tools represents a fundamental shift in how content creators can protect their work, but it also opens the door for malicious actors to corrupt the information ecosystem that powers AI systems worldwide.