Wikipedia faces AI bot bandwidth crisis as scraping costs threaten site stability

Wikipedia is experiencing a bandwidth crisis driven by AI bots: automated scraping is sharply increasing infrastructure costs and threatening site stability. The situation highlights the growing tension between open knowledge resources and AI companies' data-gathering practices, and raises questions about how publicly available information can be accessed sustainably and responsibly in the era of large AI models.

The big picture: Wikipedia’s infrastructure is buckling under unprecedented traffic from AI bots scraping content, with the nonprofit Wikimedia Foundation warning that automated requests have “grown exponentially.”

  • The foundation revealed that since January 2024, bandwidth used for downloading multimedia content has surged by 50%, primarily from automated programs harvesting openly licensed images for AI model training.
  • While human traffic spikes during high-interest events are manageable, the scale of bot scraping presents growing risks to site stability and significantly increases data center costs.

By the numbers: Bot traffic is disproportionately consuming Wikipedia’s resources compared to human visitors.

  • At least 65% of resource-consuming traffic comes from bots, despite bots representing only about 35% of total pageviews.
  • The scraping activity often targets less popular articles and even hits developer infrastructure, including code review platforms and bug trackers.
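To make the imbalance concrete, the two percentages above can be turned into a rough per-pageview comparison. The sketch below is a back-of-the-envelope calculation, not a figure from the foundation: it treats "at least 65%" as exactly 65%, assumes humans account for all remaining traffic, and uses a hypothetical helper name, `relative_cost_per_pageview`.

```python
def relative_cost_per_pageview(bot_cost_share: float, bot_view_share: float) -> float:
    """How many times more resource-consuming traffic one bot pageview
    generates than one human pageview.

    Assumption: humans make up the remainder of both shares.
    """
    human_cost_share = 1.0 - bot_cost_share
    human_view_share = 1.0 - bot_view_share
    bot_cost_per_view = bot_cost_share / bot_view_share
    human_cost_per_view = human_cost_share / human_view_share
    return bot_cost_per_view / human_cost_per_view


# Figures reported in the article: 65% of expensive traffic, 35% of pageviews.
ratio = relative_cost_per_pageview(bot_cost_share=0.65, bot_view_share=0.35)
print(f"A bot pageview is roughly {ratio:.1f}x as costly as a human one")
```

Under these assumptions, an average bot pageview accounts for about 3.4 times as much resource-consuming traffic as an average human pageview. Because the 65% figure is a lower bound, the real ratio could be higher.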

Steps being taken: The Wikimedia Foundation is implementing both immediate and long-term measures to address the unsustainable situation.

  • Site managers have already imposed case-by-case rate limiting for problematic AI crawlers, with some bots facing outright bans.
  • The foundation is developing a “Responsible Use of Infrastructure” plan that will likely require bot operators to authenticate for high-volume scraping and API use.
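The article does not describe how Wikimedia implements its limits. As an illustration only, per-crawler rate limiting with outright bans is often built on a token bucket, sketched below. The class names, the choice to key on user agent, and the rate and capacity values are all assumptions made for this example.

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can burst once
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CrawlerLimiter:
    """Hypothetical per-user-agent limiter with a ban list."""

    def __init__(self, rate: float, capacity: float, banned=()):
        self.rate = rate
        self.capacity = capacity
        self.banned = set(banned)
        self.buckets = {}

    def check(self, user_agent: str, now=None) -> str:
        if now is None:
            now = time.monotonic()
        if user_agent in self.banned:
            return "banned"
        bucket = self.buckets.get(user_agent)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[user_agent] = bucket
        return "ok" if bucket.allow(now) else "throttled"


# Example: 1 request per second sustained, bursts of 2, one banned crawler.
limiter = CrawlerLimiter(rate=1.0, capacity=2, banned={"BadBot"})
results = [limiter.check("ExampleBot", now=0.0) for _ in range(3)]
print(results)
print(limiter.check("BadBot", now=0.0))
```

User-agent strings are easy to spoof, which may be one reason the planned policy leans toward authentication for high-volume access. In that setting, the same bucket logic could be keyed on an authenticated client identifier instead.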

What they’re saying: The Wikimedia Foundation emphasized the financial reality of maintaining its services amid the AI scraping surge.

  • “Our content is free, our infrastructure is not: We need to act now to re-establish a healthy balance,” the foundation stated.

Looking ahead: Wikipedia is seeking community feedback on methods to identify AI bot traffic and filter access appropriately, indicating a shift toward more controlled access for automated systems.

Wikipedia Faces Flood of AI Bots That Are Eating Bandwidth, Raising Costs
