Meta allegedly trained its AI models on pirated torrent content

Court documents filed in an ongoing copyright lawsuit suggest that Meta trained its Llama large language models on pirated content sourced from torrent sites.

Key developments: Court documents in the “Kadrey et al. v. Meta Platforms” case have exposed internal communications suggesting Meta’s use of unauthorized content for AI training.

  • Novelists Richard Kadrey and Christopher Golden filed the lawsuit in 2023, alleging Meta used their copyrighted works without permission
  • Judge Vince Chhabria ordered the release of unredacted documents that had previously been withheld from public view
  • Internal communications show Meta employees expressing concerns about downloading torrented content on corporate laptops
  • Evidence indicates CEO Mark Zuckerberg may have authorized the use of pirated materials

Sources of unauthorized content: Meta allegedly accessed pirated materials from multiple unauthorized digital libraries to train its AI systems.

  • LibGen, a Russia-based digital library established in 2008, was identified as a primary source of the pirated content
  • The platform contains unauthorized copies of books, magazines, and academic articles
  • Additional “shadow libraries” were also reportedly used in the training process
  • LibGen has faced multiple copyright lawsuits since its creation, though its operators remain unknown

Meta’s legal defense: The company has presented arguments attempting to justify its use of copyrighted materials.

  • Meta claims its use of publicly available materials falls under the “fair use” doctrine
  • The company argues the text is used solely for statistical language modeling and to generate original content
  • Fair use determinations are typically made on a case-by-case basis in U.S. copyright law

Industry context: Similar allegations have emerged against other tech companies, though some have taken different approaches to AI training.

  • Apple faced scrutiny after its OpenELM model was found to have been trained on YouTube video subtitles
  • Apple clarified that OpenELM was purely for research and not used in consumer products
  • Apple Intelligence uses licensed data and public content collected through web crawlers
  • Major publishers like The New York Times and The Atlantic have opted out of sharing content with Apple Intelligence

Looking ahead: This lawsuit could set important precedents for AI training practices and copyright law, potentially forcing tech companies to establish more transparent and legally compliant data sourcing methods for AI development.

