Meta secretly trained its AI models on a Russian ‘shadow library,’ court docs show

Meta’s use of the pirated-books database LibGen to train its AI language models has been revealed in court-ordered unredacted documents, a significant development in an ongoing copyright lawsuit filed by authors.

The core revelation: Meta used Library Genesis (LibGen), a controversial database of pirated content, to train its AI models despite internal concerns about the legality and optics of doing so.

  • Internal company discussions about using LibGen data were escalated to CEO Mark Zuckerberg
  • Meta employees expressed hesitation about accessing LibGen data from corporate laptops
  • The company’s AI team ultimately received approval to use the pirated materials

Legal context and implications: The case, Kadrey et al. v. Meta Platforms, represents a pivotal moment in determining how tech companies can legally utilize creative works for AI training.

  • Authors Richard Kadrey, Christopher Golden, and comedian Sarah Silverman filed the lawsuit in July 2023
  • Meta previously acknowledged using the Books3 dataset but had not disclosed its direct use of LibGen
  • The company maintains its actions fall under “fair use” doctrine and disputes the plaintiffs’ claims
  • LibGen itself faces ongoing legal challenges, including a $30 million judgment in 2024

Court developments: Judge Vince Chhabria’s ruling against Meta’s redaction attempts highlights growing judicial scrutiny of AI companies’ transparency.

  • The judge described Meta’s redaction approach as “preposterous” and aimed at avoiding negative publicity
  • Meta was ordered to file unredacted versions of key documents
  • The court warned Meta against making further broad redaction requests
  • Plaintiffs argue Meta not only used copyrighted material without permission but also participated in its distribution through torrenting

Questions of precedent: The unfolding legal battle between content creators and tech companies raises fundamental questions about AI training practices and intellectual property rights.

  • The case could establish important precedents for how AI companies can legally access and use training data
  • The outcome may influence future AI development practices and relationships between tech companies and content creators
  • Meta’s use of pirated materials suggests potential challenges in legally sourcing comprehensive training data for AI models
