
New research reveals that Meta's Llama 3.1 70B AI model has memorized nearly complete text from popular books, including Harry Potter and the Philosopher's Stone, The Great Gatsby, and 1984. This discovery could expose Meta to over $1 billion in statutory damages if courts rule against the company in ongoing copyright infringement cases, fundamentally shifting the legal landscape around AI training on copyrighted materials.

What you should know: Researchers tested 13 AI models to determine how much copyrighted book content they could reproduce verbatim, finding dramatic differences between companies.

  • Meta’s model demonstrated extensive memorization of entire books, while most other models from Google, DeepSeek, EleutherAI, and Microsoft showed minimal verbatim retention.
  • The testing method involved splitting book excerpts into prefix and suffix sections, then prompting models to complete the passages.
  • Researchers examined 36 copyrighted books, including works by authors currently suing Meta in the Kadrey v. Meta Platforms case.
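The prefix/suffix probe described above can be sketched as follows. This is a minimal illustration, not the researchers' actual code: `complete` is a hypothetical stand-in for a model call, and the toy memorized line is used only so the example is self-contained.

```python
# Sketch of the prefix/suffix memorization probe: split an excerpt,
# prompt the model with the prefix, and check whether it reproduces
# the held-out suffix verbatim.

def split_excerpt(text: str, prefix_words: int = 50):
    """Split an excerpt into a prompt (prefix) and a held-out continuation (suffix)."""
    words = text.split()
    return " ".join(words[:prefix_words]), " ".join(words[prefix_words:])

def verbatim_match(model_output: str, true_suffix: str) -> bool:
    """Count a completion as memorized only if it reproduces the suffix exactly."""
    return model_output.strip() == true_suffix.strip()

# Toy stand-in for a model that has memorized this line of training text.
MEMORIZED = {
    "It was a bright cold day in April, and the clocks": "were striking thirteen.",
}

def complete(prefix: str) -> str:  # hypothetical model call
    return MEMORIZED.get(prefix, "")

prefix, suffix = split_excerpt(
    "It was a bright cold day in April, and the clocks were striking thirteen.",
    prefix_words=11,
)
print(verbatim_match(complete(prefix), suffix))  # True for this memorized line
```

A real harness would replace `complete` with an API call to the model under test and repeat this over many excerpts per book.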

The big picture: This memorization finding challenges AI companies’ core legal defense that their models transform rather than replicate copyrighted works.

  • Tech companies have argued their models create “fresh combinations of words” based on training data, qualifying for fair use protections.
  • The discovery that at least one major model can reproduce books verbatim undermines claims that AI systems merely learn “general relationships between words.”
  • “AI models are not just ‘plagiarism machines’, as some have alleged, but it also means that they do more than just learn general relationships between words,” says Mark Lemley, a Stanford University law professor.

Why this matters: The financial stakes are enormous, with researchers estimating that copyright infringement on just 3% of the Books3 dataset could result in nearly $1 billion in damages.

  • The Books3 dataset contains nearly 200,000 copyrighted books, including many pirated copies, used to train popular AI models.
  • Legal precedent from this case could determine whether AI companies must seek permission before training on copyrighted materials.
  • The memorization capability varies significantly between models and books, making it “very hard to set a clear legal rule that will work across all cases.”
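A back-of-envelope check shows how the numbers above could line up, assuming the per-work ceiling of $150,000 for willful infringement under US statutory damages (17 U.S.C. § 504(c)); the article does not state which per-work figure the researchers used.

```python
# Rough damages arithmetic under stated assumptions.
BOOKS3_SIZE = 200_000      # approximate number of books in Books3
MEMORIZED_SHARE = 0.03     # the 3% share cited by the researchers
MAX_STATUTORY = 150_000    # assumed per-work ceiling for willful infringement, USD

works = int(BOOKS3_SIZE * MEMORIZED_SHARE)
exposure = works * MAX_STATUTORY
print(f"{works} works -> ${exposure:,}")  # 6000 works -> $900,000,000
```

That $900 million figure is consistent with the "nearly $1 billion" estimate cited above.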

Legal implications differ by jurisdiction: The memorization findings carry different weight in US versus UK copyright law.

  • US courts will evaluate whether the memorization qualifies under “fair use” doctrine, which provides broader exceptions for copyrighted material use.
  • UK copyright law follows the narrower "fair dealing" concept, with far fewer exceptions, making AI models that memorized pirated books unlikely to qualify for protection.
  • Robert Lands at Howard Kennedy, a London law firm, notes the finding could be “very significant from a copyright perspective” in the UK.

What they’re saying: Legal experts emphasize this creates a powerful forensic tool while highlighting fundamental questions about AI training rights.

  • “The question is, did they have the right to do it?” asks Randy McCarthy at Hall Estill, an Oklahoma law firm, noting that companies typically acknowledge training on copyrighted materials.
  • Meta spokesperson Emil Vazquez maintains that “fair use of copyrighted materials is vital” for developing AI models, stating “We disagree with Plaintiffs’ assertions, and the full record tells a different story.”
  • Mark Lemley, who previously defended Meta but dropped them as a client in January 2025, says he still believes the company should win the case.

Key details: The research methodology provides a replicable framework for testing AI memorization across different models and content types.

  • Testing involved comparing AI completion probabilities against random chance to determine genuine memorization.
  • The technique could serve as a “good forensic tool” for identifying the extent of AI memorization in future legal proceedings.
  • Multiple ongoing lawsuits in US and UK courts will determine whether this memorization constitutes copyright infringement.
