OpenAI wins data scraping lawsuit against Raw Story in NY court

AI copyright lawsuit dismissed: A federal court in New York has dismissed a copyright lawsuit against OpenAI brought by alternative news outlets Raw Story and AlterNet.

The plaintiffs alleged that OpenAI violated copyright law by stripping copyright management information (CMI) from their articles before using them to train ChatGPT and other AI models.
The case centered on Section 1202(b) of the Digital Millennium Copyright Act (DMCA), which protects CMI such as author names and titles.
Judge Colleen McMahon granted OpenAI’s motion to dismiss for lack of standing, finding that the plaintiffs had not demonstrated a concrete injury from OpenAI’s actions.

Key legal considerations: The court’s decision highlights the challenges in applying traditional copyright law to generative AI technologies.

The judge noted that updates to large language model (LLM) interfaces complicate attribution and traceability and make verbatim reproduction of content less likely.
The ruling aligns with similar cases in which courts have struggled to apply copyright law to AI-generated content, such as Doe 1 v. GitHub, which involves GitHub’s Copilot.
There is currently no firm consensus on how Section 1202(b) applies to a wide range of online content, with some courts imposing an “identicality” requirement while others allow for more flexible interpretations.

Implications for AI and content creators: The dismissal of this lawsuit could set a precedent for how courts handle similar copyright claims in the evolving landscape of generative AI.

The ruling suggests that plaintiffs who cannot show clear, demonstrable harm or exact reproduction of their work may struggle to keep such claims in court.
It raises questions about how content creators can ensure proper credit and prevent unauthorized use of their work in AI training datasets.
Licensing agreements between AI companies and publishers, like those struck by OpenAI with Condé Nast, may become a new standard for legally using copyrighted content while compensating creators.

Broader context: The case highlights the ongoing debate surrounding AI companies’ use of scraped web content for training purposes.

While AI model providers often guard their training datasets, the industry has likely scraped large portions of the web to train various models.
This practice has led some creators to view data scraping as AI’s “original sin,” raising concerns about copyright infringement and fair compensation.
The ruling in this case could influence similar lawsuits, such as the one filed by The New York Times against OpenAI and Microsoft.

Legal challenges in AI copyright cases: Courts are grappling with how to apply existing copyright laws to generative AI technologies.

Recent rulings suggest a reluctance to extend Section 1202(b) protections unless plaintiffs can demonstrate specific, tangible harm.
The synthesizing nature of AI-generated content, as opposed to direct replication, makes it difficult to prove copyright violations under current laws.
Plaintiffs face an uphill battle in proving harm, with courts signaling that vague claims are insufficient and hard evidence of damage is required.

Future outlook: The dismissal of this lawsuit may shape the future of AI copyright litigation and industry practices.

While the odds seem favorable for AI companies in such cases, the threat of lawsuits remains a concern.
Transparency, proper data records, and compliance with copyright laws will be essential for AI developers and tech companies to avoid legal trouble.
Judge McMahon left the door open for the plaintiffs to seek leave to file an amended complaint, but significant obstacles remain for them.

Balancing innovation and rights: As AI technology continues to advance, finding a balance between fostering innovation and protecting content creators’ rights remains a complex challenge for both the legal system and the tech industry.
