Apple, Nvidia, and Anthropic have used subtitles from more than 170,000 YouTube videos to train their AI models without the creators’ knowledge or consent, raising questions about the ethics and legality of the practice.

Key takeaways: An investigation by Proof News found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by major tech companies to train AI models:

  • The dataset, called YouTube Subtitles, was part of a larger compilation called the Pile, released by the nonprofit EleutherAI with the stated goal of lowering barriers to AI development.
  • Apple, Nvidia, Anthropic, Salesforce, Bloomberg, and Databricks have all confirmed using the Pile to train their AI models, despite YouTube’s terms of service prohibiting the unauthorized scraping of content.
  • The dataset includes videos from educational channels, major media outlets, YouTube megastars, and even some promoting conspiracy theories, without the creators’ awareness or permission.

Creators react with frustration: Many YouTube creators expressed frustration and concern upon learning that their content was used to train AI without their consent:

  • David Pakman, host of The David Pakman Show, argued that if AI companies profit from his work, he should be compensated, especially since the technology could potentially put him out of work.
  • Dave Wiskus, CEO of the creator-owned streaming service Nebula, called the unauthorized use of creators’ work “theft” and “disrespectful,” warning that it could be used to exploit and harm artists.
  • The producers of popular educational channels Crash Course and SciShow were “frustrated to learn that [their] thoughtfully produced educational content has been used in this way without [their] consent.”

Implications for the creative industry: The unauthorized use of YouTube videos to train AI raises broader questions about the future of the creative industry and the need for regulation:

  • Many creators worry that AI could eventually generate content similar to what they make or even produce outright copycats, threatening their livelihoods.
  • The lack of consent and compensation for the use of creators’ work has led to lawsuits against AI companies, with the question of fair use versus copyright infringement remaining unresolved in the courts.
  • Some argue that technology companies have “run roughshod” over creators’ rights and that there is a need for greater regulation to protect their interests as AI continues to advance.

Analyzing deeper: While the use of publicly available datasets like YouTube Subtitles may seem like a convenient way for AI companies to train their models, it raises serious ethical and legal questions about the rights of content creators in the age of artificial intelligence. The lack of transparency and consent in these practices has left many creators feeling exploited and uncertain about their future, as they face the prospect of AI-generated content displacing their own work.

As the legal battles over the unauthorized use of creative works to train AI play out in the courts, it is clear that there is a need for greater regulation and oversight to ensure that the benefits of these powerful new technologies are distributed fairly and that the rights of creators are protected. Without such safeguards, we risk creating a future in which the creative industries are dominated by a handful of tech giants, with individual artists and creators left behind.
