YouTube Creators Outraged To Find Major AI Companies Used Subtitles to Train Models

Apple, Nvidia, and Anthropic have used subtitles from over 170,000 YouTube videos to train their AI models without the creators’ knowledge or consent, raising concerns about the ethics and legality of such practices in the rapidly advancing field of artificial intelligence.

Key takeaways: An investigation by Proof News found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by major tech companies to train AI models:

  • The dataset, called YouTube Subtitles, was part of a larger compilation called the Pile, released by the nonprofit EleutherAI with the stated goal of lowering barriers to AI development.
  • Apple, Nvidia, Anthropic, Salesforce, Bloomberg, and Databricks have all confirmed using the Pile to train their AI models, despite YouTube’s terms of service prohibiting the unauthorized scraping of content.
  • The dataset includes videos from educational channels, major media outlets, YouTube megastars, and even some promoting conspiracy theories, without the creators’ awareness or permission.

Creators react with frustration: Many YouTube creators expressed frustration and concern upon learning that their content was used to train AI without their consent:

  • David Pakman, host of The David Pakman Show, argued that if AI companies profit from his work, he should be compensated, especially since the technology could potentially put him out of work.
  • Dave Wiskus, CEO of the creator-owned streaming service Nebula, called the unauthorized use of creators’ work “theft” and “disrespectful,” warning that it could be used to exploit and harm artists.
  • The producers of popular educational channels Crash Course and SciShow were “frustrated to learn that [their] thoughtfully produced educational content has been used in this way without [their] consent.”

Implications for the creative industry: The unauthorized use of YouTube videos to train AI raises broader questions about the future of the creative industry and the need for regulation:

  • Many creators worry that AI could eventually generate content similar to what they make or even produce outright copycats, threatening their livelihoods.
  • The lack of consent and compensation for the use of creators’ work has led to lawsuits against AI companies, with the question of fair use versus copyright infringement remaining unresolved in the courts.
  • Some argue that technology companies have “run roughshod” over creators’ rights and that there is a need for greater regulation to protect their interests as AI continues to advance.

Analyzing deeper: While the use of publicly available datasets like YouTube Subtitles may seem like a convenient way for AI companies to train their models, it raises serious ethical and legal questions about the rights of content creators in the age of artificial intelligence. The lack of transparency and consent in these practices has left many creators feeling exploited and uncertain about their future, as they face the prospect of AI-generated content displacing their own work.

As the legal battles over the unauthorized use of creative works to train AI play out in the courts, greater regulation and oversight will be needed to ensure that the benefits of these powerful new technologies are distributed fairly and that creators’ rights are protected. Without such safeguards, we risk a future in which the creative industries are dominated by a handful of tech giants, with individual artists and creators left behind.

