YouTube Creators Outraged To Find Major AI Companies Used Subtitles to Train Models

Apple, Nvidia, and Anthropic have used subtitles from over 170,000 YouTube videos to train their AI models without the creators’ knowledge or consent, raising concerns about the ethics and legality of such practices in the rapidly advancing field of artificial intelligence.

Key takeaways: An investigation by Proof News found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by major tech companies to train AI models:

The dataset, called YouTube Subtitles, was part of a larger compilation called the Pile, released by the nonprofit EleutherAI with the stated goal of lowering barriers to AI development.
Apple, Nvidia, Anthropic, Salesforce, Bloomberg, and Databricks have all confirmed using the Pile to train their AI models, despite YouTube’s terms of service prohibiting the unauthorized scraping of content.
The dataset includes videos from educational channels, major media outlets, YouTube megastars, and even some promoting conspiracy theories, without the creators’ awareness or permission.

Creators react with frustration: Many YouTube creators expressed frustration and concern upon learning that their content was used to train AI without their consent:

David Pakman, host of The David Pakman Show, argued that if AI companies profit from his work, he should be compensated, especially since the technology could put him out of work.
Dave Wiskus, CEO of the creator-owned streaming service Nebula, called the unauthorized use of creators’ work “theft” and “disrespectful,” warning that it could be used to exploit and harm artists.
The producers of popular educational channels Crash Course and SciShow were “frustrated to learn that [their] thoughtfully produced educational content has been used in this way without [their] consent.”

Implications for the creative industry: The unauthorized use of YouTube videos to train AI raises broader questions about the future of the creative industry and the need for regulation:

Many creators worry that AI could eventually generate content similar to what they make or even produce outright copycats, threatening their livelihoods.
The lack of consent and compensation for the use of creators’ work has led to lawsuits against AI companies, with the question of fair use versus copyright infringement remaining unresolved in the courts.
Some argue that technology companies have “run roughshod” over creators’ rights and that there is a need for greater regulation to protect their interests as AI continues to advance.

Analyzing deeper: While the use of publicly available datasets like YouTube Subtitles may seem like a convenient way for AI companies to train their models, it raises serious ethical and legal questions about the rights of content creators in the age of artificial intelligence. The lack of transparency and consent in these practices has left many creators feeling exploited and uncertain about their future, as they face the prospect of AI-generated content displacing their own work.

As the legal battles over the unauthorized use of creative works to train AI play out in the courts, the need for greater regulation and oversight is clear: the benefits of these powerful new technologies should be distributed fairly, and the rights of creators should be protected. Without such safeguards, we risk creating a future in which the creative industries are dominated by a handful of tech giants, with individual artists and creators left behind.
