Google’s generative video model Veo 3 continues to add garbled, nonsensical subtitles to user-generated videos more than a month after launch, despite explicit user requests for no captions. The persistent issue is forcing users to spend additional money regenerating clips or use external tools to remove unwanted text, highlighting the challenges of correcting problems in major AI models once they’re deployed.
The big picture: Veo 3 is Google’s latest attempt to compete in the generative video space, and the first version of the model that lets users create videos with sound and dialogue.
- Academy Award-nominated director Darren Aronofsky used the tool to create a short film called Ancestra.
- Demis Hassabis, Google DeepMind’s CEO, compared the advancement to “emerging from the silent era of video generation.”
- The model is available to paying subscribers starting at $249.99 per month through Google’s AI filmmaking tool Flow, Gemini, and other platforms.
The subtitle problem: By one user’s estimate, up to 40% of clips containing dialogue come out with unusable gibberish subtitles, even when prompts explicitly request no captions.
- Each Veo 3 generation costs a minimum of 20 AI credits, with additional credits available at $25 per 2,500 credits.
- Users must regenerate clips, use external subtitle-removal tools, or crop videos to eliminate unwanted text.
- Josh Woodward, vice president of Google Labs and Gemini, posted on X on June 9 that Google had developed fixes, but issues persist.
Why it’s happening: The problem likely stems from Veo 3’s training data, which probably includes YouTube videos, vlogs, gaming channels, and TikTok content that contain embedded subtitles.
- These subtitles are part of the video frames rather than separate text tracks, making them difficult to remove before training.
- “The text-to-video model is trained using reinforcement learning to produce content that mimics human-created videos, and if such videos include subtitles, the model may ‘learn’ that incorporating subtitles enhances similarity with human-generated content,” explains Shuo Niu, an assistant professor at Clark University.
The cost burden: Advertising creative director Mona Weiss says regenerating scenes to avoid gibberish subtitles is becoming expensive.
- “If you’re creating a scene with dialogue, up to 40% of its output has gibberish subtitles that make it unusable,” she says.
- “You’re burning through money trying to get a scene you like, but then you can’t even use it.”
- When Weiss sought a refund for wasted credits, Google offered to refund the cost of Veo 3 but not the credits. Accepting would have meant losing access to the model.
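By the numbers: The figures quoted above (a minimum of 20 credits per generation, $25 per 2,500 credits, and up to 40% of dialogue clips unusable) can be combined into a rough back-of-the-envelope estimate. The sketch below is illustrative only. It assumes every attempt costs the 20-credit minimum, that each attempt is spoiled independently with the same probability, and that subtitles are the only reason a clip gets discarded. None of those assumptions comes from Google or from Weiss, and the function names are made up for this example.

```python
# Figures quoted in the article.
CREDITS_PER_GENERATION = 20   # minimum credits per Veo 3 generation
TOP_UP_PRICE_USD = 25.0       # price of one credit top-up
TOP_UP_CREDITS = 2_500        # credits included in one top-up
UNUSABLE_RATE = 0.40          # "up to 40%" of dialogue clips, per Weiss


def cost_per_generation() -> float:
    """Dollar cost of one generation at the top-up rate."""
    return CREDITS_PER_GENERATION * TOP_UP_PRICE_USD / TOP_UP_CREDITS


def expected_attempts(unusable_rate: float) -> float:
    """Expected number of attempts until one usable clip.

    Assumption (not from the article): attempts fail independently with
    the same probability, so the count follows a geometric distribution.
    """
    if not 0.0 <= unusable_rate < 1.0:
        raise ValueError("unusable_rate must be in [0, 1)")
    return 1.0 / (1.0 - unusable_rate)


def expected_cost_per_usable_clip(unusable_rate: float = UNUSABLE_RATE) -> float:
    """Expected dollar cost of obtaining one subtitle-free clip."""
    return cost_per_generation() * expected_attempts(unusable_rate)


if __name__ == "__main__":
    print(f"Cost per generation:      ${cost_per_generation():.2f}")
    print(f"Expected attempts:        {expected_attempts(UNUSABLE_RATE):.2f}")
    print(f"Expected cost per usable: ${expected_cost_per_usable_clip():.2f}")
```

Under these assumptions a single generation costs about $0.20, and at a 40% failure rate a usable clip takes about 1.67 attempts on average, or roughly $0.33. The estimate leaves out the monthly subscription price and any clips rejected for reasons other than subtitles, so actual spending on a multi-scene project would likely be higher.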
Technical challenges: Fixing the subtitle issue would require extensive retraining of the model, according to experts.
- Negative prompts like “No subtitles” are typically less effective than positive ones in generative AI models.
- Google would need to check every frame of its training videos and either remove or relabel those with captions before retraining, an effort that would take weeks, says Tuhin Chakrabarty, an assistant professor at Stony Brook University.
What they’re saying: Google acknowledges the ongoing issues but maintains that it is working on improvements.
- “We’re continuously working to improve video creation, especially with text, speech that sounds natural, and audio that syncs perfectly,” a Google spokesperson says.
- “We encourage users to try their prompt again if they notice an inconsistency and give us feedback using the thumbs up/down option.”
- Documentary maker Katerina Cizek believes the problem reflects Google’s rush to market: “Google needed a win. They needed to be the first to pump out a tool that generates lip-synched audio. And so that was more important than fixing their subtitle issue.”
Google’s generative video model Veo 3 has a subtitles problem