A new conversational AI called Sesame is raising eyebrows with its uncannily human-like speech patterns, complete with hesitations, self-corrections, and natural interruptions. Unlike traditional AI assistants that simply convert text to speech, Sesame’s breakthrough Conversational Speech Model (CSM) generates speech in a way that mirrors authentic human conversation, potentially marking a significant shift in how we interact with AI systems.
The big picture: Sesame represents a departure from conventional AI voice assistants by deliberately incorporating human imperfections rather than striving for polished perfection.
How it works: Sesame’s Conversational Speech Model combines text and audio processing into a single unified system, enabling more natural speech generation.
Key features: The AI demonstrates sophisticated conversational abilities that go beyond traditional voice assistants.
Why this matters: Sesame’s ability to replicate human speech imperfections so accurately raises important questions about the future of AI-human interactions and the increasing difficulty of distinguishing between human and AI voices.
Behind the numbers: While Sesame currently remains a niche technology, its development suggests a future where phone conversations may require verification of whether the speaker is human or AI.