Forget chat. AI that can hear, see and click is already here

AI’s Evolution Beyond Text: The landscape of artificial intelligence is rapidly expanding beyond traditional text-based chatbots, with new multimodal models capable of processing and generating content across various formats including audio, video, and images.

Google’s NotebookLM, originally launched as a research tool, has gained viral popularity with its AI podcasting feature called Audio Overview.
Users can create AI-generated podcasts on various topics, including personal profiles and content summaries.
The quality of multimodal generative content has improved significantly in a short period, as shown by the jump from Meta’s Make-A-Video to its newer Movie Gen tool.

Shifting Interaction Paradigms: The way users engage with AI systems is becoming more intuitive and less reliant on text-based inputs.

OpenAI’s Canvas interface allows for collaborative project work with ChatGPT, moving away from traditional chat windows.
Users can now edit specific portions of text or code, streamlining the content generation process.
Google has introduced voice and video search capabilities, enabling users to ask questions about visual content in real time.

Unexpected Success Stories: The rapid development of AI features has led to surprising hits among users, highlighting the unpredictable nature of innovation in this field.

NotebookLM’s Audio Overview feature became popular despite being a secondary feature within a larger product.
This mirrors the unexpected success of ChatGPT, which OpenAI did not initially expect to become a breakout product.

Industry Pressure and Innovation: The multibillion-dollar generative AI boom has accelerated the pace of development, but a definitive “killer app” remains elusive.

AI companies are under immense pressure to monetize their technologies and deliver tangible results.
This pressure has led to a strategy of releasing various AI tools to gauge user reception and identify successful applications.

Quality Improvements: Significant investments in AI have contributed to rapid advancements in the quality of generated content across different modalities.

The progression from Meta’s Make-A-Video to Movie Gen demonstrates the swift improvement in video generation capabilities.
These advancements enable more realistic and diverse content creation options for users.

User Experience and Customization: New AI interfaces are focusing on providing more interactive and personalized experiences.

Tools such as Google’s Lens app, combined with AI, allow real-time video analysis and information retrieval.
The trend toward customizable interfaces reflects a broader effort to make AI tools more accessible and user-friendly.

Implications for Content Creation: The rise of multimodal AI tools is reshaping the landscape of content creation and consumption.

AI-generated podcasts and videos offer new avenues for content production and distribution.
These tools have the potential to democratize content creation, allowing individuals to produce professional-quality material with minimal resources.

Broader Context: While AI capabilities continue to expand, the industry is still in a phase of experimentation and discovery.

The unexpected popularity of certain features underscores the difficulty in predicting which AI applications will resonate with users.
As AI tools become more integrated into daily life, their impact on various industries and social dynamics remains an open question.

Looking Ahead: The rapid evolution of AI capabilities suggests a future where interaction with technology becomes increasingly multimodal and intuitive.

The development of more sophisticated AI models capable of processing and generating diverse types of content may lead to new paradigms in human-computer interaction.
As these technologies continue to advance, questions about their societal impact, ethical use, and potential regulations are likely to become more prominent in public discourse.

MIT Technology Review
