Google Lens expands search capabilities with video and voice features: Google has introduced new capabilities in its Lens app that let users search with short videos and spoken questions, extending its visual search experience.
- The update, rolling out in Search Labs on Android and iOS, enables users to record short videos and ask questions about what they’re seeing.
- Google’s Gemini AI model processes the video content and user queries to provide relevant responses and search results.
- The new feature builds upon Google’s existing image search capabilities, applying computer vision techniques to analyze multiple video frames in sequence.
How it works: Users can now combine video recording and voice input to interact with Google Lens, making visual searches more dynamic and intuitive.
- To use the new feature, users open the Google Lens app, hold down the shutter button to start recording, and verbally ask a question about what they’re observing.
- The system captures the video as a series of image frames, which are then analyzed using advanced computer vision techniques.
- A custom Gemini AI model processes the visual information and the user's query, returning a response grounded in information from the web (a rough sketch of this flow follows below).
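A minimal sketch of that flow, under stated assumptions: OpenCV (a real library) samples frames from the recorded clip, while query_multimodal_model and answer_video_question are hypothetical placeholders standing in for Google's Gemini-backed service, which is not publicly documented.

```python
import cv2  # OpenCV, used here only to pull frames out of a short recorded clip


def sample_frames(video_path: str, max_frames: int = 8):
    """Grab an evenly spaced subset of frames from a recorded clip."""
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // max_frames, 1)
    frames = []
    for index in range(0, total, step):
        capture.set(cv2.CAP_PROP_POS_FRAMES, index)
        ok, frame = capture.read()
        if ok:
            frames.append(frame)
    capture.release()
    return frames


def query_multimodal_model(frames, question: str) -> str:
    """Placeholder for the Gemini-backed service that grounds answers in web results."""
    raise NotImplementedError("Stand-in for Google's internal pipeline")


def answer_video_question(video_path: str, spoken_question: str) -> str:
    """Pair sampled frames with the user's spoken question and ask the model."""
    frames = sample_frames(video_path)
    return query_multimodal_model(frames, spoken_question)
```

The production system presumably does much more (frame selection, web grounding, result ranking), but the frames-plus-question structure is the core idea described here.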
Practical applications: The video search feature opens up new possibilities for users to interact with their environment and obtain information in real-time.
- Google suggests the feature could be useful in scenarios such as visiting an aquarium, where users can ask questions about the marine life they’re observing.
- The technology allows for more contextual and detailed queries about scenes and behavior that a single still image can't easily capture.
Voice search enhancement: In addition to video search, Google Lens has also updated its photo search feature with voice input capabilities.
- Users can now ask questions aloud while aiming their camera at a subject, rather than typing a query after taking a picture (see the sketch after this list).
- This feature is rolling out globally on Android and iOS but is currently only available in English.
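For the voice-plus-photo flow, a comparable sketch can be assembled with the open-source SpeechRecognition library (which needs PyAudio for microphone access); the visual_search function is purely a placeholder, since Lens does not expose a public API for this.

```python
import speech_recognition as sr  # SpeechRecognition library; microphone input requires PyAudio


def capture_spoken_question() -> str:
    """Record a short utterance and transcribe it to text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source, phrase_time_limit=5)
    return recognizer.recognize_google(audio)  # English-only here, matching the rollout


def visual_search(image_path: str, question: str) -> str:
    """Placeholder for the Lens photo-plus-question search, which has no public API."""
    raise NotImplementedError("Stand-in for the Lens search backend")


if __name__ == "__main__":
    question = capture_spoken_question()
    print(f"Transcribed query: {question}")
```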
Technical insights: Rajan Patel, Google’s vice president of engineering, provided some background on the technology powering these new features.
- The video search functionality builds upon existing image recognition techniques used in Google Lens.
- A custom Gemini model was developed specifically to understand and process multiple video frames in sequence (the sketch after this list illustrates the same idea with the public Gemini API).
- While the current implementation doesn’t support audio analysis, such as identifying bird sounds, Google is reportedly experimenting with this capability for future updates.
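Google has not released the custom Lens model, but the public google-generativeai SDK gives a feel for how a question plus several frames in sequence can be sent to a multimodal Gemini model. The API key, model name, file names, and prompt below are illustrative assumptions, not details from the article.

```python
# Illustrative only: the public google-generativeai SDK, not Google's custom Lens model.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
model = genai.GenerativeModel("gemini-1.5-flash")

# Frames sampled from a short clip, passed to the model in recording order.
frames = [Image.open(f"frame_{i}.jpg") for i in range(4)]

response = model.generate_content(
    ["These frames were recorded in sequence at an aquarium. "
     "What fish is this, and why are they swimming together?", *frames]
)
print(response.text)
```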
Broader implications: These advancements in visual search technology reflect the ongoing evolution of how users interact with and obtain information from their surroundings.
- The integration of video and voice search in Google Lens represents a significant step towards more natural and intuitive human-computer interaction.
- As AI models like Gemini continue to improve, we can expect even more sophisticated visual search capabilities in the future, potentially transforming how we access and process information in our daily lives.
Looking ahead: While these features mark a substantial improvement in visual search technology, there’s still room for growth and refinement.
- The potential addition of audio analysis to video searches could further enhance the app’s utility, especially for tasks like wildlife identification.
- As the technology evolves, we may see more seamless integration of visual, auditory, and contextual information in search queries, bringing us closer to a truly comprehensive understanding of our environment through AI-assisted tools.