Google’s new AI image tool Whisk uses images instead of text as prompts

Google has introduced Whisk, an experimental tool that generates AI images from user-uploaded photos without requiring a text prompt.

Core functionality: Whisk allows users to combine multiple input images depicting subjects, settings, and styles into a single AI-generated creation.

  • Users can upload photos representing different elements they want to incorporate without having to describe them in text
  • The tool offers options to create variations like plushie toys, enamel pins, or stickers
  • While text input is available for fine-tuning details, it’s not required to generate images

Technical architecture: Whisk chains two Google AI models, one that describes the input images and one that generates the output.

  • The system combines Google’s Gemini AI with DeepMind’s Imagen 3 text-to-image generator
  • When users upload images, Gemini creates automatic captions that feed into Imagen 3
  • The process captures the “essence” rather than exact details of input images, allowing for creative interpretation
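
The caption-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not Google's implementation: `WhiskInputs`, `caption_image`, `build_prompt`, and `generate_image` are hypothetical names, and the two model calls are replaced with local stand-ins so the example runs without any external service. The point it shows is that only text captions reach the image generator, which is one plausible reading of why the output keeps the "essence" of the inputs rather than their exact details.

```python
from dataclasses import dataclass


@dataclass
class WhiskInputs:
    """Hypothetical container for the three image roles plus optional text."""
    subject: str          # path to the subject image
    setting: str          # path to the setting image
    style: str            # path to the style image
    extra_text: str = ""  # optional fine-tuning text; not required


def caption_image(path: str) -> str:
    """Stand-in for the captioning step (Gemini in the article).

    A real system would return a description of the image content;
    here we only return a placeholder string.
    """
    return f"a description of {path}"


def build_prompt(inputs: WhiskInputs, captioner=caption_image) -> str:
    """Merge the per-image captions into a single text prompt."""
    parts = [
        f"Subject: {captioner(inputs.subject)}",
        f"Setting: {captioner(inputs.setting)}",
        f"Style: {captioner(inputs.style)}",
    ]
    if inputs.extra_text:
        parts.append(f"Details: {inputs.extra_text}")
    return ". ".join(parts)


def generate_image(prompt: str) -> dict:
    """Stand-in for the text-to-image step (Imagen 3 in the article).

    Returns a placeholder record instead of real pixels.
    """
    return {"prompt": prompt, "image": None}


if __name__ == "__main__":
    inputs = WhiskInputs("dog.jpg", "beach.jpg", "watercolor.jpg",
                         extra_text="make it an enamel pin")
    print(generate_image(build_prompt(inputs))["prompt"])
```

Because the generator in this sketch never sees the original pixels, anything the captions omit is left to the generator, which is consistent with the variation in details the article notes.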

Key limitations and considerations: Whisk is limited in purpose, fidelity to the input images, and availability.

  • Google positions Whisk as a creative inspiration tool rather than a professional image editor
  • Generated images may vary from input photos in details like height, hairstyle, or skin tone
  • The tool is currently available only as a website through Google Labs, and only to US users

Competitive landscape: Whisk represents Google’s latest move in an increasingly crowded AI image generation market.

  • OpenAI recently expanded into video generation with its Sora tool
  • The release follows Google’s earlier challenges with historical accuracy in its text-to-image generation tools
  • According to Wedbush Securities analyst Dan Ives, Whisk demonstrates Google’s commitment to showcasing its AI capabilities

Strategic implications: Whisk fits into Google’s broader AI strategy.

  • DeepMind’s integration continues to be crucial for Google’s AI development
  • The tool is part of Google’s planned 2025 product lineup, which includes a new Android operating system
  • The release reflects the ongoing race among big tech companies to ship consumer-facing AI applications despite concerns about AI safety and regulation

Looking ahead: While Whisk represents an innovative approach to image generation, its success will likely depend on user adoption and practical applications in creative workflows, particularly as the technology evolves beyond its current experimental stage.
