New AI text-to-image generator HART delivers striking speed, creating images in just 1.8 seconds, over five times faster than leading diffusion models such as DALL-E and Imagen 3. Developed by researchers at MIT, Nvidia, and Tsinghua University, HART uses an autoregressive approach that builds an image step by step rather than through the iterative denoising process used by most popular generators. The result shows how the choice of generation method can dramatically affect performance, potentially setting a new standard for real-time image generation applications.
The big picture: HART (Hybrid Autoregressive Transformer) represents a significant leap forward in AI image generation speed while maintaining comparable image quality to slower competitors.
How it works: Unlike most popular text-to-image generators, which use diffusion models that refine an entire image over many denoising steps, HART employs an autoregressive (AR) approach, predicting an image as a sequence of discrete tokens one at a time, similar to OpenAI’s recently released GPT-4o image generator. True to its name, the hybrid design pairs the AR transformer with a lightweight diffusion component that touches up fine detail the discrete tokens alone would lose.
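The autoregressive idea can be illustrated with a toy sketch (this is not HART's actual code; `toy_next_token` is a hypothetical stand-in for a trained transformer): the model emits one discrete image token at a time, each conditioned on everything generated so far, and the finished token grid is what a decoder would then map to pixels.

```python
def toy_next_token(prompt_tokens, generated):
    """Hypothetical stand-in for a trained transformer: picks the next
    image token from the prompt and the tokens generated so far
    (illustration only, deterministic for simplicity)."""
    return (sum(prompt_tokens) + len(generated)) % 256

def generate_image_tokens(prompt_tokens, grid=4):
    """Autoregressive loop: one discrete token per grid cell, each
    conditioned on all previously generated tokens."""
    tokens = []
    for _ in range(grid * grid):
        tokens.append(toy_next_token(prompt_tokens, tokens))
    # Reshape the flat sequence into a token grid; in a real system a
    # learned decoder would turn this grid into pixels.
    return [tokens[r * grid:(r + 1) * grid] for r in range(grid)]

image = generate_image_tokens([7, 42, 99], grid=4)
```

A diffusion model, by contrast, revisits every pixel of the image on each of its many denoising passes, which is where the speed gap comes from.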
Quality comparison: While HART’s primary advantage is speed, the quality of its outputs remains competitive with leading image generators.
Why this matters: As AI image generation becomes more integrated into creative workflows and applications, the dramatic reduction in generation time could enable new real-time use cases and significantly improve user experience.
Open access: The model is available for free public use, and its inference code has been open-sourced in a public GitHub repository, making it accessible to developers, researchers, and AI enthusiasts for further experimentation.