Researchers at Bezel have built an AI system that automates image quality improvement through iterative refinement. Their approach pairs large language models, acting as evaluators, with image generation APIs to detect and fix visual imperfections such as blurry text or poor composition. The research shows that LLMs are good at identifying semantic-level image defects but struggle with pixel-level corrections, a useful data point for anyone working on AI-generated visual content.
The methodology: Bezel built a system that automatically improves images generated through the OpenAI API by creating a feedback loop between evaluation and generation.
- The team used OpenAI’s Image API with two key endpoints: /create for generating images and /edit for modifying them, optionally with a mask.
- They tried multiple LLMs as evaluators and found o3 the most effective at detecting text blurriness and image appeal issues, with gemini-2.5-flash-preview-04-17 serving as a benchmark.
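The feedback loop described above can be sketched roughly as follows. This is a minimal illustration, not Bezel's implementation: the `Verdict` type, the `refine` function, the 0.8 threshold, and the three-round cap are all assumptions made for the example, and the `generate`, `evaluate`, and `edit` callables are hypothetical stand-ins for the image API calls and the LLM judge.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Verdict:
    """Hypothetical judge output: a quality score plus the defects it found."""
    score: float       # 0.0 (unusable) to 1.0 (no visible defects); scale is assumed
    issues: List[str]  # e.g. ["blurry text", "poor composition"]


def refine(
    prompt: str,
    generate: Callable[[str], Any],   # stand-in for the image generation call
    evaluate: Callable[[Any], Verdict],  # stand-in for the LLM evaluator
    edit: Callable[[Any, str], Any],  # stand-in for the image edit call
    threshold: float = 0.8,           # assumed acceptance score
    max_rounds: int = 3,              # assumed cap on edit rounds
) -> Tuple[Any, List[Verdict]]:
    """Generate an image, then alternate judging and editing until it passes."""
    image = generate(prompt)
    verdict = evaluate(image)
    history = [verdict]
    rounds = 0
    # Keep editing while the judge is unsatisfied, still reports concrete
    # issues, and the round budget has not been used up.
    while verdict.score < threshold and verdict.issues and rounds < max_rounds:
        instruction = "Fix the following issues: " + "; ".join(verdict.issues)
        image = edit(image, instruction)
        verdict = evaluate(image)
        history.append(verdict)
        rounds += 1
    return image, history
```

Passing the three steps in as callables keeps the loop independent of any particular model or endpoint, so the evaluator can be swapped (for example, to compare judges) without touching the control flow. The returned history records every verdict, which makes it possible to see whether successive edits actually improved the score.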
Key findings: LLMs demonstrate strong capabilities in reasoning about visual imperfections but struggle to translate those insights into precise pixel-space edits.
- The models excel at identifying semantic-level defects such as illegible text or distorted branding but have difficulty mapping these insights to localized image corrections.
- Performance degraded when LLMs were asked to balance abstract aesthetic judgments with deterministic pixel-level corrections, revealing how hard it is for them to connect symbolic understanding to spatial edits.
Why this matters: This research positions LLM-as-a-Judge as the preferred approach for multi-modal evaluation of AI-generated images.
- The work highlights a critical gap in current AI systems: the ability to seamlessly connect high-level visual reasoning with precise mechanical image editing.
- These findings point toward image generation systems that can follow guided creative direction while autonomously improving their outputs over multiple iterations.
The big picture: Bezel’s work originated from practical business applications, as brands began requesting AI-generated ad inspirations tailored to their customers.
- The research demonstrates how commercial demands are driving innovation in applied AI, particularly at the intersection of language models and visual content creation.
- This combination of business need and technical capability suggests emerging opportunities for AI systems that can critique and refine their own creative outputs.