Microsoft’s new AI model simulates worlds by watching game footage

The emerging field of AI-powered game world generation has seen significant advances as researchers work to create systems that can understand and simulate gaming environments from video footage alone. Microsoft Research’s latest contribution to this field is WHAM (World and Human Action Model), which demonstrates notable progress in generating interactive gaming environments while highlighting current technological limitations.

Project Overview: Microsoft’s WHAM model, detailed in a recent Nature publication, uses extensive gameplay footage from the online brawler Bleeding Edge to create AI-generated gaming environments.

  • The system was trained on seven player-years’ worth of gameplay video paired with actual player inputs
  • Training data collection was conducted under the game’s user agreement through Microsoft subsidiary Ninja Theory
  • After one million training updates, WHAM demonstrated a basic understanding of complex gameplay interactions

Technical Achievements: WHAM shows marked improvements over previous AI world models in several key areas.

  • The model can keep generated gameplay footage consistent for up to two minutes, compared with one minute for Google’s Genie 2
  • WHAM successfully responds to diverse input sequences not present in its training data
  • The system demonstrates 85-98% accuracy in maintaining the persistence of newly inserted game objects across generated frames

Current Capabilities: Microsoft has developed two primary implementations of WHAM technology.

  • A prototype “WHAM Demonstrator” available on Azure AI Foundry allows developers to generate new gameplay sequences from sample frames
  • An early real-time version enables immediate frame generation based on user inputs, with the ability to switch between scenes instantly

Technical Limitations: Despite its advances, WHAM faces significant constraints that currently restrict its practical applications.

  • Output is limited to 300×180 resolution at 10 frames per second
  • Generated footage exhibits inconsistencies, particularly in character models, which often morph unrealistically
  • The real-time version operates well below the performance standards required for modern gaming
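
To put the resolution and frame-rate figures above in perspective, the short sketch below compares WHAM's stated pixel throughput with a hypothetical 1920×1080, 60 frames-per-second target. That target is an assumption chosen for illustration, not a figure from Microsoft or the article, and raw pixel count is only a rough proxy for the gap.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw pixels that must be generated each second at a given output format."""
    return width * height * fps

# Figures stated in the article: 300x180 at 10 frames per second.
wham_rate = pixels_per_second(300, 180, 10)

# Hypothetical comparison target (an assumption, not from the article).
target_rate = pixels_per_second(1920, 1080, 60)

# How many times more pixels per second the hypothetical target requires.
throughput_gap = target_rate / wham_rate

# Frames in a two-minute consistent sequence at 10 frames per second.
frames_in_two_minutes = 10 * 2 * 60

print(f"WHAM output: {wham_rate:,} pixels/s")
print(f"Assumed target: {target_rate:,} pixels/s")
print(f"Gap: about {throughput_gap:.1f}x")
print(f"Two minutes at 10 fps: {frames_in_two_minutes} frames")
```

Under these assumptions the target demands roughly 230 times more pixels per second than WHAM currently produces, and a two-minute sequence at the stated frame rate amounts to 1,200 generated frames.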

Development Implications: The technology currently serves primarily as a prototyping tool for game developers while pointing toward future possibilities.

  • Developers can use WHAM to quickly visualize and test gameplay concepts
  • The system represents progress toward real-time AI-generated gaming experiences
  • The technology shows potential for rapid interactive content creation

Looking Beyond the Horizon: While WHAM represents meaningful progress in AI-generated gaming environments, the gap between current capabilities and commercially viable applications remains substantial. The technology’s ability to maintain object persistence and respond to diverse inputs suggests promising developments ahead, though significant improvements in resolution, frame rate, and visual consistency will be necessary before practical implementation in commercial games becomes feasible.

Microsoft’s new interactive AI “world model” still has a long way to go
