  • Publication: Meta
  • Publication Date: July 2nd, 2024
  • Organizations mentioned: Meta, GenAI, Anthropic
  • Publication Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya et al.
  • Technical background required: High
  • Estimated read time (original text): 38 minutes
  • Sentiment score: 85%, very positive

Goal:

  • Meta’s GenAI team introduces Meta 3D Gen, a state-of-the-art pipeline for text-to-3D asset generation. This technology addresses the time-consuming and challenging aspects of 3D content creation for video games, augmented and virtual reality applications, and special effects in film. The development of 3DGen is driven by growing demand for efficient, high-quality 3D content creation, particularly for applications in the Metaverse and user-generated 3D content.

Methodology:

  • 3DGen integrates two key components: Meta 3D AssetGen for initial 3D generation and Meta 3D TextureGen for high-quality texturing.
  • The system utilizes multiple object representations (view space, volumetric space, and UV space) to enhance consistency and quality of generated assets.
  • 3DGen is built on Meta’s Emu series of text-to-image models, fine-tuned using renders of synthetic 3D data.
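The two-stage flow described above can be sketched in Python. This is a minimal illustrative sketch, not Meta's actual API: the function names (`asset_gen`, `texture_gen`, `three_d_gen`) and the `Asset3D` data shape are assumptions introduced here to show how stage 2 consumes and refines the output of stage 1.

```python
# Hypothetical sketch of the 3DGen two-stage pipeline.
# All names and data shapes are illustrative assumptions, not Meta's API.
from dataclasses import dataclass


@dataclass
class Asset3D:
    mesh: str      # placeholder for mesh geometry
    texture: str   # placeholder for texture / PBR material maps


def asset_gen(prompt: str) -> Asset3D:
    """Stage 1 (Meta 3D AssetGen): text prompt -> initial mesh with draft texture."""
    return Asset3D(mesh=f"mesh({prompt})", texture=f"draft_texture({prompt})")


def texture_gen(asset: Asset3D, prompt: str) -> Asset3D:
    """Stage 2 (Meta 3D TextureGen): refine texture/PBR maps for an existing mesh."""
    return Asset3D(mesh=asset.mesh, texture=f"refined_texture({prompt})")


def three_d_gen(prompt: str) -> Asset3D:
    """Full pipeline: stage 1 then stage 2 (reported at ~30 s + ~20 s inference)."""
    return texture_gen(asset_gen(prompt), prompt)


asset = three_d_gen("a bronze dragon statue")
print(asset.mesh)     # geometry from stage 1 is preserved
print(asset.texture)  # texture replaced by the stage-2 refinement
```

Note that stage 2 leaves the mesh untouched and only replaces the texture, which mirrors the paper's point that TextureGen can also texture pre-existing, untextured meshes.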

Key findings:

  • 3DGen generates high-quality 3D assets in under a minute, significantly outperforming industry competitors that take up to an hour for similar tasks.
  • The system supports Physically-Based Rendering (PBR), enabling realistic relighting of generated assets, which is critical for integrating assets into various lighting environments.
  • User studies conducted with both the general population and professional 3D artists show that 3DGen's output is preferred in 68% of cases over single-stage models.
  • 3DGen particularly excels with complex textual prompts, outperforming commercial alternatives in prompt fidelity and visual quality.
  • The two-stage process (Meta 3D AssetGen and Meta 3D TextureGen) allows for efficient generation and refinement of 3D assets, with inference times of approximately 30 seconds for initial asset creation and 20 seconds for texture refinement.
  • 3DGen is 3-60 times faster than competing industry solutions while maintaining high-quality outputs.

Recommendations:

  • Implement 3DGen for rapid prototyping of 3D assets in game development and VR/AR applications to significantly reduce production times and costs.
  • Utilize 3DGen for on-demand creation of personalized 3D content, enabling new experiences centered on user-generated 3D content.
  • Integrate 3DGen into film and TV production pipelines for efficient creation of 3D props and characters, streamlining the special effects process.
  • Explore the use of 3DGen for virtual product placement in user-generated content, opening new avenues for marketing and e-commerce.
  • Continue research to further enhance geometry accuracy and texture consistency, addressing current limitations such as occasional artifacts in generated assets.

Implications:

  • Widespread adoption of Meta 3D Gen could lead to a democratization of 3D content creation, allowing smaller studios and independent creators to compete with larger, well-resourced companies in industries like gaming, film, and virtual reality. This could result in a surge of innovative content and applications, potentially reshaping the entertainment and tech landscapes.
  • The efficiency gains provided by 3DGen might accelerate the development of immersive technologies and the Metaverse, potentially fast-tracking societal shifts towards more virtual interactions, commerce, and experiences.
  • As AI-driven 3D generation becomes more prevalent, there may be significant workforce disruptions in industries reliant on traditional 3D modeling and texturing skills. This could necessitate a shift in educational and professional training programs to focus more on prompt engineering and AI tool manipulation rather than manual 3D creation techniques.

Alternative perspectives:

  • While 3DGen shows impressive speed and quality improvements, the evaluation metrics may not fully capture the nuanced artistic qualities that human 3D artists bring to their work. There could be concerns about the homogenization of 3D content if AI-generated assets become too prevalent, potentially leading to a loss of unique artistic styles and cultural diversity in digital creations.
  • The reliance on text prompts for 3D generation may introduce biases or limitations based on language and cultural contexts. This could potentially lead to a narrowing of creative possibilities or the reinforcement of certain stereotypes in 3D content, especially if the training data for the AI models is not sufficiently diverse.
  • The rapid pace of 3D asset generation enabled by 3DGen might encourage a quantity-over-quality approach in some industries, potentially leading to oversaturation of content and decreased overall quality standards. This could have negative impacts on user experiences and the perceived value of digital content.

AI predictions:

  • Within the next 5 years, AI-driven 3D generation tools like 3DGen will become standard in most major game development and VFX pipelines, leading to a 50% reduction in time-to-market for AAA games and blockbuster films with heavy CGI elements.
  • By 2030, advancements in AI-generated 3D content will enable fully personalized and dynamically generated virtual environments in VR and AR applications, revolutionizing fields such as education, therapy, and remote work.
  • In the next decade, the combination of AI-generated 3D assets and advanced natural language processing will give rise to highly sophisticated virtual assistants capable of creating and manipulating 3D environments in real-time based on verbal commands, fundamentally changing how humans interact with digital spaces.

Glossary:

  • Meta 3D Gen (3DGen): A state-of-the-art pipeline for text-to-3D asset generation that combines Meta 3D AssetGen and Meta 3D TextureGen.
  • Meta 3D AssetGen: The first stage of 3DGen that creates an initial 3D asset from a text prompt, producing a 3D mesh with texture and PBR material maps.
  • Meta 3D TextureGen: The second stage of 3DGen that refines textures and PBR maps for higher quality or generates textures for untextured 3D meshes.
  • Emu: Meta’s series of powerful text-to-image models that serve as the foundation for 3DGen.
