Genomic research has long been constrained by available computing power and by the difficulty of processing long genetic sequences. Evo 2, a new AI foundation model, lets scientists analyze genetic code across a wide range of species at a scale that was previously out of reach.
The breakthrough: Arc Institute and Stanford University have released Evo 2, the largest publicly available AI model for genomic data, built using NVIDIA’s DGX Cloud platform on AWS.
- The model was trained on nearly 9 trillion nucleotides, the fundamental building blocks of DNA and RNA
- Evo 2 is accessible through NVIDIA’s BioNeMo platform and can be deployed as a NIM microservice
- The model can process genetic sequences of up to 1 million tokens, letting it take in long stretches of a genome at once rather than short fragments
Technical capabilities: Evo 2 gives researchers new computational tools for genetic research and biomolecular applications.
- Scientists can use the model to predict protein form and function based on genetic sequences
- The system can identify novel molecules for healthcare and industrial applications
- Researchers can evaluate how gene mutations affect biological function; in tests on variants of BRCA1, a gene associated with breast cancer, the model classified mutations as benign or potentially pathogenic with 90% accuracy
Infrastructure and development: The project drew on substantial computing resources and institutional support.
- The model was trained using 2,000 NVIDIA H100 GPUs via NVIDIA DGX Cloud on AWS
- Arc Institute, established in 2021 with $650 million in funding, provided the research environment
- The collaboration includes partnerships with Stanford University, UC Berkeley, and UC San Francisco
Practical applications: Evo 2’s capabilities could be applied across several scientific domains with real-world impact.
- Healthcare researchers can use the model to understand disease-related gene variants and design targeted treatments
- Agricultural scientists can develop more resilient and nutrient-dense crops
- Environmental applications include the design of biofuels and proteins that can break down pollutants like oil and plastic
Looking ahead: Evo 2’s immediate applications are promising, but its full potential will only become clear as researchers explore it in different fields.
- The model’s ability to process longer sequences could reveal previously unknown connections in genetic code
- Because its training data spans many species, the model may surface insights that carry across different branches of life
- The open availability of the model could accelerate scientific discoveries across multiple disciplines
The true significance of Evo 2 may lie not just in its current capabilities but in how it broadens access to advanced genomic research tools, potentially accelerating scientific discovery in ways that are difficult to predict.