Meta Segment Anything Model 2 (SAM 2) is a single model for object segmentation in both images and videos. Meta positions it as a significant advance for video segmentation in particular, and as a model that can be applied across a wide range of image and video use cases.
Key features and capabilities: SAM 2 is the first unified model for real-time, promptable object segmentation in images and videos, offering improved accuracy and performance compared to existing solutions:
- SAM 2 achieves better video segmentation performance than prior methods while requiring about one-third of the interaction time.
- The model generalizes zero-shot: it can segment any object in any video or image, including objects and visual domains it was not trained on, without custom adaptation.
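To make "promptable" concrete, the sketch below shows the shape of the interaction: a click on one frame yields a mask, and that mask is carried forward to later frames. It is a toy stand-in under stated assumptions, not SAM 2's method or API. The names `segment_from_click`, `centroid`, and `propagate` are hypothetical, the segmentation is plain region growing on brightness values, and propagation simply re-seeds each frame from the previous mask's centroid, where SAM 2 uses a learned model.

```python
from collections import deque

Frame = list[list[int]]        # grayscale frame as rows of pixel values
Mask = set[tuple[int, int]]    # (row, col) pixels that belong to the object


def segment_from_click(frame: Frame, click: tuple[int, int], tol: int = 10) -> Mask:
    """Toy promptable segmentation: grow a region outward from the clicked pixel.

    A neighbouring pixel joins the mask if its value is within `tol` of the
    clicked pixel's value. This is region growing, not a learned model.
    """
    rows, cols = len(frame), len(frame[0])
    target = frame[click[0]][click[1]]
    mask: Mask = {click}
    queue = deque([click])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in mask and abs(frame[nr][nc] - target) <= tol:
                mask.add((nr, nc))
                queue.append((nr, nc))
    return mask


def centroid(mask: Mask) -> tuple[int, int]:
    """Rounded mean position of the mask's pixels."""
    n = len(mask)
    return (round(sum(r for r, _ in mask) / n), round(sum(c for _, c in mask) / n))


def propagate(frames: list[Frame], click: tuple[int, int], tol: int = 10) -> list[Mask]:
    """Segment the first frame from a click, then carry the object forward.

    Each later frame is seeded at the previous mask's centroid, which only
    works in this toy setting when the object moves a little between frames.
    """
    masks: list[Mask] = []
    seed = click
    for frame in frames:
        mask = segment_from_click(frame, seed, tol)
        masks.append(mask)
        seed = centroid(mask)
    return masks
```

Calling `propagate(frames, click=(row, col))` returns one mask per frame for the clicked object. The point of the sketch is the interface: one prompt from the user, a mask on every frame in return.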
Innovative research approach: Meta’s research on enabling video segmentation capabilities involves designing a new task, a model, and a dataset:
- Meta defined the promptable visual segmentation task and designed the SAM 2 model to perform it.
- SAM 2 was used to create SA-V, a video object segmentation dataset an order of magnitude larger than existing datasets, and SA-V was in turn used to train SAM 2 to state-of-the-art performance.
Open science and community engagement: In line with Meta’s open science approach, the company is sharing its research on SAM 2 with the community to encourage exploration of new capabilities and use cases:
- The SAM 2 code and weights are being open-sourced under an Apache 2.0 license, while the evaluation code is shared under a BSD-3 license.
- The SA-V dataset, containing ~51k real-world videos with more than 600k masklets, is being shared under a CC BY 4.0 license.
- A web demo has been released, enabling real-time interactive segmentation of short videos and the application of video effects on model predictions.
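A masklet is one object's mask followed across the frames of a video. As a rough illustration of what such an annotation could look like in code, the sketch below defines a hypothetical `Masklet` container (an assumption for illustration, not SA-V's actual file format) and a small helper that derives the average number of masklets per video from the rounded dataset figures quoted above.

```python
from dataclasses import dataclass, field

Pixel = tuple[int, int]  # (row, col)


@dataclass
class Masklet:
    """Hypothetical container: one object's masks, keyed by frame index."""
    object_id: int
    masks: dict[int, set[Pixel]] = field(default_factory=dict)

    def add(self, frame_idx: int, mask: set[Pixel]) -> None:
        """Record the object's mask on a given frame."""
        self.masks[frame_idx] = mask

    def span(self) -> tuple[int, int]:
        """First and last frame index on which the object is annotated."""
        return (min(self.masks), max(self.masks))


def avg_masklets_per_video(num_masklets: int, num_videos: int) -> float:
    """Average masklets per video, from dataset-level counts."""
    return num_masklets / num_videos


# Rounded figures from the text: ~51k videos, more than 600k masklets.
approx_avg = avg_masklets_per_video(600_000, 51_000)
```

With those rounded inputs, `approx_avg` comes out at a little under 12 masklets per video. Because the source gives "more than 600k" masklets and "~51k" videos, this is only a ballpark.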
Broader implications and future outlook: By openly sharing this research, Meta aims to contribute to accelerating progress in universal video and image segmentation and related perception tasks:
- Meta expects the release of SAM 2 and the SA-V dataset to spur new applications of object segmentation.
- As the AI community explores and builds on this research, Meta expects new insights and useful experiences to emerge across various domains.
- Because the model and dataset are openly licensed, researchers and developers can refine and adapt them for specific use cases and industries.
Introducing SAM 2: The next generation of Meta Segment Anything Model for videos and images