Featured Dataset
SA-V Dataset
SA-V is a dataset designed for training general-purpose object segmentation models from open-world videos. The dataset was introduced in our paper “Segment Anything 2”.
Datasets

FACET Dataset
FACET is a comprehensive benchmark dataset for evaluating the robustness and algorithmic fairness of AI and machine-learning vision models with respect to protected groups.

EgoTV Dataset
A benchmark and dataset for systematic investigation of vision-language models on compositional, causal (e.g., effect of actions), and temporal (e.g., action ordering) reasoning in egocentric settings.

MMCSG Dataset
The MMCSG (Multi-Modal Conversations in Smart Glasses) dataset comprises two-sided conversations recorded using Aria glasses, featuring multi-modal data such as multi-channel audio, video, accelerometer, and gyroscope measurements.

Speech Fairness Dataset
By releasing this dataset, we hope to further motivate the AI community to make strides toward improving the fairness of speech recognition models, which will help all users have a better experience using applications with ASR.

Casual Conversations V2
For evaluating computer vision, audio, and speech models for accuracy across a diverse set of ages, genders, languages/dialects, geographies, disabilities, and more.

Casual Conversations
For evaluating computer vision and audio models for accuracy across a diverse set of ages, genders, apparent skin tones, and ambient lighting conditions.