
Datasets

Large-scale datasets and benchmarks for training, evaluating, and testing models, built to measure and advance AI progress.

Featured Dataset

SA-V Dataset

SA-V is a dataset designed for training general-purpose object segmentation models from open-world videos. The dataset was introduced in our paper “SAM 2: Segment Anything in Images and Videos”.
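
For illustration, here is a minimal sketch (not the official loader) of decoding a single frame's mask from an SA-V-style annotation file with pycocotools, assuming the masklets are stored as COCO run-length-encoded (RLE) masks in per-video JSON files; the file name and the "masklet" key layout below are hypothetical placeholders.

    # Minimal sketch (not the official loader): decode one frame's mask from an
    # SA-V-style annotation file. Assumes masklets are stored as COCO
    # run-length-encoded (RLE) masks in a per-video JSON file; the file name
    # and the "masklet" key layout are hypothetical placeholders.
    import json

    from pycocotools import mask as mask_utils  # pip install pycocotools

    with open("sav_video_annotation.json") as f:  # hypothetical file name
        annotation = json.load(f)

    # Assumed layout: annotation["masklet"] is a list of object tracks, each a
    # list of per-frame RLE dicts {"size": [height, width], "counts": "<rle>"}.
    first_track = annotation["masklet"][0]
    frame_rle = first_track[0]

    mask = mask_utils.decode(frame_rle)  # uint8 array of shape (height, width)
    print("mask shape:", mask.shape, "foreground pixels:", int(mask.sum()))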

FACET Dataset

FACET is a comprehensive benchmark dataset for evaluating the robustness and algorithmic fairness of AI and machine-learning vision models with respect to protected groups.

EgoTV Dataset

A benchmark and dataset for systematically investigating vision-language models' compositional, causal (e.g., effects of actions), and temporal (e.g., action ordering) reasoning in egocentric settings.

MMCSG Dataset

The MMCSG (Multi-Modal Conversations in Smart Glasses) dataset comprises two-sided conversations recorded using Aria glasses, featuring multi-modal data such as multi-channel audio, video, accelerometer, and gyroscope measurements.

Speech Fairness Dataset

By releasing this dataset, we hope to motivate the AI community to make further strides toward improving the fairness of automatic speech recognition (ASR) models, helping all users have a better experience with applications that use ASR.

Casual Conversations V2

For evaluating computer vision, audio, and speech models for accuracy across a diverse set of ages, genders, languages and dialects, geographies, disabilities, and more.

Casual Conversations

For evaluating computer vision and audio models for accuracy across a diverse set of ages, genders, apparent skin tones, and ambient lighting conditions.

Common Objects in 3D (CO3D)

For learning category-specific 3D reconstruction and novel-view synthesis from multi-view images of common object categories.

Segment Anything

Designed for training general-purpose object segmentation models from open-world images.

DISC21 Dataset

A benchmark that helps researchers evaluate the accuracy of their image copy detection models.

EgoObjects Dataset

A large-scale egocentric dataset for fine-grained object understanding in first-person video.

FLoRes Benchmarking Dataset

For evaluating machine translation between English and low-resource languages.

Ego4D

Ego4D is a collaborative project that seeks to advance the fundamental AI research needed for multimodal machine perception for first-person video understanding.