A desktop UI for inspecting litData shards, MosaicML Streaming shards, and WebDataset tar shards, and previewing Hugging Face and Zenodo datasets directly online

binbinsh/dataset-inspector

Dataset Inspector

Builds: macOS, Ubuntu, Windows

About

Dataset Inspector is a desktop UI for inspecting local Lightning-AI/litData shards, MosaicML Streaming (MDS) shards, and WebDataset tar shards, with support for previewing Hugging Face and Zenodo datasets directly online without downloading.

Features

  • Inspect local LitData shards (index.json + .bin/.zst chunks).
  • Inspect local MosaicML Streaming (MDS) shards (index.json + .mds/.mds.zst).
  • Inspect local WebDataset shards (.tar, .tar.gz, .tar.zst).
  • Inspect Hugging Face datasets via the streaming API (no full local download).
  • Inspect Zenodo datasets via HTTP Range requests (no full local download).
  • Preview JSON, audio, and image fields, copy values, and open extracted fields with your default app.
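
The "no full local download" behaviour for remote files rests on HTTP Range requests, which ask a server for a byte window instead of the whole file. The following is a minimal sketch of that idea, not Dataset Inspector's actual implementation. The helper names `build_range_request` and `parse_content_range` and the example URL are hypothetical.

```python
import urllib.request
from typing import Optional, Tuple


def build_range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET request for `length` bytes starting at byte offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # Range end offsets are inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse a Content-Range value such as 'bytes 0-511/12345'.

    Returns (start, end, total); total is None when the server sends '*'.
    """
    unit, _, rest = value.partition(" ")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    span, _, total = rest.partition("/")
    start, _, end = span.partition("-")
    return int(start), int(end), None if total == "*" else int(total)


# Hypothetical URL, used only to show the header that would be sent.
request = build_range_request("https://example.org/shard-000000.tar", 0, 512)
print(request.get_header("Range"))
print(parse_content_range("bytes 0-511/12345"))
```

A server that honours the header answers with `206 Partial Content` and a `Content-Range` header. A server that ignores it answers with `200 OK` and the full body, so a client should check the status code before assuming it received only the requested window.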

Screenshots

  • Local LitData shards
  • Local WebDataset tar shards
  • Hugging Face dataset preview
  • Zenodo record preview

Usage

  1. Download the Dataset Inspector installer for your platform from Releases.
  2. Browse to a local LitData, MosaicML, or WebDataset folder, or enter a Hugging Face or Zenodo URL, then press Load.
  3. LitData / MosaicML shards: pick a shard → item → field, then preview fields.
  4. WebDataset shards: pick a shard → sample → field, then preview/open files.
  5. Hugging Face datasets: pick a split → row → field to preview values.
  6. Report issues and feature requests at https://github.com/binbinsh/dataset-inspector/issues
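
The shard → sample → field hierarchy for WebDataset shards follows from the format's naming convention: files in a tar archive that share a basename up to the first dot form one sample, and the rest of the filename names the field. The following is a rough sketch of that grouping using only the Python standard library. It illustrates the convention rather than the app's own code, and the function names and demo file names are hypothetical.

```python
import io
import tarfile


def split_key(member_name):
    """Split 'dir/0001.cls.txt' into the sample key 'dir/0001' and field 'cls.txt'."""
    directory, _, filename = member_name.rpartition("/")
    base, _, ext = filename.partition(".")  # split at the FIRST dot
    key = f"{directory}/{base}" if directory else base
    return key, ext


def group_samples(tar_bytes):
    """Return {sample_key: {field_extension: raw_bytes}} for a tar archive."""
    samples = {}
    # "r:*" lets tarfile detect gzip/bzip2/xz compression itself; .tar.zst
    # may need separate decompression first, depending on the Python version.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples


def make_demo_tar():
    """Build a tiny in-memory shard with two samples (made-up contents)."""
    entries = [
        ("0001.jpg", b"\xff\xd8fake"),
        ("0001.json", b'{"label": 3}'),
        ("0002.jpg", b"\xff\xd8fake"),
        ("0002.cls.txt", b"7"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in entries:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


for key, fields in group_samples(make_demo_tar()).items():
    print(key, sorted(fields))
```

Grouping into a dictionary keeps the sketch short and works even if a sample's files are not adjacent in the archive. It reads every payload into memory, so a viewer for large shards would more likely record member offsets and read fields on demand.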
