
typing: accept sequences for Dataset file loaders #8067

Open
biefan wants to merge 1 commit into huggingface:main from
biefan:typing/use-sequence-for-path-or-paths-5354


@biefan biefan commented Mar 14, 2026

Summary

  • update Dataset.from_csv, from_json, from_parquet, and from_text type hints from list[PathLike] to Sequence[PathLike]
  • normalize non-string sequences to lists before passing paths to readers
  • add path-type tests to cover tuple input for all four loaders
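The normalization step in the second bullet could look roughly like the sketch below. The helper name `normalize_paths` and the local `PathLike` alias are hypothetical and chosen for illustration only; the actual change in `arrow_dataset.py` may be structured differently. The point it shows is that `str` and `bytes` are themselves sequences, so they have to be excluded before converting tuples and other sequences to a list.

```python
import os
from collections.abc import Sequence
from pathlib import Path
from typing import Union

# Hypothetical alias for illustration; assumed to mirror the PathLike used in the type hints.
PathLike = Union[str, bytes, os.PathLike]


def normalize_paths(path_or_paths):
    """Return single paths unchanged and convert other sequences to a list.

    Hypothetical helper, not the function used in the PR.
    """
    # str and bytes are Sequences too, so treat single paths first.
    if isinstance(path_or_paths, (str, bytes, os.PathLike)):
        return path_or_paths
    # Tuples and other non-string sequences become plain lists for the readers.
    if isinstance(path_or_paths, Sequence):
        return list(path_or_paths)
    # Anything else (for example a dict of splits) is passed through untouched.
    return path_or_paths


single = normalize_paths("train.csv")
from_tuple = normalize_paths(("a.csv", Path("b.csv")))
```

Here `single` stays a plain string, while `from_tuple` becomes a list holding the same two entries.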

Why

list is invariant in its element type for static type checkers, so a value typed as list[str] can be rejected where list[PathLike] is expected, even though str itself is valid path input. Sequence[PathLike] is covariant, so it accepts list[str] as well as tuples, and better matches real usage.
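A minimal sketch of the variance issue, using two hypothetical functions (`load_from_list` and `load_from_sequence`) and a local `PathLike` alias that are assumptions for illustration, not the library's API. Both calls run fine at runtime; the difference only appears under a static type checker such as mypy, which is expected to flag the first call and accept the others.

```python
import os
from collections.abc import Sequence
from typing import Union

# Hypothetical alias for illustration only.
PathLike = Union[str, bytes, os.PathLike]


def load_from_list(paths: list[PathLike]) -> int:
    # Old-style hint: only an exact list[PathLike] satisfies a type checker.
    return len(paths)


def load_from_sequence(paths: Sequence[PathLike]) -> int:
    # New-style hint: any sequence of path-like items is acceptable.
    return len(paths)


csv_files: list[str] = ["a.csv", "b.csv"]

# A type checker is expected to reject this call because list is invariant:
# list[str] is not a subtype of list[PathLike]. It still runs at runtime.
count_list = load_from_list(csv_files)

# Sequence is covariant, so list[str] and tuple[str, ...] both type-check.
count_seq = load_from_sequence(csv_files)
count_tuple = load_from_sequence(("a.csv", "b.csv", "c.csv"))
```

Invariance exists because a function taking list[PathLike] could legally append a bytes value to it, which would corrupt a caller's list[str]; Sequence is read-only, so that risk does not arise.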

Validation

  • uv run --python 3.11 --with-editable . --with pytest --with setuptools --with absl-py -m pytest tests/test_arrow_dataset.py::test_dataset_from_csv_path_type tests/test_arrow_dataset.py::test_dataset_from_json_path_type tests/test_arrow_dataset.py::test_dataset_from_parquet_path_type tests/test_arrow_dataset.py::test_dataset_from_text_path_type -q
  • uv run --python 3.11 --with ruff --with setuptools ruff check src/datasets/arrow_dataset.py tests/test_arrow_dataset.py

Fixes #5354

Signed-off-by: biefan <70761325+biefan@users.noreply.github.com>
