Respect physical data placement in parquet iterators #4219

Open
@kolesnikovae

Description

A parquet.List of structs colocates its fields on the same page. This means that we should never create an individual iterator for each of the columns in such cases (example): when we do, every iterator fetches the same pages again.
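To make the cost concrete, here is a minimal sketch of the access pattern. It does not use the parquet library: `pageStore`, `row`, `sumPerField`, and `sumSinglePass` are hypothetical names, and the model simply assumes, as described above, that every page holds all fields of the struct. It only counts how often pages are fetched.

```go
package main

import "fmt"

// row is a hypothetical list element; its fields stand in for the columns
// that are assumed to live on the same pages.
type row struct{ A, B, C int64 }

// pageStore is a hypothetical stand-in for storage that counts page fetches.
type pageStore struct {
	pages   [][]row
	fetches int
}

// newStore fills numPages pages with rowsPerPage rows each.
func newStore(numPages, rowsPerPage int) *pageStore {
	s := &pageStore{}
	v := int64(0)
	for p := 0; p < numPages; p++ {
		page := make([]row, rowsPerPage)
		for i := range page {
			v++
			page[i] = row{A: v, B: 2 * v, C: 3 * v}
		}
		s.pages = append(s.pages, page)
	}
	return s
}

// fetch returns page i and records that it was read.
func (s *pageStore) fetch(i int) []row {
	s.fetches++
	return s.pages[i]
}

// sumPerField mimics one iterator per field: each field walks every page.
func sumPerField(s *pageStore) [3]int64 {
	var sums [3]int64
	pick := []func(row) int64{
		func(r row) int64 { return r.A },
		func(r row) int64 { return r.B },
		func(r row) int64 { return r.C },
	}
	for f, get := range pick {
		for i := range s.pages {
			for _, r := range s.fetch(i) {
				sums[f] += get(r)
			}
		}
	}
	return sums
}

// sumSinglePass mimics one shared iterator: each page is fetched once.
func sumSinglePass(s *pageStore) [3]int64 {
	var sums [3]int64
	for i := range s.pages {
		for _, r := range s.fetch(i) {
			sums[0] += r.A
			sums[1] += r.B
			sums[2] += r.C
		}
	}
	return sums
}

func main() {
	a, b := newStore(4, 100), newStore(4, 100)
	fmt.Println("same result:", sumPerField(a) == sumSinglePass(b))
	fmt.Println("per-field fetches:", a.fetches, "single-pass fetches:", b.fetches)
}
```

In this toy model the number of fetches grows with the number of fields, while the single pass stays at one fetch per page. Whether the real iterators can share a page reader this directly is an open question.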

In addition, the parquet reader issues a separate read operation for every ReadBufferPage size (+ page/group bounds), which prevents efficient streaming of data ranges from object storage.
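A rough illustration of why the read size matters, using only the standard library. `countingReaderAt`, `readRange`, and `countCalls` are hypothetical helpers, not part of the parquet reader. Each `ReadAt` call stands in for one request to object storage, and the `chunk` argument plays the role of the read buffer size.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// countingReaderAt counts ReadAt calls; each call models one storage request.
type countingReaderAt struct {
	r     io.ReaderAt
	calls int
}

func (c *countingReaderAt) ReadAt(p []byte, off int64) (int, error) {
	c.calls++
	return c.r.ReadAt(p, off)
}

// readRange reads n bytes starting at off, issuing one ReadAt per chunk.
func readRange(r io.ReaderAt, off, n int64, chunk int) ([]byte, error) {
	if chunk <= 0 {
		return nil, fmt.Errorf("chunk must be positive, got %d", chunk)
	}
	out := make([]byte, n)
	for done := int64(0); done < n; {
		end := done + int64(chunk)
		if end > n {
			end = n
		}
		m, err := r.ReadAt(out[done:end], off+done)
		// io.ReaderAt may report io.EOF together with a full read at the end.
		if err != nil && (err != io.EOF || int64(m) != end-done) {
			return nil, err
		}
		done = end
	}
	return out, nil
}

// countCalls reads a range from an in-memory object of the given size,
// checks the bytes, and returns how many requests were needed.
func countCalls(size int, off, n int64, chunk int) (int, error) {
	data := make([]byte, size)
	for i := range data {
		data[i] = byte(i)
	}
	src := &countingReaderAt{r: bytes.NewReader(data)}
	got, err := readRange(src, off, n, chunk)
	if err != nil {
		return 0, err
	}
	if !bytes.Equal(got, data[off:off+n]) {
		return 0, fmt.Errorf("range mismatch")
	}
	return src.calls, nil
}

func main() {
	const size, off, n = 1 << 20, 100, 64 << 10
	small, _ := countCalls(size, off, n, 4<<10)
	whole, _ := countCalls(size, off, n, n)
	fmt.Println("4 KiB chunks:", small, "requests; whole range:", whole, "request(s)")
}
```

If the range to read is known up front, a single ranged request of that size would replace many small ones. How that maps onto page and row group bounds in the actual reader is not covered by this sketch.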

Labels

storage: Low level storage matters
