@Chowdhury-Anik Chowdhury-Anik commented Nov 8, 2025

This commit adds four data sink methods for Polars LazyFrame:

  • sink_parquet: Write LazyFrame to Parquet format
  • sink_csv: Write LazyFrame to CSV format
  • sink_ipc: Write LazyFrame to IPC/Feather format
  • sink_ndjson: Write LazyFrame to NDJSON format

These sinks let users write a LazyFrame directly, without calling .collect() first, so a large dataset can be written to disk without first being fully materialized in memory.

Fixes #791

Changes

How I tested this

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@elijahbenizzy elijahbenizzy left a comment

Looks good -- maybe add a test?

@skrawcz skrawcz left a comment

Just need some tests please

skrawcz commented Dec 28, 2025

@Chowdhury-Anik just bumping this PR :) would love some tests please.

Chowdhury-Anik and others added 2 commits January 1, 2026 07:29
@skrawcz skrawcz force-pushed the Chowdhury-Anik-patch-1 branch from a38a271 to 93a56f0 Compare December 31, 2025 20:29

skrawcz commented Dec 31, 2025

once #1429 is merged we can rebase this and this should be good to go.

@skrawcz skrawcz self-requested a review December 31, 2025 20:41