feat: add percentile-based truncated histogram option #1844

Open
matsumotominato wants to merge 2 commits into Data-Centric-AI-Community:develop from matsumotominato:feature/truncated-histogram

Summary

Add a percentile_cutoff option to the histogram configuration that allows
generating a truncated histogram alongside the standard one.

Problem

When a column contains extreme outliers (e.g., company revenues where most are
around $90K but some are in the billions), nearly all values fall into a single
bin, so the histogram is not useful for understanding the distribution.
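
The effect can be reproduced with a small, purely illustrative sketch. The data below is synthetic (not taken from the issue or the PR), and it uses plain NumPy rather than the library's own histogram code, so it only approximates what the report would show:

```python
import numpy as np

# Synthetic data for illustration only: 1,000 revenues near $90K
# plus three values in the billions.
rng = np.random.default_rng(0)
typical = rng.normal(loc=90_000, scale=10_000, size=1_000)
outliers = np.array([1.5e9, 2.0e9, 3.2e9])
revenues = np.concatenate([typical, outliers])

# Ten equal-width bins spanning min..max of the full column.
counts, edges = np.histogram(revenues, bins=10)

# Fraction of all rows that end up in the first bin.
share_in_first_bin = counts[0] / counts.sum()
print(counts)
print(f"share of rows in first bin: {share_in_first_bin:.3f}")
```

Because each bin is hundreds of millions wide, every typical value lands in the first bin and the shape of the bulk of the data is invisible.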

Closes #1817

Changes

  • config.py: Added percentile_cutoff parameter to Histogram class (default: 0.0)
  • summary_algorithms.py: Compute truncated histogram when percentile_cutoff > 0
  • render_real.py: Display truncated histogram as a new tab in the report

Usage

profile = ProfileReport(
    df,
    plot={"histogram": {"percentile_cutoff": 0.05}},
)

The truncated histogram helps visualize distributions with extreme outliers by
clipping data to a specified percentile range (e.g., 5th-95th percentile).
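
As a rough sketch of the idea, the clipping could look like the following. This is not the code in summary_algorithms.py: the helper name `truncated_histogram` is hypothetical, it uses plain NumPy, and it assumes the cutoff is applied symmetrically, so that a cutoff of 0.05 keeps the 5th-95th percentile range.

```python
import numpy as np


def truncated_histogram(values, percentile_cutoff, bins=10):
    """Hypothetical helper: histogram of values within a symmetric percentile range.

    Assumes percentile_cutoff is a fraction in (0, 0.5); 0.05 keeps the
    5th-95th percentile range. This is an illustration, not the PR's code.
    """
    if not 0.0 < percentile_cutoff < 0.5:
        raise ValueError("percentile_cutoff must be in (0, 0.5) for this sketch")

    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # ignore missing values

    # Lower and upper bounds of the retained range.
    lower, upper = np.quantile(values, [percentile_cutoff, 1.0 - percentile_cutoff])

    # Drop everything outside the range, then bin what is left.
    kept = values[(values >= lower) & (values <= upper)]
    return np.histogram(kept, bins=bins)


# Usage: one extreme value no longer stretches the bin edges.
data = np.concatenate([np.arange(1, 101), [1e9]])
counts, edges = truncated_histogram(data, percentile_cutoff=0.05)
print(edges[0], edges[-1])
```

Filtering rows out (rather than capping them at the bounds) avoids artificial spikes in the first and last bins; whether the PR filters or caps is not stated in the description.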
