Skip to content

Histogram to tdigest function#139637

Merged
not-napoleon merged 7 commits intoelastic:mainfrom
not-napoleon:histogram-to-tdigest-function
Dec 17, 2025
Merged

Histogram to tdigest function#139637
not-napoleon merged 7 commits intoelastic:mainfrom
not-napoleon:histogram-to-tdigest-function

Conversation

@not-napoleon
Copy link
Member

Adds a function to convert from histogram data types to tdigest data types. This unfortunately requires decoding the histogram, even though the two encodings are the same, since we need to calculate the summary fields that don't exist for histogram. Maybe we can improve that down the line.

The testing for this is very sparse. We need to work on that in the near future, but this should be enough for a prototype.

@not-napoleon not-napoleon added >non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL v9.3.0 labels Dec 16, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Convert histogram into tdigest
required_capability: histogram_to_tdigest_cast

FROM histogram_standard_index | EVAL tdigest = to_tdigest(responseTime) | KEEP responseTime, tdigest ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a test where we do min, max and percentiles too?

throw new IllegalArgumentException("Histogram length is greater than 2MB");
}
// even though the encoded format is the same, we need to decode here to compute the summary data
List<Double> centroids = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd make this block a factory method for TDigestHolder, e.g. TDigestHolder.FromHistogram(), to better encapsulate and test. It can be cleaned up in a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's do that as a follow up. I think a lot of code probably should get moved into TDigestHolder. Really, the whole thing needs a refactoring pass.

}

public void append(TDigestHolder val) {
public void appendTDigest(TDigestHolder val) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this renamed? I'd keep such changes separate, to keep the delta low.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was renamed because https://github.com/elastic/elasticsearch/pull/139637/files#diff-1493726365f74fb0808668670bda6f2c9eb76c1bd63bb951562e28724036571dR234 does string based method construction and I was uncomfortable putting a generic name in there. I agree in a perfect world it would have been a separate PR, but I was trying to move as quickly as possible here.

@not-napoleon not-napoleon enabled auto-merge (squash) December 17, 2025 16:41
@not-napoleon not-napoleon merged commit 7a5dc6a into elastic:main Dec 17, 2025
35 checks passed
not-napoleon added a commit that referenced this pull request Dec 17, 2025
This takes the Histogram data type out from under construction and makes it available in release builds. This includes basic support for scalar and aggregation functions that operate on all types (except for Count, Coalesce, and Case, which are going to need special handling later). In conjunction with #139637, this will allow aggregating histogram fields as t-digests to compute percentiles and other statistics.
---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
not-napoleon added a commit that referenced this pull request Feb 3, 2026
Resolves #141664

This adds parameterized testing for the TO_TDIGEST function added in #139637. It was an oversight that this wasn't added at the same time as the function. Including the tests also auto-generates the function docs, which are included in this PR.
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Feb 4, 2026
Resolves elastic#141664

This adds parameterized testing for the TO_TDIGEST function added in elastic#139637. It was an oversight that this wasn't added at the same time as the function. Including the tests also auto-generates the function docs, which are included in this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL Team:StorageEngine v9.3.0

4 participants