Histogram to tdigest function#139637
Conversation
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| Convert histogram into tdigest | ||
| required_capability: histogram_to_tdigest_cast | ||
|
|
||
| FROM histogram_standard_index | EVAL tdigest = to_tdigest(responseTime) | KEEP responseTime, tdigest ; |
There was a problem hiding this comment.
Maybe add a test where we do min, max and percentiles too?
| throw new IllegalArgumentException("Histogram length is greater than 2MB"); | ||
| } | ||
| // even though the encoded format is the same, we need to decode here to compute the summary data | ||
| List<Double> centroids = new ArrayList<>(); |
There was a problem hiding this comment.
Nit: I'd make this block a factory method for TDigestHolder, e.g. TDigestHolder.FromHistogram(), to better encapsulate and test. It can be cleaned up in a follow-up.
There was a problem hiding this comment.
Yeah, let's do that as a follow up. I think a lot of code probably should get moved into TDigestHolder. Really, the whole thing needs a refactoring pass.
| } | ||
|
|
||
| public void append(TDigestHolder val) { | ||
| public void appendTDigest(TDigestHolder val) { |
There was a problem hiding this comment.
Why was this renamed? I'd keep such changes separate, to keep the delta low.
There was a problem hiding this comment.
This was renamed because https://github.com/elastic/elasticsearch/pull/139637/files#diff-1493726365f74fb0808668670bda6f2c9eb76c1bd63bb951562e28724036571dR234 does string based method construction and I was uncomfortable putting a generic name in there. I agree in a perfect world it would have been a separate PR, but I was trying to move as quickly as possible here.
...src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/ToTDigest.java
Show resolved
Hide resolved
Co-authored-by: Jonas Kunz <j+github@kunzj.de>
This takes the Histogram data type out from under construction and makes it available in release builds. This includes basic support for scalar and aggregation functions that operate on all types (except for Count, Coalesce, and Case, which are going to need special handling later). In conjunction with #139637, this will allow aggregating histogram fields as t-digests to compute percentiles and other statistics. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Resolves elastic#141664 This adds parameterized testing for the TO_TDIGEST function added in elastic#139637. It was an oversight that this wasn't added at the same time as the function. Including the tests also auto-generates the function docs, which are included in this PR.
Adds a function to convert from histogram data types to tdigest data types. This unfortunately requires decoding the histogram, even though the two encodings are the same, since we need to calculate the summary fields that don't exist for histogram. Maybe we can improve that down the line.
The testing for this is very sparse. We need to work on that in the near future, but this should be enough for a prototype.