ESQL: Add nulls support to Categorize#117655
Conversation
|
Hi @ivancea, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 suggestions.
Files not reviewed (1)
- x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec: Language not supported
Comments skipped due to low confidence (6)
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:67
- [nitpick] The change from static to final for the CategorizeEvaluator class may be unintended. If this class is intended to be nested within another class without requiring an instance of the outer class, it should remain static.
public final class CategorizeEvaluator implements Releasable {
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:64
- The close method should also handle the seenNull variable if it is used to track state. Ensure that any state variables are reset or handled appropriately when closing resources.
@Override public void close() { categorizer.close(); }
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:126
- Ensure that the eval method is covered by tests, especially for cases where null values are encountered and handled by the process method.
public IntVector eval(int positionCount, BytesRefVector vVector) {
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizedIntermediateBlockHash.java:83
- The increment of category IDs by 1 could cause confusion. Ensure this is intentional and document the reasoning.
idMap.put(oldCategoryId + 1, newCategoryId + 1);
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/TokenListCategorizer.java:119
- Ensure that the new behavior introduced by handling nulls in computeCategory is covered by tests.
public TokenListCategory computeCategory(String s, CategorizationAnalyzer analyzer) {
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/TokenListCategorizer.java:128
- Ensure that the new behavior introduced by handling nulls in computeCategory is covered by tests.
public TokenListCategory computeCategory(TokenStream ts, int unfilteredStringLen, long numDocs) throws IOException {
...te/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java
Show resolved
Hide resolved
...te/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java
Show resolved
Hide resolved
...c/main/java/org/elasticsearch/compute/aggregation/blockhash/AbstractCategorizeBlockHash.java
Show resolved
Hide resolved
...n/java/org/elasticsearch/compute/aggregation/blockhash/CategorizedIntermediateBlockHash.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec
Show resolved
Hide resolved
💔 Backport failed
You can use sqren/backport to manually backport by running |
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function. Also, implement `seenGroupIds()`, which would fail some queries with nulls otherwise.
alex-spies
left a comment
There was a problem hiding this comment.
I know I'm late to the party, but LGTM and thanks @ivancea !
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function. Also, implement `seenGroupIds()`, which would fail some queries with nulls otherwise.
Handle nulls and empty strings (Which resolve to null) on Categorize grouping function.
Also, implement
seenGroupIds(), which would fail some queries with nulls otherwise.