ESQL: Add nulls support to Categorize by ivancea · Pull Request #117655 · elastic/elasticsearch

ivancea · 2024-11-27T17:27:10Z

Handle nulls and empty strings (which resolve to null) in the Categorize grouping function.
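A minimal sketch of the idea, assuming (this is our reading, not stated in the description) that nulls get a reserved group id 0 and every real category id is shifted up by one. The `+ 1` offsets quoted in the Copilot review below hint at this layout. `NullAwareCategorizer`, `NULL_ORD` and `groupId` are hypothetical names. The "categorization" here simply uses the whole string as the key, whereas the real Categorize function groups similar text into shared categories.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the PR's implementation.
final class NullAwareCategorizer {
    // Assumption: group id 0 is reserved for the null group.
    static final int NULL_ORD = 0;

    // Stand-in for the real categorizer: distinct key -> 0-based category id.
    private final Map<String, Integer> categoryIds = new HashMap<>();
    private boolean seenNull = false;

    /** Returns the group id for one input value. */
    int groupId(String value) {
        // Nulls and empty strings both land in the null group.
        if (value == null || value.isEmpty()) {
            seenNull = true;
            return NULL_ORD;
        }
        // Toy "categorization": the whole string is the category key.
        Integer categoryId = categoryIds.get(value);
        if (categoryId == null) {
            categoryId = categoryIds.size();
            categoryIds.put(value, categoryId);
        }
        // Shift by one so that real categories never collide with NULL_ORD.
        return categoryId + 1;
    }

    boolean seenNull() {
        return seenNull;
    }

    int categoryCount() {
        return categoryIds.size();
    }
}
```

With this layout, `groupId(null)` and `groupId("")` both return 0, and the first two distinct non-empty strings get group ids 1 and 2.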

Also, implement `seenGroupIds()`; without it, some queries with nulls would fail.
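The thread does not spell out why nulls made `seenGroupIds()` necessary. One plausible reading is that once id 0 is reserved for a null group that may or may not occur, the hash has to report exactly which ids it produced. A minimal sketch under that assumption, using `java.util.BitSet` in place of the Elasticsearch `BitArray` the real method returns; `SeenGroupIdsSketch` and its parameters are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch, not the PR's implementation.
final class SeenGroupIdsSketch {
    /**
     * Reports which group ids a Categorize-style hash has produced,
     * assuming id 0 is the null group and categories occupy ids 1..categoryCount.
     */
    static BitSet seenGroupIds(boolean seenNull, int categoryCount) {
        BitSet seen = new BitSet(categoryCount + 1);
        if (seenNull) {
            // The null group exists only if a null or empty value was seen.
            seen.set(0);
        }
        // Every category id was created by some input value, so all were seen.
        seen.set(1, categoryCount + 1); // set(fromInclusive, toExclusive)
        return seen;
    }
}
```

The key design point is that bit 0 is conditional while the category bits are not: reporting group 0 when no null ever arrived would surface a spurious null group in the results.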

elasticsearchmachine · 2024-11-27T17:27:35Z

Hi @ivancea, I've created a changelog YAML for you.

elasticsearchmachine · 2024-11-27T17:51:56Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copilot

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 suggestions.

Files not reviewed (1)

x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec: Language not supported

Comments skipped due to low confidence (6)

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:67

[nitpick] The change from static to final for the CategorizeEvaluator class may be unintended. If this class is intended to be nested within another class without requiring an instance of the outer class, it should remain static.

`public final class CategorizeEvaluator implements Releasable {`
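The nitpick turns on a Java language rule: a `static` nested class has no enclosing instance, while a non-static inner class captures one and can only be created through an instance of the outer class. Note that `static` and `final` are independent modifiers, so a nested class can be both. A minimal illustration, with hypothetical names (`EvaluatorHost`, `StaticEvaluator`, `InnerEvaluator`) standing in for `CategorizeRawBlockHash` and `CategorizeEvaluator`:

```java
// Hypothetical stand-in for an enclosing block hash class.
final class EvaluatorHost {
    private final String name;

    EvaluatorHost(String name) {
        this.name = name;
    }

    // Static nested class: no hidden reference to an EvaluatorHost instance.
    static final class StaticEvaluator {
        String describe() {
            return "static";
        }
    }

    // Inner (non-static) class: each instance captures its EvaluatorHost,
    // so it can read the enclosing instance's private fields directly.
    final class InnerEvaluator {
        String describe() {
            return "inner of " + name;
        }
    }
}
```

`new EvaluatorHost.StaticEvaluator()` works on its own, whereas the inner class needs `new EvaluatorHost("hash").new InnerEvaluator()`. If the evaluator needs the enclosing hash's state, dropping `static` is a deliberate choice rather than a mistake.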

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:64

The close method should also handle the seenNull variable if it is used to track state. Ensure that any state variables are reset or handled appropriately when closing resources.

`@Override public void close() { categorizer.close(); }`

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java:126

Ensure that the eval method is covered by tests, especially for cases where null values are encountered and handled by the process method.

`public IntVector eval(int positionCount, BytesRefVector vVector) {`

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizedIntermediateBlockHash.java:83

The increment of category IDs by 1 could cause confusion. Ensure this is intentional and document the reasoning.

`idMap.put(oldCategoryId + 1, newCategoryId + 1);`
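One reading of the `+ 1` on both sides, consistent with the PR giving nulls their own group: group id 0 is reserved for null, so when categories from an intermediate (partial) result are merged into the local categorizer, both the incoming and the local category ids are stored shifted by one. The sketch below assumes that layout. `IntermediateRemapSketch`, `buildIdMap`, and the explicit `0 -> 0` entry for the null group are hypothetical, and categories are matched by exact string equality rather than by the real categorizer's merging logic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PR's implementation.
final class IntermediateRemapSketch {
    // Assumption: group id 0 is the null group on both sides.
    static final int NULL_ORD = 0;

    /**
     * Maps incoming group ids to local group ids. incomingCategories is indexed
     * by 0-based category id; localCategories is extended with unseen categories.
     */
    static Map<Integer, Integer> buildIdMap(List<String> incomingCategories, List<String> localCategories) {
        Map<Integer, Integer> idMap = new HashMap<>();
        idMap.put(NULL_ORD, NULL_ORD); // the null group maps to itself
        for (int oldCategoryId = 0; oldCategoryId < incomingCategories.size(); oldCategoryId++) {
            String category = incomingCategories.get(oldCategoryId);
            // Toy matching by equality; the real categorizer merges similar categories.
            int newCategoryId = localCategories.indexOf(category);
            if (newCategoryId < 0) {
                localCategories.add(category);
                newCategoryId = localCategories.size() - 1;
            }
            // Both ids are shifted by one so that 0 keeps meaning "null".
            idMap.put(oldCategoryId + 1, newCategoryId + 1);
        }
        return idMap;
    }
}
```

Keeping the offset on both sides means the null group never needs special-casing when its position is looked up in the map.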

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/TokenListCategorizer.java:119

Ensure that the new behavior introduced by handling nulls in computeCategory is covered by tests.

`public TokenListCategory computeCategory(String s, CategorizationAnalyzer analyzer) {`

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/TokenListCategorizer.java:128

Ensure that the new behavior introduced by handling nulls in computeCategory is covered by tests.

`public TokenListCategory computeCategory(TokenStream ts, int unfilteredStringLen, long numDocs) throws IOException {`

...te/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java

...c/main/java/org/elasticsearch/compute/aggregation/blockhash/AbstractCategorizeBlockHash.java

...n/java/org/elasticsearch/compute/aggregation/blockhash/CategorizedIntermediateBlockHash.java

x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec

jan-elastic

LGTM

elasticsearchmachine · 2024-11-28T15:08:27Z

💔 Backport failed

| Status | Branch | Result |
|---|---|---|
| ❌ | 8.x | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 117655`

alex-spies

I know I'm late to the party, but LGTM, and thanks @ivancea!

Add nulls support to Categorize

115fbf5

ivancea added the labels >enhancement, Team:Analytics (meta label for the analytical engine team: ESQL/Aggs/Geo), auto-backport (automatically create backport pull requests when merged), :Analytics/ES|QL (AKA ESQL), v9.0.0 and v8.18.0 on Nov 27, 2024

Update docs/changelog/117655.yaml

564e673

ivancea added 2 commits November 27, 2024 18:34

Correctly handle empty strings

71c572c

Implemented seenGroupIds

886772a

ivancea requested review from alex-spies, Copilot, costin, jan-elastic and nik9000 and removed request for costin and nik9000 November 27, 2024 17:51

ivancea marked this pull request as ready for review November 27, 2024 17:51

Copilot AI reviewed Nov 27, 2024

...te/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java (resolved)

...te/src/main/java/org/elasticsearch/compute/aggregation/blockhash/CategorizeRawBlockHash.java (resolved)

ivancea added 2 commits November 27, 2024 19:06

Updated categorize block tests with nulls

548c73e

Fix Categorize block hash tests

7f51d8c

jan-elastic reviewed Nov 28, 2024

...c/main/java/org/elasticsearch/compute/aggregation/blockhash/AbstractCategorizeBlockHash.java (resolved)

jan-elastic reviewed Nov 28, 2024

...n/java/org/elasticsearch/compute/aggregation/blockhash/CategorizedIntermediateBlockHash.java (outdated, resolved)

jan-elastic reviewed Nov 28, 2024

x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec (resolved)

jan-elastic approved these changes Nov 28, 2024

ivancea and others added 2 commits November 28, 2024 12:41

Simplify CategorizeBlockHash add logic

387a88a

Merge branch 'main' into esql-categorize-nulls

6375d2b

ivancea added the auto-merge label Nov 28, 2024

Increase capability version

c19dad5

ivancea merged commit 6b94a91 into elastic:main Nov 28, 2024

ivancea deleted the esql-categorize-nulls branch November 28, 2024 15:07

elasticsearchmachine added the backport pending label Nov 28, 2024

ivancea mentioned this pull request Nov 28, 2024

[8.x] Backport "ESQL: Add nulls support to Categorize (#117655)" #117716

Merged

alex-spies reviewed Nov 28, 2024

ivancea removed the backport pending label Nov 28, 2024

ivancea merged 9 commits into elastic:main from ivancea:esql-categorize-nulls

5 participants