Create new block when filter OrdinalBytesRefBlock by dnhatn · Pull Request #136444 · elastic/elasticsearch

dnhatn · 2025-10-11T05:31:31Z

Currently, when filtering OrdinalBytesRefBlock and OrdinalBytesRefVector, we reuse the existing dictionary and filter only the ordinals. Although this is semantically correct, it can break HashAggregationOperator when ordinal optimizations are enabled in BlockHash.

Example: given an incoming OrdinalBytesRefBlock after filtering:

dict: [apple, banana, orange]
ordinals: [0, 0, 0, 0, 1, 1]

Here, orange is not referenced. During grouping and hashing, all dictionary entries [apple, banana, orange] are hashed and assigned group IDs [1, 2, 3] (0 is reserved for null), and the block produces groupings [1, 1, 1, 1, 2, 2]. The output is correct, but when evaluating partial/final results, we cannot exclude orange (groupId=3) unless we enter slow mode and track every seen group. As a result, either an IndexOutOfBoundsException is thrown (if orange has the largest group ID and the over-allocation is not large enough to cover it), or orange may be included in the result even though it was filtered out earlier.
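The failure mode above can be reproduced with a small, self-contained model. This is a sketch under stated assumptions, not the real Elasticsearch code: `OrdinalBlock`, `filterOrdinalsOnly`, `dictionaryGroupIds`, and `positionGroupIds` are hypothetical stand-ins (plain `String`s replace `BytesRef`, and there is no over-allocation), used only to show how hashing the whole reused dictionary hands out a group ID that no remaining position references.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; NOT the real OrdinalBytesRefBlock / BlockHash API.
public class DictionaryReuseDemo {
    // An ordinal-encoded column: distinct values plus one dictionary index per position.
    record OrdinalBlock(List<String> dict, int[] ordinals) {}

    // Pre-fix behaviour as described in the PR: reuse the dictionary, filter only the ordinals.
    static OrdinalBlock filterOrdinalsOnly(OrdinalBlock block, int[] positions) {
        int[] kept = new int[positions.length];
        for (int i = 0; i < positions.length; i++) {
            kept[i] = block.ordinals()[positions[i]];
        }
        return new OrdinalBlock(block.dict(), kept);
    }

    // Ordinal shortcut: hash every dictionary entry once, assigning group ids 1..n
    // (0 is reserved for null), whether or not any position still references it.
    static int[] dictionaryGroupIds(OrdinalBlock block) {
        int[] dictToGroup = new int[block.dict().size()];
        for (int ord = 0; ord < dictToGroup.length; ord++) {
            dictToGroup[ord] = ord + 1;
        }
        return dictToGroup;
    }

    // Map each position to its group id through its ordinal.
    static int[] positionGroupIds(OrdinalBlock block, int[] dictToGroup) {
        int[] groups = new int[block.ordinals().length];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = dictToGroup[block.ordinals()[i]];
        }
        return groups;
    }

    public static void main(String[] args) {
        OrdinalBlock source = new OrdinalBlock(
            List.of("apple", "banana", "orange"),
            new int[] {0, 0, 2, 0, 0, 1, 2, 1});
        // Drop the two "orange" positions (2 and 6); ordinals become [0, 0, 0, 0, 1, 1].
        OrdinalBlock filtered = filterOrdinalsOnly(source, new int[] {0, 1, 3, 4, 5, 7});
        int[] dictToGroup = dictionaryGroupIds(filtered);
        int[] groups = positionGroupIds(filtered, dictToGroup);
        int maxAssigned = Arrays.stream(dictToGroup).max().getAsInt();
        int maxSeen = Arrays.stream(groups).max().getAsInt();
        System.out.println("groups = " + Arrays.toString(groups));
        System.out.println("max assigned = " + maxAssigned + ", max seen = " + maxSeen);
        // A per-group array sized from the groups actually seen has no slot for
        // group 3 ("orange"), so touching it throws an IndexOutOfBoundsException subtype.
        long[] counts = new long[maxSeen + 1];
        try {
            counts[maxAssigned]++;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("group " + maxAssigned + " has no slot: " + e.getMessage());
        }
    }
}
```

In this model the per-position groupings match the PR's example ([1, 1, 1, 1, 2, 2]), while the dictionary-driven hash has still assigned group 3 to orange; records require Java 16 or later.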

This change forces the creation of a new Block/Vector, instead of reusing the existing dictionary, when filtering OrdinalBytesRefBlock and OrdinalBytesRefVector.
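One way to build a fresh block in which every dictionary entry is actually referenced is to compact the dictionary while filtering and remap the ordinals. The sketch below assumes that approach for illustration only; the PR text does not say how the new Block/Vector is built (it could, for example, be a plain non-ordinal block instead). `OrdinalBlock` and `filterWithNewDictionary` are hypothetical names, and `String` stands in for `BytesRef`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; NOT the real Elasticsearch filter implementation.
public class CompactingFilterDemo {
    record OrdinalBlock(List<String> dict, int[] ordinals) {}

    // Filter positions AND rebuild the dictionary so it holds only referenced values,
    // in order of first appearance among the kept positions.
    static OrdinalBlock filterWithNewDictionary(OrdinalBlock block, int[] positions) {
        int[] oldToNew = new int[block.dict().size()];
        Arrays.fill(oldToNew, -1); // -1 marks an old ordinal not referenced so far
        List<String> newDict = new ArrayList<>();
        int[] newOrdinals = new int[positions.length];
        for (int i = 0; i < positions.length; i++) {
            int oldOrd = block.ordinals()[positions[i]];
            if (oldToNew[oldOrd] == -1) {
                oldToNew[oldOrd] = newDict.size();
                newDict.add(block.dict().get(oldOrd));
            }
            newOrdinals[i] = oldToNew[oldOrd];
        }
        return new OrdinalBlock(List.copyOf(newDict), newOrdinals);
    }

    public static void main(String[] args) {
        OrdinalBlock source = new OrdinalBlock(
            List.of("apple", "banana", "orange"),
            new int[] {0, 0, 2, 0, 0, 1, 2, 1});
        OrdinalBlock filtered = filterWithNewDictionary(source, new int[] {0, 1, 3, 4, 5, 7});
        System.out.println(filtered.dict() + " " + Arrays.toString(filtered.ordinals()));
    }
}
```

Running `main` prints `[apple, banana] [0, 0, 0, 0, 1, 1]`: orange is gone from the dictionary, so an ordinal-based hash over this block can only assign group IDs to values that are present. The trade-off is an extra pass and a new allocation per filter, in exchange for never exposing unreferenced dictionary entries downstream.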

Closes #136423

elasticsearchmachine · 2025-10-11T05:35:13Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-10-11T05:35:13Z

Hi @dnhatn, I've created a changelog YAML for you.

dnhatn · 2025-10-11T05:39:01Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

+15             | Senior Team Lead
+11             | Support Engineer
+15             | Tech Lead    
+;



nik9000 · 2025-10-15T12:57:32Z

Oh that one's exciting!

dnhatn · 2025-10-15T15:14:15Z

Thanks Nik!

elasticsearchmachine · 2025-10-15T15:16:17Z

💔 Backport failed

Status	Branch	Result
❌	8.19	Commit could not be cherrypicked due to conflicts
❌	9.1	Commit could not be cherrypicked due to conflicts
✅	9.2

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 136444

Create new block when filter OrdinalBytesRefBlock

bbede47

elasticsearchmachine added the v9.3.0 label Oct 11, 2025

dnhatn added :Analytics/ES|QL AKA ESQL, >bug, v9.2.1, v9.1.6, v8.19.6, and auto-backport labels Oct 11, 2025

dnhatn requested a review from nik9000 October 11, 2025 05:34

dnhatn marked this pull request as ready for review October 11, 2025 05:34

elasticsearchmachine added the Team:Analytics label Oct 11, 2025

Update docs/changelog/136444.yaml

c74aafb

dnhatn commented Oct 11, 2025


dnhatn added 2 commits October 12, 2025 10:45

Merge remote-tracking branch 'elastic/main' into fix-filter-ordinals

048711e

Merge remote-tracking branch 'dnhatn/fix-filter-ordinals' into fix-filter-ordinals

9d884da

nik9000 approved these changes Oct 15, 2025


Merge remote-tracking branch 'elastic/main' into fix-filter-ordinals

e1dc0c9

dnhatn merged commit d83584c into elastic:main Oct 15, 2025
34 checks passed

dnhatn deleted the fix-filter-ordinals branch October 15, 2025 15:14

dnhatn mentioned this pull request Oct 15, 2025

[9.2] Create new block when filter OrdinalBytesRefBlock (#136444) #136634

Merged

elasticsearchmachine added the backport pending label Oct 15, 2025

dnhatn merged 5 commits into elastic:main from dnhatn:fix-filter-ordinals
