Fix a bug in significant_terms#127975
Merged
nik9000 merged 2 commits intoelastic:mainfrom May 9, 2025
Merged
Conversation
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
Collaborator
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Collaborator
|
Hi @nik9000, I've created a changelog YAML for you. |
not-napoleon
approved these changes
May 9, 2025
Member
not-napoleon
left a comment
There was a problem hiding this comment.
LGTM, thank you for fixing this.
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
May 9, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
Collaborator
💔 Backport failed
You can use sqren/backport to manually backport by running |
Member
Author
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
May 9, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
nik9000
added a commit
to nik9000/elasticsearch
that referenced
this pull request
May 9, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
elasticsearchmachine
pushed a commit
that referenced
this pull request
May 9, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
Member
Author
|
8.17 isn't backporting well. We'll keep this in 8.18+. |
elasticsearchmachine
pushed a commit
that referenced
this pull request
May 9, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
elasticsearchmachine
pushed a commit
that referenced
this pull request
May 9, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
jfreden
pushed a commit
to jfreden/elasticsearch
that referenced
this pull request
May 12, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
julio-santana
pushed a commit
that referenced
this pull request
May 12, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard.
prdoyle
pushed a commit
that referenced
this pull request
May 12, 2025
Fix a bug in the `significant_terms` agg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it. This mostly hits when the you do a `range` agg containing a `significant_terms` AND you only collect the first few ranges. `range` isn't particularly popular, but `date_histogram` is super popular and it rewrites into a `range` pretty commonly - so that's likely what's really hitting this - a `date_histogram` followed by a `significant_text` where the matches are all early in the date range held by the shard. Co-authored-by: Nik Everett <nik9000@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix a bug in the
significant_termsagg where the "subsetSize" array is too small because we never collect the ordinal for the agg "above" it.This mostly hits when the you do a
rangeagg containing asignificant_termsAND you only collect the first few ranges.rangeisn't particularly popular, butdate_histogramis super popular and it rewrites into arangepretty commonly - so that's likely what's really hitting this - adate_histogramfollowed by asignificant_textwhere the matches are all early in the date range held by the shard.