Fix inner hits + aggregations concurrency bug #128036
Merged
benchaplin merged 21 commits into elastic:main on Jun 2, 2025
Collaborator
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Collaborator
Hi @benchaplin, I've created a changelog YAML for you.
javanna
reviewed
May 19, 2025
...er/src/internalClusterTest/java/org/elasticsearch/search/aggregations/metrics/TopHitsIT.java
3647f68 to
ff7d042
Compare
benchaplin
added a commit
to benchaplin/elasticsearch
that referenced
this pull request
Jun 2, 2025
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
elasticsearchmachine
pushed a commit
that referenced
this pull request
Jun 2, 2025
Contributor
Hey @benchaplin, I think it makes sense to backport this fix to 9.0 as well. Thoughts?
mridula-s109
pushed a commit
to mridula-s109/elasticsearch
that referenced
this pull request
Jun 3, 2025
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
benchaplin
added a commit
to benchaplin/elasticsearch
that referenced
this pull request
Jun 3, 2025
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
Contributor
Author
Ah, agreed. Thanks for catching this, I got a little mixed up with versions.
elasticsearchmachine
pushed a commit
that referenced
this pull request
Jun 3, 2025
joshua-adams-1
pushed a commit
to joshua-adams-1/elasticsearch
that referenced
this pull request
Jun 3, 2025
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
Samiul-TheSoccerFan
pushed a commit
to Samiul-TheSoccerFan/elasticsearch
that referenced
this pull request
Jun 5, 2025
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
Resolves #122419.

There's a concurrency bug that occurs when running aggregations on inner hits. It can result in one of three exceptions:

- `java.lang.IllegalStateException: Error retrieving path`
- `java.lang.NullPointerException: Cannot invoke "java.util.Map.get(Object)" because "this.preloadedStoredFieldValues" is null`
- `java.lang.AssertionError: invalid decRef call: already closed`

The underlying issue is that `InnerHitSubContext` is not thread safe, yet instances are shared across leaf slice search threads during an aggregation. Specifically, the race condition occurs when the `InnerHitSubContext.rootId` and `InnerHitSubContext.rootSource` fields are set and accessed concurrently by multiple threads.

The tests I've added to `TopHitsIT` reproduce the issue. If you paste those tests into main and run them a few times, you should see one of the exceptions.

I've solved this by forking the `InnerHitSubContext` instances, similar to what was done in #106990. `SearchExecutionContext` is at times accessed from `InnerHitSubContext`, so I also had to make sure the forked `SearchExecutionContext` was used in those cases.
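
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the "fork before use" idea described above. It is not the Elasticsearch code: `SubContext`, `fork()`, `setRoot()`, and `runSlices()` are hypothetical names invented for illustration, and the only assumption carried over from the description is that a context holds mutable per-hit state (here `rootId` and `rootSource`) that must not be shared across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ForkPerSliceSketch {

    /** Hypothetical stand-in for a sub-context that carries mutable per-hit state. */
    static final class SubContext {
        private final String name;   // immutable configuration, safe to share
        private String rootId;       // mutable per-hit state, not thread safe
        private String rootSource;   // mutable per-hit state, not thread safe

        SubContext(String name) {
            this.name = name;
        }

        /** Copies the immutable configuration and starts with fresh mutable state. */
        SubContext fork() {
            return new SubContext(name);
        }

        void setRoot(String id, String source) {
            this.rootId = id;
            this.rootSource = source;
        }

        String describe() {
            return name + ":" + rootId + ":" + rootSource;
        }
    }

    /** Runs one task per slice; each task forks the shared context before mutating it. */
    static List<String> runSlices(SubContext shared, int slices) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < slices; i++) {
                final int slice = i;
                futures.add(pool.submit(() -> {
                    // Private copy: no other thread can overwrite rootId/rootSource.
                    SubContext local = shared.fork();
                    String last = null;
                    for (int doc = 0; doc < 1000; doc++) {
                        local.setRoot("id-" + slice, "source-" + slice);
                        last = local.describe();
                    }
                    return last;
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runSlices(new SubContext("inner"), 8));
    }
}
```

If every task called `shared.setRoot(...)` directly instead of working on `local`, one thread could read an id or source written by another between the set and the read, which is the kind of interleaving the description attributes to the shared `InnerHitSubContext` instances.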