Skip to content

Fix nested aggregation top_hits with query inner_hits#137351

Merged
elasticsearchmachine merged 18 commits intoelastic:mainfrom
mayya-sharipova:fix-nested-top-hits
Jan 15, 2026
Merged

Fix nested aggregation top_hits with query inner_hits#137351
elasticsearchmachine merged 18 commits intoelastic:mainfrom
mayya-sharipova:fix-nested-top-hits

Conversation

@mayya-sharipova
Copy link
Contributor

@mayya-sharipova mayya-sharipova commented Oct 29, 2025

Currently top_hits aggs inside nested aggs inherit the parent
query's inner_hits context, causing an error "Error retrieving path".
This happenes because nested top_hits is already operating on nested child docs,
and asking it to retrieve inner_hits adds an extra incorrect level of nestedness.

This fix proposes to create an empty InnerHitsContext for this case.

Closes #136893

Top_hits aggregations inside nested aggregations were inheriting the parent
query's inner_hits context, causing conflicting fetch scopes and nested source
extraction to fail with "Error retrieving path".

When top_hits is a child of NestedAggregator, now creates an empty
InnerHitsContext to prevent InnerHitsPhase from running.

Closes elastic#136893
@elasticsearchmachine
Copy link
Collaborator

Hi @mayya-sharipova, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 30, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 30, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@benwtrent
Copy link
Member

@mayya-sharipova maybe I misunderstand how this will look for users.

But as a user, I would still expect to get inner hits for both whenever I request them. One that is present within _source the other that is present within the aggregator results. Especially since they will likely be different documents (top-hits under a terms agg will definitely return different inner hits than those that are present in the most relevant documents given the top level query).

@mayya-sharipova
Copy link
Contributor Author

mayya-sharipova commented Oct 30, 2025

@benwtrent Thanks for your comment. This fix applies only when top_hits agg is a part of nested. When we do nested top_hits agg we already retrieving nested child docs.
So we discussed this offline:

  • Documentation how nested agg works with top_hits agg
  • We should first check if there is second level of nestedness and allow inner_hits, if not it makes sense to remove them.
@mayya-sharipova
Copy link
Contributor Author

@benwtrent I've checked an it looks like inner_hits for double nested fields in top_hits doesn't work (for most docs doesn't return anything).
So something like this doesn't work:

PUT comments
{
    "mappings": {
        "properties": {
            "product": {
                "type": "keyword"
            },
            "comments": {
                "type": "nested",
                "properties": {
                  "comment": {
                    "type": "text"
                    },
                    "user": {
                      "type" : "keyword"
                    },
                    "ratings": {
                      "type": "nested",
                        "properties": {
                            "rating": {
                                "type": "integer"
                            },
                            "user": {
                              "type": "keyword"
                            }
                        }
                    }
                }
            }
        }
    }
}

POST /comments/_search?_source=false
{
  "aggs": {
    "nested_ratings": {
      "nested": {
        "path": "comments"  
      },
      "aggs": {
        "ratings_top_hits": {
            "top_hits": {
              "size": 10
            }
        }
      }
    }
  },
  "query": {
      "nested": {
          "inner_hits": {},
          "path": "comments",
          "query": {
              "match_all": {}
          }
      }
  }
}
@mayya-sharipova
Copy link
Contributor Author

mayya-sharipova commented Jan 13, 2026

@benwtrent I want to review this PR, and appreciate your follow up review. Here is the summary:

Currently this request fails:

POST test/_search
  {
    "query": {
      "nested": {
        "inner_hits": {},           // ← Operates at parent doc scope
        "path": "nested_item",
        "query": { "match_all": {} }
      }
    },
    "aggregations": {
      "nested_item": {
        "nested": { "path": "nested_item" },  // ← Already in nested scope
        "aggregations": {
          "01": {
            "top_hits": {}          // ← Tries to inherit inner_hits, causing conflict
          }
        }
      }
    }
  }

Why it fails:

  1. The query-level inner_hits operates at the parent document scope - it fetches nested children from parent documents
  2. The top_hits aggregation inside nested aggregation already operates at nested document scope - it's working with nested child documents
  3. When top_hits aggregation inherits the query's inner_hits context, it tries to fetch nested documents from already-nested documents (double nesting)
  4. This causes "Error retrieving path" failures because the fetch phase gets confused about document offsets and contexts

The error: The system tries to extract nested documents by offset, but the offset calculation is wrong because it's applying nested logic to documents that are already in nested context.


This PR addresses by preventing top_hits aggregation from inheriting inner_hits when inside a nested aggregation. As a result the top_hits aggregation works correctly because it's not trying to apply double-nested logic.


In our last discussion we wanted to check for second level of nestedness (If the nested field has another level of nested fields inside it, then inner_hits could be useful in top_hits). I did a check, and it is currently NOT supported in Elasticsearch : double-nestedness inner_hits don't work in top_hits aggregation, producing some random results.
I suggest leave this work for future ( I can create a follow up issue) , and merge this PR to address immediate bug. WDYT?

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good. We obviously cannot allow multi-layered of nested ness.

@mayya-sharipova mayya-sharipova added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 14, 2026
@elasticsearchmachine elasticsearchmachine merged commit 60a1793 into elastic:main Jan 15, 2026
35 checks passed
@mayya-sharipova mayya-sharipova deleted the fix-nested-top-hits branch January 15, 2026 20:03
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Jan 21, 2026
Currently `top_hits` aggs inside nested aggs  inherit the parent 
query's inner_hits context,  causing an error "Error retrieving path".
This happenes because nested top_hits is already operating on nested
child docs,  and asking it to retrieve inner_hits adds an extra
incorrect level of nestedness.

This fix proposes to create an empty InnerHitsContext for this case.

Closes elastic#136893
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Search Relevance/Search Catch all for Search Relevance Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.4.0

3 participants