[ES|QL] Show partial results after timeout#219027
Conversation
|
@lukasolson could we do the same when the user clicks the Cancel button? (we can do it at a follow up PR too)
|
Yes, this is the plan, but it will be a follow-up PR because we need to make some changes to how the aborting works so we can distinguish between when the user cancels & when we automatically cancel (when refreshing, navigating around, config changes, etc.) |
|
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery) |
|
Pinging @elastic/kibana-esql (Team:ESQL) |
nreese
left a comment
There was a problem hiding this comment.
kibana-presentation changes LGTM
code review only
davismcphee
left a comment
There was a problem hiding this comment.
Code changes look good and it seems to work well for the most part! Unfortunately I did run into some flakiness where sometimes the results wouldn't show as expected. It seems like there may be a timing issue going on:
timeout.mp4
| export const esqlAsyncSearchStrategyProvider = ( | ||
| searchConfig: SearchConfigSchema, | ||
| logger: Logger, | ||
| useInternalUser: boolean = false |
There was a problem hiding this comment.
Just curious why the internal user logic is no longer needed?
There was a problem hiding this comment.
This was actually never used for ES|QL, it was there as a result of a copy & paste of the regular async search strategy. See the discussion here: #174246 (comment)
It's not completely necessary to remove as part of this PR but I was fiddling around in there anyway and decided to remove it since it opens up a backdoor that could potentially cause issues (e.g. an internal engineer uses this feature and exposes an HTTP route that allows any Kibana user to run queries as the internal system user).
There was a problem hiding this comment.
Thanks for explaining, and in that case totally agree removing it makes sense 👍
Hmm interesting, I will look into this. |
That was a pretty wild one to chase down - it appears we were sometimes cancelling the ES|QL request when the HTTP request was aborted (which will happen occasionally depending on exact timing) which was causing this issue. I believe I've fixed it in 9aac375. Do you mind taking another look? |
|
|
||
| esTestCluster: { | ||
| ...baseConfig.get('esTestCluster'), | ||
| license: 'trial', |
There was a problem hiding this comment.
good catch, tested locally
davismcphee
left a comment
There was a problem hiding this comment.
Nice job tracking that issue down! I left one question, but overall this looks good to go on my end and I can no longer reproduce the flakiness 👍
Just one thing I noticed though. When running the functional test, the query seems to automatically rerun again once it times out, resulting in another search and timeout + toast. Any idea why this might be and if it's an actual issue or just test related? It doesn't seem to happen on subsequent fetches and I wasn't able to reproduce it manually, but we may want to investigate anyway to be sure before merging.
timeout.mp4
| // This abortSignal comes from getRequestAbortedSignal and fires if the HTTP request is aborted; | ||
| // in the case of these async APIs, we don't want to cancel the async request if the HTTP | ||
| // request is aborted | ||
| const { abortSignal, ...options } = searchOptions; |
There was a problem hiding this comment.
I'm not really familiar enough with how this code works to know if there are any other implications to this. Will it impact running queries being cancelled when navigating away from Discover?
There was a problem hiding this comment.
For async queries, we rely on two mechanisms for cancelling queries:
- The client-side abort logic that will actually send a
DELETE {id}request (when a new query is submitted, the "Cancel" button is clicked, navigating between applications in Kibana, etc.) - The
keep_alivethat is updated on every poll (this is an escape hatch for when we can't detect any sort of event that signals that the search request should be terminated, such as when closing the browser completely or navigating outside of Kibana)
The abortSignal that was being passed here fires when the browser disconnects the HTTP request before receiving a response:
What this means is that when the browser cancelled an HTTP request (even just a polling request), the server was then cancelling the query (even though the client would follow up with the DELETE {id} request afterward). This creates a race condition and is not the behavior we want since we need to allow the query to continue running while we make the call to get the final results before we cancel the query.
There was a problem hiding this comment.
Oh and I forgot to mention that the reason this abortSignal is even around is because, in the case of synchronous queries, we rely on this mechanism to cancel queries since we don't have any sort of ID to call the cancel APIs on.
There was a problem hiding this comment.
Awesome, thanks for the thorough explanation, much clearer to me now!
…on/kibana into esql_show_results_after_timeout
Great catch! It was the test. It turns out |
There was a problem hiding this comment.
Confirmed the double submit issue in the test is now fixed, LGTM 👍 Thanks for working on this!
Also I noticed the backport label was set to backport:prev-minor, which is 9.0, but I think we want it only in 9.1 and 8.19, right? I updated them anyway, but wanted to add a note in case.
| // This abortSignal comes from getRequestAbortedSignal and fires if the HTTP request is aborted; | ||
| // in the case of these async APIs, we don't want to cancel the async request if the HTTP | ||
| // request is aborted | ||
| const { abortSignal, ...options } = searchOptions; |
There was a problem hiding this comment.
Awesome, thanks for the thorough explanation, much clearer to me now!
💚 Build Succeeded
Metrics [docs]Async chunks
Page load bundle
History
cc @lukasolson |
|
Starting backport for target branches: 8.19 |
## Summary Partially addresses elastic#205783. When an ES|QL query times out, this PR calls the `_query/async/{id}/stop` endpoint to retrieve the results of the query instead of simply calling `GET` (which does not get the completed results). Updates the fetching in Discover to properly account for partial results in ES|QL.  ### Release notes When an ES|QL query times out (as a result of the `search:timeout` advanced setting), partial results that are available are now shown. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) ### Identify risks Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss. Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging. - [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) - [ ] ... --------- Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co> Co-authored-by: Matthias Wilhelm <matthias.wilhelm@elastic.co> (cherry picked from commit 36a296e)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
# Backport This will backport the following commits from `main` to `8.19`: - [[ES|QL] Show partial results after timeout (#219027)](#219027) <!--- Backport version: 9.6.6 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport) <!--BACKPORT [{"author":{"name":"Lukas Olson","email":"lukas@elastic.co"},"sourceCommit":{"committedDate":"2025-06-10T17:33:31Z","message":"[ES|QL] Show partial results after timeout (#219027)\n\n## Summary\n\nPartially addresses https://github.com/elastic/kibana/issues/205783.\n\nWhen an ES|QL query times out, this PR calls the\n`_query/async/{id}/stop` endpoint to retrieve the results of the query\ninstead of simply calling `GET` (which does not get the completed\nresults).\n\nUpdates the fetching in Discover to properly account for partial results\nin ES|QL.\n\n\n\n\n### Release notes\n\nWhen an ES|QL query times out (as a result of the `search:timeout`\nadvanced setting), partial results that are available are now shown.\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n\n### Identify risks\n\nDoes this PR introduce any risks? For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...\n\n---------\n\nCo-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>\nCo-authored-by: Matthias Wilhelm <matthias.wilhelm@elastic.co>","sha":"36a296ebf0aa062a50dbf4543f487f14b6f3ce7d","branchLabelMapping":{"^v9.1.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:feature","Team:DataDiscovery","Feature:ES|QL","Team:ESQL","backport:version","v9.1.0","v8.19.0"],"title":"[ES|QL] Show partial results after timeout","number":219027,"url":"https://github.com/elastic/kibana/pull/219027","mergeCommit":{"message":"[ES|QL] Show partial results after timeout (#219027)\n\n## Summary\n\nPartially addresses https://github.com/elastic/kibana/issues/205783.\n\nWhen an ES|QL query times out, this PR calls the\n`_query/async/{id}/stop` endpoint to retrieve the results of the query\ninstead of simply calling `GET` (which does not get the completed\nresults).\n\nUpdates the fetching in Discover to properly account for partial results\nin ES|QL.\n\n\n\n\n### Release notes\n\nWhen an ES|QL query times out (as a result of the `search:timeout`\nadvanced setting), partial results that are available are now shown.\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n\n### Identify risks\n\nDoes this PR introduce any risks? For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...\n\n---------\n\nCo-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>\nCo-authored-by: Matthias Wilhelm <matthias.wilhelm@elastic.co>","sha":"36a296ebf0aa062a50dbf4543f487f14b6f3ce7d"}},"sourceBranch":"main","suggestedTargetBranches":["8.19"],"targetPullRequestStates":[{"branch":"main","label":"v9.1.0","branchLabelMappingKey":"^v9.1.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/219027","number":219027,"mergeCommit":{"message":"[ES|QL] Show partial results after timeout (#219027)\n\n## Summary\n\nPartially addresses https://github.com/elastic/kibana/issues/205783.\n\nWhen an ES|QL query times out, this PR calls the\n`_query/async/{id}/stop` endpoint to retrieve the results of the query\ninstead of simply calling `GET` (which does not get the completed\nresults).\n\nUpdates the fetching in Discover to properly account for partial results\nin ES|QL.\n\n\n\n\n### Release notes\n\nWhen an ES|QL query times out (as a result of the `search:timeout`\nadvanced setting), partial results that are available are now shown.\n\n### Checklist\n\nCheck the PR satisfies following conditions. \n\nReviewers should verify this PR satisfies this list as well.\n\n- [ ] Any text added follows [EUI's writing\nguidelines](https://elastic.github.io/eui/#/guidelines/writing), uses\nsentence case text and includes [i18n\nsupport](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)\n- [ ]\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\nwas added for features that require explanation or tutorials\n- [ ] [Unit or functional\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\nwere updated or added to match the most common scenarios\n- [ ] If a plugin configuration key changed, check if it needs to be\nallowlisted in the cloud and added to the [docker\nlist](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)\n- [ ] This was checked for breaking HTTP API changes, and any breaking\nchanges have been approved by the breaking-change committee. The\n`release_note:breaking` label should be applied in these situations.\n- [ ] [Flaky Test\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\nused on any tests changed\n- [ ] The PR description includes the appropriate Release Notes section,\nand the correct `release_note:*` label is applied per the\n[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)\n\n### Identify risks\n\nDoes this PR introduce any risks? For example, consider risks like hard\nto test bugs, performance regression, potential of data loss.\n\nDescribe the risk, its severity, and mitigation for each identified\nrisk. Invite stakeholders and evaluate how to proceed before merging.\n\n- [ ] [See some risk\nexamples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)\n- [ ] ...\n\n---------\n\nCo-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co>\nCo-authored-by: Matthias Wilhelm <matthias.wilhelm@elastic.co>","sha":"36a296ebf0aa062a50dbf4543f487f14b6f3ce7d"}},{"branch":"8.19","label":"v8.19.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT--> Co-authored-by: Lukas Olson <lukas@elastic.co> Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co> Co-authored-by: Matthias Wilhelm <matthias.wilhelm@elastic.co>
## Summary Partially addresses elastic#205783. When an ES|QL query times out, this PR calls the `_query/async/{id}/stop` endpoint to retrieve the results of the query instead of simply calling `GET` (which does not get the completed results). Updates the fetching in Discover to properly account for partial results in ES|QL.  ### Release notes When an ES|QL query times out (as a result of the `search:timeout` advanced setting), partial results that are available are now shown. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) ### Identify risks Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss. Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging. - [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) - [ ] ... --------- Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co> Co-authored-by: Matthias Wilhelm <matthias.wilhelm@elastic.co>
## Summary Partially addresses elastic#205783. When an ES|QL query times out, this PR calls the `_query/async/{id}/stop` endpoint to retrieve the results of the query instead of simply calling `GET` (which does not get the completed results). Updates the fetching in Discover to properly account for partial results in ES|QL.  ### Release notes When an ES|QL query times out (as a result of the `search:timeout` advanced setting), partial results that are available are now shown. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [ ] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) ### Identify risks Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss. Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging. - [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) - [ ] ... --------- Co-authored-by: Stratoula Kalafateli <efstratia.kalafateli@elastic.co> Co-authored-by: Matthias Wilhelm <matthias.wilhelm@elastic.co>



Summary
Partially addresses #205783.
When an ES|QL query times out, this PR calls the
_query/async/{id}/stopendpoint to retrieve the results of the query instead of simply callingGET(which does not get the completed results).Updates the fetching in Discover to properly account for partial results in ES|QL.
Release notes
When an ES|QL query times out (as a result of the
search:timeoutadvanced setting), partial results that are available are now shown.Checklist
Check the PR satisfies following conditions.
Reviewers should verify this PR satisfies this list as well.
release_note:breakinglabel should be applied in these situations.release_note:*label is applied per the guidelinesIdentify risks
Does this PR introduce any risks? For example, consider risks like hard to test bugs, performance regression, potential of data loss.
Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.