Skip to content

Propagate status codes from shard failures appropriately#118016

Merged
pawankartik-elastic merged 19 commits intoelastic:mainfrom
pawankartik-elastic:pkar/node-not-conn-502
Dec 23, 2024
Merged

Propagate status codes from shard failures appropriately#118016
pawankartik-elastic merged 19 commits intoelastic:mainfrom
pawankartik-elastic:pkar/node-not-conn-502

Conversation

@pawankartik-elastic
Copy link
Contributor

@pawankartik-elastic pawankartik-elastic commented Dec 4, 2024

If the underlying error caused is NodeNotConnectedException and the reason is connection already closed, the status code that should be returned is 502 rather than 500. This allows Kibana to retry the search op.

Edit: the scope of this PR itself has changed since I first worked on it. The goal of this PR is to let the status code propagate so that it is returned back to the user appropriately. There's a separate task to map out Exception-s to 502; hence I'll be removing the related code form this PR. Adding this edit to let the readers understand how the scope has evolved.

Relates ES-10191.
Fixes #118482.

…dException` and undo `status()` override in `NodeDisconnectedException`
Walking the stacktrace explicitly and looking for a specific error (node
connection-related errors in this case) is a workaround rather than a
proper fix. Instead, let the status codes propagate all the way to the
top so that they can be reported as-is.
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinions about the implementation of SearchPhaseExecutionException#status(), this is something that the search-foundations team should review.

We shouldn't be changing the ConnectTransportException subclasses here.

@pawankartik-elastic pawankartik-elastic changed the title Return 502 if the underlying error is NodeNotConnectedException Dec 9, 2024
4xx status codes are not likely to appear along with 5xx status codes.
As a result, we do not need to account for them when looking at shard
failures' status codes.
In reference to the previous commit: this case is no longer needed.
@pawankartik-elastic pawankartik-elastic added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch :Search Foundations/Search Catch all for Search Foundations labels Dec 19, 2024
Copy link
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left two comments and questions, thanks!

Copy link
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @pawankartik-elastic !

@elasticsearchmachine
Copy link
Collaborator

Hi @pawankartik-elastic, I've created a changelog YAML for you.

@pawankartik-elastic pawankartik-elastic added the auto-backport Automatically create backport pull requests when merged label Dec 19, 2024
@pawankartik-elastic pawankartik-elastic dismissed DaveCTurner’s stale review December 19, 2024 21:34

Changes have been addressed and were reviewed by a member of the Search team.

@pawankartik-elastic pawankartik-elastic marked this pull request as ready for review December 23, 2024 08:13
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@pawankartik-elastic pawankartik-elastic merged commit 22990df into elastic:main Dec 23, 2024
pawankartik-elastic added a commit to pawankartik-elastic/elasticsearch that referenced this pull request Dec 23, 2024
)

* Return 502 if the underlying error is `NodeNotConnectedException`

* Traverse through the cause stack trace and check for `NodeNotConnectedException` and undo `status()` override in `NodeDisconnectedException`

* Rewrite `while` condition

* Fix: precommit

* Let status codes propagate rather than walking the stacktrace explicitly

Walking the stacktrace explicitly and looking for a specific error (node
connection-related errors in this case) is a workaround rather than a
proper fix. Instead, let the status codes propagate all the way to the
top so that they can be reported as-is.

* Fix: unused import

* Fix null deref

* Do not map descendants of `ConnectTransportException` to `502`

* Fix: precommit

* Do not account for 4xx status codes

4xx status codes are not likely to appear along with 5xx status codes.
As a result, we do not need to account for them when looking at shard
failures' status codes.

* Remove unnecessary `switch` case

In reference to the previous commit: this case is no longer needed.

* Rewrite code comment

* Address review comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/118016.yaml

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x
elasticsearchmachine pushed a commit that referenced this pull request Dec 23, 2024
…119205)

* Return 502 if the underlying error is `NodeNotConnectedException`

* Traverse through the cause stack trace and check for `NodeNotConnectedException` and undo `status()` override in `NodeDisconnectedException`

* Rewrite `while` condition

* Fix: precommit

* Let status codes propagate rather than walking the stacktrace explicitly

Walking the stacktrace explicitly and looking for a specific error (node
connection-related errors in this case) is a workaround rather than a
proper fix. Instead, let the status codes propagate all the way to the
top so that they can be reported as-is.

* Fix: unused import

* Fix null deref

* Do not map descendants of `ConnectTransportException` to `502`

* Fix: precommit

* Do not account for 4xx status codes

4xx status codes are not likely to appear along with 5xx status codes.
As a result, we do not need to account for them when looking at shard
failures' status codes.

* Remove unnecessary `switch` case

In reference to the previous commit: this case is no longer needed.

* Rewrite code comment

* Address review comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/118016.yaml

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
@pawankartik-elastic pawankartik-elastic deleted the pkar/node-not-conn-502 branch June 26, 2025 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >enhancement :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.18.0 v9.0.0

4 participants