Prevent data nodes from sending stack traces to coordinator when error_trace=false#118266
Conversation
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
|
Hi @piergm, I've created a changelog YAML for you. |
…ExecutionException-if-REST_EXCEPTION_SKIP_STACK_TRACE=false
javanna
left a comment
There was a problem hiding this comment.
Thanks for working on this Matteo, I left some comments
server/src/main/java/org/elasticsearch/rest/action/search/RestMultiSearchAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/rest/action/search/RestMultiSearchAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/SearchService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java
Outdated
Show resolved
Hide resolved
...rch/src/internalClusterTest/java/org/elasticsearch/xpack/search/AsyncSearchErrorTraceIT.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/rest/action/search/RestSearchAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/SearchService.java
Outdated
Show resolved
Hide resolved
…ExecutionException-if-REST_EXCEPTION_SKIP_STACK_TRACE=false
server/src/main/java/org/elasticsearch/rest/BaseRestHandler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/ActionModule.java
Outdated
Show resolved
Hide resolved
…nException-if-REST_EXCEPTION_SKIP_STACK_TRACE=false
…nException-if-REST_EXCEPTION_SKIP_STACK_TRACE=false
…nException-if-REST_EXCEPTION_SKIP_STACK_TRACE=false
…nException-if-REST_EXCEPTION_SKIP_STACK_TRACE=false
javanna
left a comment
There was a problem hiding this comment.
Left a couple of comments and questions, LGTM otherwise
server/src/main/java/org/elasticsearch/rest/BaseRestHandler.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/SearchService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/SearchService.java
Outdated
Show resolved
Hide resolved
...rch/src/internalClusterTest/java/org/elasticsearch/xpack/search/AsyncSearchErrorTraceIT.java
Show resolved
Hide resolved
| request.addParameter("error_trace", "true"); | ||
| while (responseEntity.get("is_running") instanceof Boolean isRunning && isRunning) { | ||
| responseEntity = performRequestAndGetResponseEntity(request); | ||
| } |
There was a problem hiding this comment.
Do you have a sense for how many times get async search ends up being called again here because it's running? Wondering because you do it in a tight loop without any backoff or sleep between calls.
There was a problem hiding this comment.
Very few times, if any, but I agree we should have a 1s sleep between calls and lowered "wait_for_completion_timeout" to 0ms
There was a problem hiding this comment.
I did not mean to imply that sleep is a good solution, I wonder if the sleep ends up slowing down the test and how much. We can maybe finetune this as a followup
There was a problem hiding this comment.
From my local tests I saw that with 0ms of wait_for_completion_timeout and 1 second sleep we always get back the response with the first GET /_async_search. So it should not slow down the test too much IMO. But I am open to finetune it 😄
...rch/src/internalClusterTest/java/org/elasticsearch/xpack/search/AsyncSearchErrorTraceIT.java
Show resolved
Hide resolved
…nException-if-REST_EXCEPTION_SKIP_STACK_TRACE=false
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…or_trace=false` (elastic#118266) * first iterations * added tests * Update docs/changelog/118266.yaml * constant for error_trace and typos * centralized putHeader * moved threadContext to parent class * uses NodeClient.threadpool * updated async tests to retrieve final result * moved test to avoid starting up a node * added transport version to avoid sending useless bytes * more async tests (cherry picked from commit 97bc291) # Conflicts: # server/src/main/java/org/elasticsearch/rest/action/search/RestSearchAction.java
…or_trace=false` (#118266) (#118969) * first iterations * added tests * Update docs/changelog/118266.yaml * constant for error_trace and typos * centralized putHeader * moved threadContext to parent class * uses NodeClient.threadpool * updated async tests to retrieve final result * moved test to avoid starting up a node * added transport version to avoid sending useless bytes * more async tests (cherry picked from commit 97bc291) # Conflicts: # server/src/main/java/org/elasticsearch/rest/action/search/RestSearchAction.java
|
heya @piergm do you need to backport this or just remove the backport_pending label? |
|
Just to remove the backport_pending label. Thanks for the ping, removed. |
…125732) We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false.
…lastic#125732) We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (elastic#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false. (cherry picked from commit 9f6eb1d) # Conflicts: # qa/smoke-test-http/src/internalClusterTest/java/org/elasticsearch/http/SearchErrorTraceIT.java # server/src/main/java/org/elasticsearch/search/SearchService.java # test/framework/src/main/java/org/elasticsearch/search/ErrorTraceHelper.java # x-pack/plugin/async-search/src/internalClusterTest/java/org/elasticsearch/xpack/search/AsyncSearchErrorTraceIT.java
…lastic#125732) We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (elastic#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false. (cherry picked from commit 9f6eb1d) # Conflicts: # qa/smoke-test-http/src/internalClusterTest/java/org/elasticsearch/http/SearchErrorTraceIT.java # server/src/main/java/org/elasticsearch/search/SearchService.java # test/framework/src/main/java/org/elasticsearch/search/ErrorTraceHelper.java # x-pack/plugin/async-search/src/internalClusterTest/java/org/elasticsearch/xpack/search/AsyncSearchErrorTraceIT.java
…nsport (#125732) (#126246) * Log stack traces on data nodes before they are cleared for transport (#125732) We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false. (cherry picked from commit 9f6eb1d)
…sport (#125732) (#126245) We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false. (cherry picked from commit 9f6eb1d)
…sport (#125732) (#126243) We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false. (cherry picked from commit 9f6eb1d)
…lastic#125732) We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (elastic#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false.
This PR updates the behavior of data nodes to optimize error handling during search operations. When the query parameter
error_trace=falseis specified in the request (which defaults tofalse), data nodes will no longer send stack traces to the search coordinator node.With
error_trace=false, the stack trace is already excluded from the REST response to the client. By extending this to the communication between data nodes and the coordinator node, we further reduce unnecessary data transfer and lower the memory needed to handle search requests in case of errors in the coordinating node.To implement this, the
error_tracequery parameter is passed to data nodes via the transport request headererror_trace. This ensures consistent handling of theerror_traceflag throughout the search request lifecycle.After this change the
error_traceheader will be always sent to data nodes. Nodes with an older version will therefore have no way to specify if we want or not stack trace, therefore upon reading the header in the data node, if not specified we default to true, mimicking the current behaviour.closes: #116772