Add relevant attributes to search took time APM metrics#134232
Add relevant attributes to search took time APM metrics#134232javanna merged 17 commits intoelastic:mainfrom
Conversation
We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
|
Hi @javanna, I've created a changelog YAML for you. |
server/src/test/java/org/elasticsearch/action/search/SearchRequestIntrospectorTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java
Outdated
Show resolved
Hide resolved
tteofili
left a comment
There was a problem hiding this comment.
nice improvement, thx Luca!
andreidan
left a comment
There was a problem hiding this comment.
Thanks for working on this Luca.
Left a few minor comments
server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java
Outdated
Show resolved
Hide resolved
| executeRequest( | ||
| (SearchTask) task, | ||
| searchRequest, | ||
| new SearchResponseActionListener(listener, searchRequest), |
There was a problem hiding this comment.
I'm a bit concerned about extending the lifecycle of the entire search request (these could be quite large given that they encapsulate the source builder).
The reason we're doing it here is so that the search response action listener can introspect the request when the search operation completes.
Would it be a good idea to introspect the request before we execute it and only pass along the metadata that results from the introspection?
We are also rewriting the search request as part of the search execution. The rewrite process might create a new instance of the search request. Is the introspection more suitable for the rewriten instance of the one we're passing on from here?
There was a problem hiding this comment.
Thanks, great thoughts!
I planned on moving the search request introspection to an earlier place, and stop carrying the search request around in the listener. I did not do it straight-away out of laziness mostly.
We are interested in the search request on the coordinating node. There though, the indices get resolved and replaced, and we are interested in the resolved indices as opposed to the potential wildcard expressions or alias names. This is the reason why I initially carried the search request to the listener. This way, we can just delay its resolution and we are sure that it will hold resolved indices. Not doing so makes it messier because the listener is where we record the metric and we need the attributes, yet resolving attributes at listener creation is too early (we need to have indices resolution happen first). I need to find a way to make this work, but I do think it's a good investment.
Instead, when it comes to query rewrite, we want to introspect the search request before its rewrite, otherwise for instance the date range filter may get rewritten to match all or match none. Looking at the design of the rewrite framework, a new instance is returned whenever anything gets rewritten to something different compared to the object rewrite was called against. That means that no state changes in the original object, which is what we are looking for. It would get complicated otherwise to see in what cases we may see changes, depending on where shards are (is can match running on the coord node or on remote nodes?).
All in all, we need to worry about getting the replaced indices, the rest is rather simple to get from the original search request.
Thanks for making me think harder about this. I also added tests around this stuff.
There was a problem hiding this comment.
#134625 should help with getting resolved indices. I will add more tests though around indices resolution when pointing to a data stream, as well as with security enabled. Let me know if there's more scenarios that come to mind for testing.
There was a problem hiding this comment.
Turns out that testing with security enabled is very tricky (due to the need to place the test under security as well as having to introspect the recorded metrics via a test plugin. I think though that with the latest update, that's no longer required. I had to grab the indices from outside of the search request, from the resolved indices abstraction, which feels much better and feels safer with or without security enabled.
I did add a few more tests cause I had some coverage gaps around dotted indices, and wildcard as well as alias resolution.
| import org.elasticsearch.search.sort.ScriptSortBuilder; | ||
| import org.elasticsearch.test.ESTestCase; | ||
|
|
||
| public class SearchRequestIntrospectorTests extends ESTestCase { |
31808ac to
98d5a5a
Compare
) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in #134232 .
|
@andreidan this should be ready for another look. |
andreidan
left a comment
There was a problem hiding this comment.
Thanks for iterating on this Luca
This looks very good ! 🚀
We already record latency of the query and fetch phase as APM metrics. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a shard search request and stores the extracted attributes together with the shard phase latency metrics. This builds on top of elastic#134232 to use the same infra and store the same attributes for shard level latency metrics.
…tic#134625) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in elastic#134232 .
We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.
…tic#134625) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in elastic#134232 .
We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.
We already record latency of the query and fetch phase as APM metrics. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a shard search request and stores the extracted attributes together with the shard phase latency metrics. This builds on top of #134232 to use the same infra and store the same attributes for shard level latency metrics.
We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request:
This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.