Instruments
Collect measurements with standard and custom instruments
An instrument in the router collects data and reports measurements to a metric backend. Supported instruments include standard instruments from OpenTelemetry, standard instruments for the router's request lifecycle, and custom instruments. Supported instrument kinds are counters and histograms.
You can configure instruments in router.yaml with telemetry.instrumentation.instruments.
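As a rough orientation, the skeleton below sketches where each group of instruments sits under telemetry.instrumentation.instruments. The section names are the ones used in the examples on this page. The empty {} values are placeholders added for illustration and are an assumption, not a recommended configuration.

```yaml
telemetry:
  instrumentation:
    instruments:
      router: {}      # placeholder: instruments on the router service
      supergraph: {}  # placeholder: instruments on the supergraph service
      subgraph: {}    # placeholder: instruments on the subgraph service
      connector: {}   # placeholder: instruments on connector HTTP requests
      graphql: {}     # placeholder: instruments on each JSON element returned to the client
```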
OpenTelemetry standard instruments
OpenTelemetry specifies multiple standard metric instruments that are available in the router:
In the router service:
- http.server.active_requests - The number of active requests in flight.
- http.server.request.body.size - A histogram of request body sizes for requests handled by the router.
- http.server.request.duration - A histogram of request durations for requests handled by the router.
In the subgraph service:
- http.client.request.body.size - A histogram of request body sizes for requests handled by subgraphs.
- http.client.request.duration - A histogram of request durations for requests handled by subgraphs.
- http.client.response.body.size - A histogram of response body sizes for requests handled by subgraphs.
For connector HTTP requests:
- http.client.request.body.size - A histogram of request body sizes for connector HTTP requests.
- http.client.request.duration - A histogram of request durations for connector HTTP requests.
- http.client.response.body.size - A histogram of response body sizes for connector HTTP responses.
The default_requirement_level setting configures whether these instruments are enabled by default. Out of the box, its default value of required enables them. To get different behavior for an instrument, you must configure it explicitly.

These instruments are configurable in router.yaml:
telemetry:
  instrumentation:
    instruments:
      router:
        http.server.active_requests: true # (default false)
        http.server.request.body.size: true # (default false)
        http.server.request.duration: true # (default false)
      subgraph:
        http.client.request.body.size: true # (default false)
        http.client.request.duration: true # (default false)
        http.client.response.body.size: true # (default false)
      connector:
        http.client.request.body.size: true # (default false)
        http.client.request.duration: true # (default false)
        http.client.response.body.size: true # (default false)

They can be customized by attaching or removing attributes. See attributes to learn more about configuring attributes.
telemetry:
  instrumentation:
    instruments:
      default_requirement_level: required
      router:
        http.server.active_requests:
          attributes:
            http.request.method: true
      subgraph:
        http.client.request.duration:
          attributes:
            subgraph.name: true
      connector:
        http.client.request.duration:
          attributes:
            connector.source.name: true

Apollo standard instruments
To learn about Apollo-provided standard metric instruments for the router's request lifecycle, see router instruments.
Custom instruments
You can define custom instruments on the router, supergraph, and subgraph services in the router pipeline. You can also define custom instruments for each JSON element in the response data the router returns to clients.
The example configuration below defines five custom instruments:
- acme.request.duration on the router service
- acme.graphql.requests on the supergraph service
- acme.graphql.subgraph.errors on the subgraph service
- acme.user.not.found on a connector HTTP response
- acme.graphql.list.lengths on each JSON element returned to the client (defined on graphql)
telemetry:
  instrumentation:
    instruments:
      router:
        http.server.active_requests: true
        acme.request.duration:
          value: duration
          type: counter
          unit: s
          description: "my description"
          condition:
            eq:
              - 200
              - response_status: code
          attributes:
            http.response.status_code: true
            "my_attribute":
              response_header: "x-my-header"

      supergraph:
        acme.graphql.requests:
          value: unit
          type: counter
          unit: count
          description: "supergraph requests"

      subgraph:
        acme.graphql.subgraph.errors:
          value: unit
          type: counter
          unit: count
          description: "my description"

      connector:
        acme.user.not.found:
          value: unit
          type: counter
          unit: count
          description: "Count of 404 responses from the user API"
          condition:
            all:
              - eq:
                  - 404
                  - connector_http_response_status: code
              - eq:
                  - "user_api"
                  - connector_source: name

      graphql:
        acme.graphql.list.lengths:
          value:
            list_length: value
          type: histogram
          unit: count
          description: "my description"

- Custom metrics, events, and attributes consume more processing resources than standard metrics. Adding too many (standard or custom) can slow your router down.
- Configurations such as events.*.request|error|response that produce output for all router lifecycle services should only be used for development or debugging, not for production.

Set the values of the RUST_LOG or APOLLO_ROUTER_LOG environment variables and the --log CLI option to info. Using less verbose logging, such as error, can cause some attributes to be dropped.

Instrument naming conventions
When defining a custom instrument, make sure to reference OpenTelemetry (OTel) semantic conventions. The OTel semantic conventions help guide you to:
- Choose a good name for your instrument.
- See which standard attributes can be attached to your instrument.
Some particular guidelines to note:
- Don't include the unit name in the metric name. For example, size_kb should be size and the unit should be kb.
- Don't include _total as a suffix. For example, use http.server.active_requests, not http.server.active_requests_total.
- Use dot notation to separate namespaces in the metric name. For example, use http.server.active_requests, not http_server_active_requests.
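A minimal sketch of a custom instrument that follows these naming guidelines. The instrument name acme.request.payload.size and the header x-payload-size are hypothetical, chosen only for illustration. The configuration keys are the ones documented on this page.

```yaml
telemetry:
  instrumentation:
    instruments:
      router:
        # Dot-separated namespaces, with no unit and no _total suffix in the name
        acme.request.payload.size: # hypothetical instrument name
          value:
            request_header: "x-payload-size" # hypothetical header carrying a size in kilobytes
          type: histogram
          unit: kb # the unit is declared here, not in the instrument name
          description: "Payload size reported by the client"
```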
Instrument configuration
default_requirement_level
The default_requirement_level option sets which attributes are attached to standard instruments by default, as defined by OpenTelemetry semantic conventions.
Valid values:
- required (default, Apollo recommended) - required attributes will be attached to standard instruments by default.
- recommended - experimental attributes from OpenTelemetry's development-status conventions will be attached to standard instruments by default.
Apollo recommends using required rather than recommended. Using recommended includes experimental attributes from OpenTelemetry's development-status GraphQL semantic conventions, such as graphql.document and subgraph.graphql.document. These attributes can create high cardinality and may contain sensitive information. See the standard attributes documentation for details.

telemetry:
  instrumentation:
    instruments:
      # Set the default requirement level
      default_requirement_level: required

Attributes can be configured individually, so that required attributes can be overridden or disabled. For example, http.response.status_code is set individually to override the standard value:
telemetry:
  instrumentation:
    instruments:
      # Set the default requirement level
      default_requirement_level: required
      router:
        # Standard metrics
        http.server.request.body.size:
          attributes:
            # Standard attributes
            http.response.status_code: false
            # Custom attribute
            "acme.my_attribute":
              response_header: "x-my-header"
        # Standard metrics
        http.server.active_requests:
          attributes:
            # Standard attributes; these differ from the ones on other standard metrics.
            # Custom attributes are not available on this standard metric.
            http.request.method: false
            server.address: true
            server.port: true
            url.scheme: true

Opt-in attributes must be configured individually.

Router request lifecycle services
A router's request lifecycle has three major services that support instrumentation:
- Router service - Operates within the context of an HTTP server, handling the opaque bytes of an incoming HTTP request. Does query analysis to parse the GraphQL operation and validate it against the schema.
- Supergraph service - Handles a GraphQL request after it's been parsed and validated, and before it's sent to subgraphs. Runs the query planner to produce a query plan to execute.
- Subgraph service - Handles GraphQL subgraph requests that have been executed as part of a query plan. Creates HTTP client requests to subgraphs.
Additionally, you can define instruments on graphql for each JSON element returned to the client.
To define a custom instrument, add a new key to router.yaml as telemetry.instrumentation.instruments.<service>.<custom-instrument>. For example, add a custom instrument acme.request.duration:
telemetry:
  instrumentation:
    instruments:
      router:
        acme.request.duration: # The name of your custom instrument/metric
          value: duration
          type: counter
          unit: s
          description: "my description"

value
The service you define an instrument on determines its possible values.
| Value | Definition | Available services |
|---|---|---|
| duration | The duration of the pipeline service. | router, supergraph, subgraph |
| unit | The number of times the pipeline service has been executed. | router, supergraph, subgraph, graphql |
| <custom> | A custom value extracted from the pipeline service. See selectors for more information. | router, supergraph, subgraph, graphql |
| event_duration | The duration of an event in the pipeline service. | supergraph |
| event_unit | The number of times an event in the pipeline service has been executed. | supergraph |
| event_custom | A custom value extracted from the event in the pipeline service. See selectors for more information. | supergraph |
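A minimal sketch of an event-based counter on the supergraph service, assuming the event_unit value counts events as described in the table above. The instrument name acme.graphql.response.events is hypothetical.

```yaml
telemetry:
  instrumentation:
    instruments:
      supergraph:
        acme.graphql.response.events: # hypothetical instrument name
          value: event_unit # count each event rather than each execution of the service
          type: counter
          unit: count
          description: "supergraph response events"
```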
event_* values are mandatory when you want to use a selector on the supergraph response body (response_data and response_errors).

Values of custom metrics can be extracted from the pipeline using custom attributes. For example, to sum the contents of a request header, create a counter with value set as the request header:
telemetry:
  instrumentation:
    instruments:
      router:
        acme.metric:
          # ...
          type: counter
          value:
            request_header: "x-my-header"

type
Instruments come in two different types:
- counter - A monotonic counter. For example, requests served, tasks completed, or errors occurred.
- histogram - A histogram of values. For example, request durations or response body sizes.
telemetry:
  instrumentation:
    instruments:
      router:
        acme.metric:
          # ...
          type: counter # counter, histogram

unit
A free-format unit that is displayed in your APM.
Where possible, use SI units and definitions from The Unified Code for Units of Measure.
telemetry:
  instrumentation:
    instruments:
      router:
        acme.metric:
          # ...
          unit: s # seconds

Duration unit conversion
For instruments with value: duration or value: event_duration, the router automatically converts the measured duration to match the configured unit. This feature supports the following time units:
- s - seconds (default, recommended)
- ms - milliseconds
- us - microseconds
- ns - nanoseconds
Prefer seconds (s) as the unit for duration measurements. Only use non-second units (ms, us, ns) when integrating with an observability platform that requires a specific unit and doesn't automatically convert time units.

Example:
telemetry:
  instrumentation:
    instruments:
      router:
        # Recommended: use seconds (values recorded as seconds)
        acme.request.duration:
          value: duration
          type: histogram
          unit: s
          description: "Request Duration (s)"

        # Only if required by your observability platform
        otheracme.request.duration:
          value: duration
          type: histogram
          unit: ms # Values automatically converted to milliseconds
          description: "Request Duration (ms)"

description
A free format description of the instrument that will be displayed in your APM.
telemetry:
  instrumentation:
    instruments:
      router:
        acme.metric:
          # ...
          description: "my description"

condition
You may only want to mutate an instrument under certain conditions. For example, you may only want to increment a counter if the response status code is 200.
To do this, use a condition:
telemetry:
  instrumentation:
    instruments:
      router:
        acme.metric:
          # ...
          condition:
            eq:
              - 200
              - response_status: code

attributes
Instruments may have attributes attached to them from the router pipeline. These attributes are used to filter and group metrics in your APM.
Attributes may be drawn from standard attributes or selectors, except on the standard metric http.server.active_requests, which does not support custom attributes.
The attributes available depend on the service of the pipeline.
telemetry:
  instrumentation:
    instruments:
      router:
        # Standard metrics
        http.server.request.body.size:
          attributes:
            # Standard attributes
            http.response.status_code: false
            # Custom attribute
            "acme.my_attribute":
              response_header: "x-my-header"
        # Standard metrics
        http.server.active_requests:
          attributes:
            # Standard attributes; these differ from the ones on other standard metrics.
            # Custom attributes are not available on this standard metric.
            http.request.method: false
            server.address: true
            server.port: true
            url.scheme: true
        # Custom metric
        acme.metric:
          value: duration
          type: counter
          unit: s
          description: "my description"
          attributes:
            http.response.status_code: true
            "my_attribute":
              # ...
              response_header: "x-my-header"
      subgraph:
        requests.timeout:
          value: unit
          type: counter
          unit: request
          description: "subgraph requests containing subgraph timeout"
          attributes:
            subgraph.name: true
          condition:
            eq:
              - "request timed out"
              - error: reason

      graphql:
        acme.graphql.list.lengths:
          value:
            list_length: value
          type: histogram
          unit: count
          description: "my description"
          attributes:
            graphql.type.name: true

Instrument configuration reference
| Option | Values | Default | Description |
|---|---|---|---|
| <attribute-name> | | | The name of the custom attribute. |
| <instrument-name> | | | The name of the custom instrument. |
| attributes | standard attributes or selectors | | The attributes of the custom instrument. |
| condition | conditions | | The condition for mutating the instrument. |
| default_requirement_level | required \| recommended | required | The default attribute requirement level. |
| type | counter \| histogram | | The type of the custom instrument. |
| unit | | | A unit name, for example By or {request}. |
| description | | | The description of the custom instrument. |
| value | unit \| duration \| <custom> \| event_unit \| event_duration \| event_custom | | The value of the instrument. |
Production instrumentation example
At minimum, observability of a router running in production requires knowing about errors that arise from operations and subgraphs.
The example configuration below adds instruments with both standard OpenTelemetry attributes and custom attributes to extract information about erring operations:
telemetry:
  instrumentation:
    instruments:
      router:
        http.server.request.duration:
          # Adding the response status code from the router and the operation name
          attributes:
            http.response.status_code: true
            graphql.operation.name:
              operation_name: string
            # This attribute will be set to true if the response contains graphql errors
            graphql.errors:
              on_graphql_error: true
        http.server.response.body.size:
          attributes:
            graphql.operation.name:
              operation_name: string
      subgraph:
        # Adding subgraph name, response status code from the subgraph and original operation name from the supergraph
        http.client.request.duration:
          attributes:
            subgraph.name: true
            http.response.status_code:
              subgraph_response_status: code
            graphql.operation.name:
              supergraph_operation_name: string
            # This attribute will be set to true if the response contains graphql errors
            graphql.errors:
              subgraph_on_graphql_error: true
        http.client.request.body.size:
          attributes:
            subgraph.name: true
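To count erring operations directly, a custom counter could complement the standard instruments above. The sketch below is illustrative only. The instrument name acme.graphql.error.requests is hypothetical, and it assumes the on_graphql_error selector can be compared in an eq condition the same way response_status is in the condition examples on this page.

```yaml
telemetry:
  instrumentation:
    instruments:
      router:
        acme.graphql.error.requests: # hypothetical instrument name
          value: unit
          type: counter
          unit: count
          description: "router requests whose response contains GraphQL errors"
          # Assumption: on_graphql_error yields true when the response contains GraphQL errors
          condition:
            eq:
              - true
              - on_graphql_error: true
          attributes:
            graphql.operation.name:
              operation_name: string
```

Grouping this counter by graphql.operation.name in your APM would show which operations produce errors most often.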