Prevent overhead from excessive span creation#3151
Prevent overhead from excessive span creation#3151SylvainJuge merged 10 commits intoelastic:mainfrom
Conversation
| CoreConfiguration coreConfig = MockTracer.createRealTracer().getConfig(CoreConfiguration.class); | ||
| SerializationConstants.init(coreConfig); |
There was a problem hiding this comment.
[for reviewer] this was necessary as this test failed when run in isolation, most tests actually trigger this init through instrumentation init.
|
/test |
apm-agent-core/src/main/java/co/elastic/apm/agent/tracemethods/TraceMethodInstrumentation.java
Show resolved
Hide resolved
...ugins/apm-api-plugin/src/main/java/co/elastic/apm/agent/pluginapi/TracedInstrumentation.java
Show resolved
Hide resolved
| * <li>{@literal false} returned means the caller MAY call {@link #createSpan()} or {@link #createExitSpan()}</li> | ||
| * </ul> | ||
| */ | ||
| boolean shouldSkipChildSpanCreation(); |
There was a problem hiding this comment.
In my opinion it would be cleaner to instead alter the contract of createSpan() to allow to return null values for the cases where the tracer decides that no spans should be created anymore. This would allow this optimization to also take effect on our own instrumentations.
However, this would be a lot of work as we need to revisit all createSpan() usages in plugins and make sure that they properly handle null values and that there are no side effects of not creating spans, which is a bit much for this PR.
There was a problem hiding this comment.
That's exactly why I took this simple approach here, and it would have been quite consistent with createExitSpan that is expected to return null in case there is already an active exit span.
Another alternative would be to have a way to return a no-op span, which could avoid most of the refactoring but could still have some unexpected side-effects.
I think it could be worth investing a bit of time for a refactoring/tech-debt PR.
What does this PR do?
Fixes #2788 by skipping span creation when the max span limit is already reached within the active transaction.
This is applied to the following instrumentations:
trace_methodsset in configuration@Tracedannotated methods@CaptureSpanannotated methodsWhile it does not prevent users from mis-using configuration or annotations, it removes most of the agent overhead by skipping the span creation.
The global number of dropped spans is captured in the transaction, however the dropped span metrics are not updated because they rely on the span being created (and are updated when the span ends).
As a consequence the dropped spans metrics might appear inconsistent, but I think it's safe to assume that inconsistency is a small price to pay here given it helps to lower the overhead in case of an agent mis-configuration.
Checklist
Added an API method or config option? Document in which version this will be introducedN/AI have made corresponding changes to the documentationN/APerformance improvements
10 millions very short spans created within a single transaction: