Prevent overhead from excessive span creation by SylvainJuge · Pull Request #3151 · elastic/apm-agent-java

SylvainJuge · 2023-05-26T14:37:22Z

What does this PR do?

Fixes #2788 by skipping span creation when the max span limit is already reached within the active transaction.

This is applied to the following instrumentations:

trace_methods set in configuration
@Traced annotated methods
@CaptureSpan annotated methods

While it does not prevent users from mis-using configuration or annotations, it removes most of the agent overhead by skipping the span creation.

The global number of dropped spans is captured in the transaction, however the dropped span metrics are not updated because they rely on the span being created (and are updated when the span ends).

As a consequence the dropped spans metrics might appear inconsistent, but I think it's safe to assume that inconsistency is a small price to pay here given it helps to lower the overhead in case of an agent mis-configuration.

Checklist

This is an enhancement of existing features, or a new feature in existing plugins
- I have updated CHANGELOG.asciidoc
- I have added tests that prove my fix is effective or that my feature works
- ~~Added an API method or config option? Document in which version this will be introduced~~ N/A
- ~~I have made corresponding changes to the documentation~~ N/A

Performance improvements

10 millions very short spans created within a single transaction:

the bottom half transactions with current version of the agent: transaction takes about ~7seconds
the top half is with this change applied: transaction takes about 200~400ms

ghost · 2023-05-26T14:52:37Z

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Start Time: 2023-05-31T11:51:50.281+0000
Duration: 15 min 30 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
run benchmark tests : Run the benchmark tests.
run jdk compatibility tests : Run the JDK Compatibility tests.
run integration tests : Run the Agent Integration tests.
run end-to-end tests : Run the APM-ITs.
run windows tests : Build & tests on windows.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

SylvainJuge · 2023-05-30T08:20:18Z

apm-agent-core/src/test/java/co/elastic/apm/agent/impl/transaction/TransactionTest.java

+        CoreConfiguration coreConfig = MockTracer.createRealTracer().getConfig(CoreConfiguration.class);
+        SerializationConstants.init(coreConfig);


[for reviewer] this was necessary as this test failed when run in isolation, most tests actually trigger this init through instrumentation init.

github-actions · 2023-05-30T08:21:32Z

/test

apm-agent-core/src/main/java/co/elastic/apm/agent/tracemethods/TraceMethodInstrumentation.java

...ugins/apm-api-plugin/src/main/java/co/elastic/apm/agent/pluginapi/TracedInstrumentation.java

JonasKunz · 2023-05-31T10:31:20Z

apm-agent-tracer/src/main/java/co/elastic/apm/agent/tracer/AbstractSpan.java

+     *     <li>{@literal false} returned means the caller MAY call {@link #createSpan()} or {@link #createExitSpan()}</li>
+     * </ul>
+     */
+    boolean shouldSkipChildSpanCreation();


In my opinion it would be cleaner to instead alter the contract of createSpan() to allow to return null values for the cases where the tracer decides that no spans should be created anymore. This would allow this optimization to also take effect on our own instrumentations.

However, this would be a lot of work as we need to revisit all createSpan() usages in plugins and make sure that they properly handle null values and that there are no side effects of not creating spans, which is a bit much for this PR.

That's exactly why I took this simple approach here, and it would have been quite consistent with createExitSpan that is expected to return null in case there is already an active exit span.

Another alternative would be to have a way to return a no-op span, which could avoid most of the refactoring but could still have some unexpected side-effects.

I think it could be worth investing a bit of time for a refactoring/tech-debt PR.

…race-methods

SylvainJuge added 6 commits May 26, 2023 15:59

allow to skip span creation

7d2b26d

test skip span creation

22b4974

enhance a bit javadoc

02052e6

implement optimization for trace_methods

7038f14

implement optimization for @CaptureSpan

ac6e143

implement optimization for @Traced

bb1a8cf

SylvainJuge self-assigned this May 26, 2023

github-actions bot added the agent-java label May 26, 2023

update changelog

d281bc5

turns out the ugly cast was useful :-)

2d78927

SylvainJuge commented May 30, 2023

View reviewed changes

SylvainJuge marked this pull request as ready for review May 30, 2023 08:21

SylvainJuge requested a review from a team May 30, 2023 08:21

JonasKunz approved these changes May 31, 2023

View reviewed changes

SylvainJuge added 2 commits May 31, 2023 13:44

Merge branch 'main' of github.com:elastic/apm-agent-java into limit-t…

1c50894

…race-methods

add missing log debug statements

3677e7b

SylvainJuge merged commit fd5e14c into elastic:main May 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent overhead from excessive span creation#3151

Prevent overhead from excessive span creation#3151
SylvainJuge merged 10 commits intoelastic:mainfrom
SylvainJuge:limit-trace-methods

SylvainJuge commented May 26, 2023 •

edited

Loading

ghost commented May 26, 2023 •

edited by ghost

Loading

Build stats

SylvainJuge May 30, 2023

github-actions bot commented May 30, 2023

Uh oh!

Uh oh!

JonasKunz May 31, 2023

SylvainJuge May 31, 2023

Labels

2 participants

		CoreConfiguration coreConfig = MockTracer.createRealTracer().getConfig(CoreConfiguration.class);
		SerializationConstants.init(coreConfig);

Conversation

SylvainJuge commented May 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Checklist

Performance improvements

ghost commented May 26, 2023 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💚 Build Succeeded

Build stats

❕ Flaky test report

🤖 GitHub comments

SylvainJuge May 30, 2023

Choose a reason for hiding this comment

github-actions bot commented May 30, 2023

Uh oh!

Uh oh!

JonasKunz May 31, 2023

Choose a reason for hiding this comment

SylvainJuge May 31, 2023

Choose a reason for hiding this comment

Labels

2 participants

SylvainJuge commented May 26, 2023 •

edited

Loading

ghost commented May 26, 2023 •

edited by ghost

Loading