Skip to content

Prevent overhead from excessive span creation#3151

Merged
SylvainJuge merged 10 commits intoelastic:mainfrom
SylvainJuge:limit-trace-methods
May 31, 2023
Merged

Prevent overhead from excessive span creation#3151
SylvainJuge merged 10 commits intoelastic:mainfrom
SylvainJuge:limit-trace-methods

Conversation

@SylvainJuge
Copy link
Member

@SylvainJuge SylvainJuge commented May 26, 2023

What does this PR do?

Fixes #2788 by skipping span creation when the max span limit is already reached within the active transaction.

This is applied to the following instrumentations:

  • trace_methods set in configuration
  • @Traced annotated methods
  • @CaptureSpan annotated methods

While it does not prevent users from mis-using configuration or annotations, it removes most of the agent overhead by skipping the span creation.

The global number of dropped spans is captured in the transaction, however the dropped span metrics are not updated because they rely on the span being created (and are updated when the span ends).

As a consequence the dropped spans metrics might appear inconsistent, but I think it's safe to assume that inconsistency is a small price to pay here given it helps to lower the overhead in case of an agent mis-configuration.

Checklist

  • This is an enhancement of existing features, or a new feature in existing plugins
    • I have updated CHANGELOG.asciidoc
    • I have added tests that prove my fix is effective or that my feature works
    • Added an API method or config option? Document in which version this will be introduced N/A
    • I have made corresponding changes to the documentation N/A

Performance improvements

10 millions very short spans created within a single transaction:

  • the bottom half transactions with current version of the agent: transaction takes about ~7seconds
  • the top half is with this change applied: transaction takes about 200~400ms

Screenshot from 2023-05-26 16-23-32

@SylvainJuge SylvainJuge self-assigned this May 26, 2023
@ghost
Copy link

ghost commented May 26, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview previewSnapshots

Expand to view the summary

Build stats

  • Start Time: 2023-05-31T11:51:50.281+0000

  • Duration: 15 min 30 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark tests.

  • run jdk compatibility tests : Run the JDK Compatibility tests.

  • run integration tests : Run the Agent Integration tests.

  • run end-to-end tests : Run the APM-ITs.

  • run windows tests : Build & tests on windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Comment on lines +51 to +52
CoreConfiguration coreConfig = MockTracer.createRealTracer().getConfig(CoreConfiguration.class);
SerializationConstants.init(coreConfig);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] this was necessary as this test failed when run in isolation, most tests actually trigger this init through instrumentation init.

@SylvainJuge SylvainJuge marked this pull request as ready for review May 30, 2023 08:21
@SylvainJuge SylvainJuge requested a review from a team May 30, 2023 08:21
@github-actions
Copy link

/test

* <li>{@literal false} returned means the caller MAY call {@link #createSpan()} or {@link #createExitSpan()}</li>
* </ul>
*/
boolean shouldSkipChildSpanCreation();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion it would be cleaner to instead alter the contract of createSpan() to allow to return null values for the cases where the tracer decides that no spans should be created anymore. This would allow this optimization to also take effect on our own instrumentations.

However, this would be a lot of work as we need to revisit all createSpan() usages in plugins and make sure that they properly handle null values and that there are no side effects of not creating spans, which is a bit much for this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly why I took this simple approach here, and it would have been quite consistent with createExitSpan that is expected to return null in case there is already an active exit span.

Another alternative would be to have a way to return a no-op span, which could avoid most of the refactoring but could still have some unexpected side-effects.

I think it could be worth investing a bit of time for a refactoring/tech-debt PR.

@SylvainJuge SylvainJuge merged commit fd5e14c into elastic:main May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants