Skip to content

Add bootstrap check for JVM tools#2924

Merged
SylvainJuge merged 9 commits intoelastic:mainfrom
SylvainJuge:bootstrap-check-jvmtools
Jan 4, 2023
Merged

Add bootstrap check for JVM tools#2924
SylvainJuge merged 9 commits intoelastic:mainfrom
SylvainJuge:bootstrap-check-jvmtools

Conversation

@SylvainJuge
Copy link
Member

@SylvainJuge SylvainJuge commented Dec 13, 2022

What does this PR do?

In some environments, it might be common to add the agent by using the JAVA_TOOL_OPTIONS environment variable to provide the -javaagent:... JVM flag.

One downside is that all the JVM tools that also use the JVM will also try to start the agent, hence trying to instrument them and make them slower.

While running jar or javac or javadoc is quite unlikely in production, it is quite common to have a few invocations of jps to list JVM processes or keytool to import certificates.

When testing this change with all non-interactive JDK 11 binaries in ${JAVA_HOME}/bin (thus jconsole, jshell, ... are excluded), we get a significant performance improvement measured through time :

Before:
71.21s user 2.53s system 249% cpu 29.539 total

After:
12.88s user 1.01s system 176% cpu 7.857 total

For reference, without any agent:
4.28s user 0.52s system 152% cpu 3.140 total

Checklist

  • This is a bugfix
  • manually testing with JDK 11, 8 and 7 by setting JAVA_TOOL_OPTIONS with -javaagent:... and try to start all JVM binaries in ${JAVA_HOME}/bin folder.
@ghost
Copy link

ghost commented Dec 13, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-01-04T10:55:43.493+0000

  • Duration: 62 min 53 sec

Test stats 🧪

Test Results
Failed 0
Passed 3398
Skipped 42
Total 3440

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark tests.

  • run jdk compatibility tests : Run the JDK Compatibility tests.

  • run integration tests : Run the Agent Integration tests.

  • run end-to-end tests : Run the APM-ITs.

  • run windows tests : Build & tests on windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@SylvainJuge SylvainJuge marked this pull request as ready for review January 3, 2023 13:12
@github-actions
Copy link

github-actions bot commented Jan 3, 2023

/test

Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose additional false checks for the unit tests (currently only tested with null and empty string):

  1. a real example, for example- using System.getProperty("sun.java.command"), relying on the value of this property for the JUnit process
  2. some mock examples of expected commands, like running with some main class, a jar or whatever
Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests and error indication 🙏

Thinking about it some more and given what you found out when adding these tests, in this case we should be much more cautious with false positives, where the penalty is high, as opposed to false negatives, which only mean we do not apply this optimization.
With that in mind, two suggestions:

  1. If the bootstrap check contains errors, the agent will disable itself, so using the current bootstrap check error as is would be too aggressive (sorry about that 🙁 ). So I think - better to print to System.err and avoid disabling the agent - either directly within the check, or by adding another bootstrap-check-error-type that does not cause agent deactivation, only message print
  2. Add as many negative tests that as you can think of (I can't think of specific ones others than the ones I already proposed)

Since I don't have more useful input other than that, approving and leaving it to your judgement.

@SylvainJuge
Copy link
Member Author

I see what you mean here @eyalkoren: this is just an optimization, thus not something essential that will prevent the agent to work properly (unlike other JVM bootstrap checks). While I like the idea to "fail fast and get quick feedback", that's not really optimal for the end user but more optimized for us.

What I think would be the best in this case would be to:

  • make this check issue warnings only, and not errors, thus it won't prevent the agent to start
  • ensure that the bootstrap checks results (errors/warnings) are also displayed in System.out (or System.err) to maximize the chances the end-user will see them if they happen (I think that's already the case but I need to check).

Thus, when a false positive is triggered:

  • users are likely to spot the issue if they see it in the standard output, thus they could report to us through support/forum.
  • they have a work-around by disabling bootstrap checks

In the case of a false negative, the impact is only visible if the end-user is impacted by a slower-than-usual execution of the JVM tools. Having added tests for most of those tools, I think the probability to have false negatives is low, and would mostly happen through support.

@eyalkoren
Copy link
Contributor

I see what you mean here @eyalkoren: ...

Exactly!

Some Servlet container integration tests may have just revealed some false positives, but I only saw repeated failures, didn't really look into them

@SylvainJuge SylvainJuge enabled auto-merge (squash) January 4, 2023 10:57
@SylvainJuge SylvainJuge merged commit 38557dd into elastic:main Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants