Add bootstrap check for JVM tools#2924
Conversation
|
/test |
eyalkoren
left a comment
There was a problem hiding this comment.
I propose additional false checks for the unit tests (currently only tested with null and empty string):
- a real example, for example- using
System.getProperty("sun.java.command"), relying on the value of this property for the JUnit process - some mock examples of expected commands, like running with some main class, a jar or whatever
elastic-apm-agent/src/main/java/co/elastic/apm/agent/premain/JvmToolBootstrapCheck.java
Show resolved
Hide resolved
eyalkoren
left a comment
There was a problem hiding this comment.
Thanks for adding the tests and error indication 🙏
Thinking about it some more and given what you found out when adding these tests, in this case we should be much more cautious with false positives, where the penalty is high, as opposed to false negatives, which only mean we do not apply this optimization.
With that in mind, two suggestions:
- If the bootstrap check contains errors, the agent will disable itself, so using the current bootstrap check error as is would be too aggressive (sorry about that 🙁 ). So I think - better to print to
System.errand avoid disabling the agent - either directly within the check, or by adding another bootstrap-check-error-type that does not cause agent deactivation, only message print - Add as many negative tests that as you can think of (I can't think of specific ones others than the ones I already proposed)
Since I don't have more useful input other than that, approving and leaving it to your judgement.
|
I see what you mean here @eyalkoren: this is just an optimization, thus not something essential that will prevent the agent to work properly (unlike other JVM bootstrap checks). While I like the idea to "fail fast and get quick feedback", that's not really optimal for the end user but more optimized for us. What I think would be the best in this case would be to:
Thus, when a false positive is triggered:
In the case of a false negative, the impact is only visible if the end-user is impacted by a slower-than-usual execution of the JVM tools. Having added tests for most of those tools, I think the probability to have false negatives is low, and would mostly happen through support. |
Exactly! Some Servlet container integration tests may have just revealed some false positives, but I only saw repeated failures, didn't really look into them |
What does this PR do?
In some environments, it might be common to add the agent by using the
JAVA_TOOL_OPTIONSenvironment variable to provide the-javaagent:...JVM flag.One downside is that all the JVM tools that also use the JVM will also try to start the agent, hence trying to instrument them and make them slower.
While running
jarorjavacorjavadocis quite unlikely in production, it is quite common to have a few invocations ofjpsto list JVM processes orkeytoolto import certificates.When testing this change with all non-interactive JDK 11 binaries in
${JAVA_HOME}/bin(thusjconsole,jshell, ... are excluded), we get a significant performance improvement measured throughtime:Before:
71.21s user 2.53s system 249% cpu 29.539 totalAfter:
12.88s user 1.01s system 176% cpu 7.857 totalFor reference, without any agent:
4.28s user 0.52s system 152% cpu 3.140 totalChecklist
JAVA_TOOL_OPTIONSwith-javaagent:...and try to start all JVM binaries in${JAVA_HOME}/binfolder.