Skip to content

parse db name from JDBC string#2642

Merged
SylvainJuge merged 31 commits intoelastic:mainfrom
SylvainJuge:parse-db-name
Jun 2, 2022
Merged

parse db name from JDBC string#2642
SylvainJuge merged 31 commits intoelastic:mainfrom
SylvainJuge:parse-db-name

Conversation

@SylvainJuge
Copy link
Member

@SylvainJuge SylvainJuge commented May 23, 2022

What does this PR do?

Fixes #1841

This would be a "potentially breaking change" as the granularity for dependencies will be higher and include the database name, the span.destination.service.resource includes the database name when it is part of the JDBC connection string.

On the implementation side, there are quite a few variants to parse and because the information we collect is not always captured together (for example host name and port), it's more convenient to have a builder.

Checklist

  • This is an enhancement of existing features, or a new feature in existing plugins
    • I have updated CHANGELOG.asciidoc
    • I have added tests that prove my fix is effective or that my feature works
    • Added an API method or config option? Document in which version this will be introduced
    • I have made corresponding changes to the documentation

Things to clarify during review

  • should the captured database name be case-sensitive ? For Oracle the parsing currently makes it lowercase, but other similar variables like SID are case-sensitive (related stackoverflow post).
    • answer: keep case-sensitive for database name, keep normalizing the host name

Things to discuss for follow-up issues / PRs.

  • we already normalize the host name to lowercase (which makes sense as DNS is usually not case sensitive), should we also normalize IPv6 addresses ? we can currently have ::1 [::1]
  • should we also capture the database user from JDBC url ? answer: let's do that later maybe
  • We don't capture DB metadata from "standard JDBC properties" that are passed to java.sql.DriverManager#getConnection(java.lang.String, java.util.Properties) : not discussed, but let's implement this only if required later.
  • What is the best fallback when there is a parsing error ? Should we prefer a "missing database" or a "dummy name database" ?
    • answer: better to not report any misleading db name + issue a proper error/warn in the logs..
@SylvainJuge SylvainJuge self-assigned this May 23, 2022
@SylvainJuge SylvainJuge added this to the 8.3 milestone May 23, 2022
@ghost
Copy link

ghost commented May 23, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-06-01T19:51:39.231+0000

  • Duration: 49 min 23 sec

Test stats 🧪

Test Results
Failed 0
Passed 2932
Skipped 22
Total 2954

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark tests.

  • run jdk compatibility tests : Run the JDK Compatibility tests.

  • run integration tests : Run the Agent Integration tests.

  • run end-to-end tests : Run the APM-ITs.

  • run windows tests : Build & tests on windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@SylvainJuge SylvainJuge added the ci:agent-integration Enables agent integration tests in build pipeline label May 24, 2022
Comment on lines +172 to +173
.withConnectionInstance(safeGetCatalog(connection))
.withConnectionUser(metaData.getUserName())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] this makes the db name from parsed URL having higher priority that the runtime value returned by getCatalog().

testUrl("jdbc:h2:file:/data/sample", "h2", "localhost", -1);
testUrl("jdbc:h2:mem:", "h2", "localhost", -1);
testUrl("jdbc:h2:mem:test_mem", "h2", "localhost", -1);
testUrl("jdbc:h2:tcp://localhost/mem:test", "h2", "localhost", -1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] using localhost here for testing is a bad idea as localhost will be the default value when unable to properly parse the url.

@SylvainJuge SylvainJuge marked this pull request as ready for review May 25, 2022 10:12
@github-actions
Copy link

/test

Copy link
Contributor

@jackshirazi jackshirazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that parsing looks like it was painful to implement! Well done

@AlexanderWert AlexanderWert modified the milestones: 8.3, 8.4 May 30, 2022
@SylvainJuge SylvainJuge enabled auto-merge (squash) June 1, 2022 15:51
@SylvainJuge SylvainJuge disabled auto-merge June 1, 2022 16:27
@SylvainJuge SylvainJuge merged commit 7435965 into elastic:main Jun 2, 2022
@SylvainJuge SylvainJuge deleted the parse-db-name branch June 2, 2022 07:09
@namannigam
Copy link

Things to discuss for follow-up issues / PRs.

  • should we also capture the database user from JDBC url ? answer: let's do that later maybe

@SylvainJuge Wanted to understand for a use case of observability which relies on a similar approach as far as I could think of.
Additionally, what if the database names across different SQL clusters is same? Do we recommend to ingest metrics to different APM servers?
Underlying users and tables can differ. Can IPs/domains of these cluster be a crucial metrics for the spans captured?

@SylvainJuge
Copy link
Member Author

The database name that is being captured is the one we get from the application perspective, so if the underlying database or the network alias (for example in docker) is identical we don't have any way to distinguish from the application side where the agent is running.

Using an opentelemetry-based agent like https://github.com/elastic/elastic-otel-java/ can help to capture more attributes on the span, however the default mapping you will see in the map and the dependencies in APM are likely to remain the same.

What you could do however is combining extra otel attributes with an ingest pipeline, or even an intermediate otel collector/processor to transform the captured data. For example, you could compute a dedicated value for db.namespace (see semconv) that should be used in the APM map.

@namannigam
Copy link

namannigam commented Jan 28, 2025

@SylvainJuge Thank you for the response. Please illustrate from the application perspective. In my case, both driver URL have different IPs, but the same database name. Is that not a primary consideration when identifying a cluster?

default.datasource.jdbc-url=jdbc:mysql://20.20.20.20:3306/commonDatabaseName?user=ommon

and

default.datasource.jdbc-url=jdbc:mysql://10.10.10.10:3306/commonDatabaseName?user=common
@SylvainJuge
Copy link
Member Author

In this case the URL is different and should have a distinct network host, as we can see in the unit tests that relate to this JDBC url parsing here : https://github.com/elastic/apm-agent-java/blob/main/apm-agent-plugins/apm-jdbc-plugin/src/test/java/co/elastic/apm/agent/jdbc/helper/ConnectionMetaDataTest.java

However, the host part is not used when building the map and the dependencies, only the database name, hence what you see on the UI side.

What I am suggesting here is that maybe using an opentelemetry agent can help here as it would capture similar attributes (which are defined in otel semantic conventions), and because otel makes it quite easy to modify the data through a custom collector/processor then you could easily transform the attribute that currently contains the database name to include the host.

As an alternative without switching to OTel, using ingest-pipelines with APM could also be done, but this is trickier to implement as you will have to modify the traces and the metrics for this to work properly, so I would recommend trying it with OTel first.

@namannigam
Copy link

@SylvainJuge Thank you again for your response. Though OTel doesn't seem straightforward for me to try out as this might need integration changes across applications, I would try out the route via ingest-pipeline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-java ci:agent-integration Enables agent integration tests in build pipeline

4 participants