[Fleet] only add time_series_metric if tsdb enabled#171712

Merged
juliaElastic merged 4 commits intoelastic:mainfrom
juliaElastic:fix-apm-runtime-shadow
Nov 23, 2023

@juliaElastic juliaElastic commented Nov 22, 2023

Summary

Only set time_series_metric and time_series_dimension in data stream mappings if tsdb is enabled.

This fixes an issue with apm package in 8.11.

Steps to reproduce the issue:

  1. install apm-8.10.4 (create an 8.10 cluster or upload the zip)
  2. index a document that has jvm fields, so that dynamic mappings are created
  3. upgrade the package to apm-8.11.0
  4. bug: the mapping update fails with the error mapper_parsing_exception: Field [jvm.memory.non_heap.pool.committed] attempted to shadow a time_series_metric
  5. expected with the fix: the apm package upgrade succeeds, because time_series_metric is not needed if tsdb is not enabled.

This happens because apm introduced mappings for the jvm fields in 8.11, so clusters that ingested jvm data with apm 8.10 had those fields created dynamically as runtime fields. When the mappings were updated in 8.11 to include the jvm fields with time_series_metric, elasticsearch rejected the update with the shadowing error, probably because the write index still had the runtime mappings.

This fix adds time_series_metric conditionally, which helps with apm because it doesn't use tsdb.
The same issue can technically still happen with other packages if they use tsdb, have runtime fields, and add mappings for those fields in a new version.
Maybe this should be fixed on the elasticsearch side.
An alternative solution would be to do a rollover first (so that the runtime fields disappear from the write index) and then do the mapping update. However, this wouldn't work in all cases because there is a race condition: documents could be indexed after the rollover but before the mapping update.
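
As a rough illustration of the conditional approach described above, here is a minimal TypeScript sketch. It is not the actual Fleet code: the `PackageField` and `FieldMapping` shapes, the `buildFieldMapping` function, and the `isTsdbEnabled` flag are all hypothetical names chosen for this example. Only the `time_series_metric` and `time_series_dimension` mapping parameters come from the PR description.

```typescript
// Hypothetical, simplified shapes; the real Fleet types are richer.
interface PackageField {
  name: string;
  type: string;
  metric_type?: 'gauge' | 'counter';
  dimension?: boolean;
}

interface FieldMapping {
  type: string;
  time_series_metric?: 'gauge' | 'counter';
  time_series_dimension?: boolean;
}

// Build the mapping for one field. The time series parameters are only
// emitted when tsdb is enabled for the data stream (assumed to be known
// by the caller and passed in as a boolean).
function buildFieldMapping(field: PackageField, isTsdbEnabled: boolean): FieldMapping {
  const mapping: FieldMapping = { type: field.type };
  if (isTsdbEnabled) {
    if (field.metric_type) {
      mapping.time_series_metric = field.metric_type;
    }
    if (field.dimension) {
      mapping.time_series_dimension = true;
    }
  }
  return mapping;
}

// Example: the jvm field from the error message, with tsdb disabled.
const committed: PackageField = {
  name: 'jvm.memory.non_heap.pool.committed',
  type: 'double',
  metric_type: 'gauge',
};
const withoutTsdb = buildFieldMapping(committed, false);
// withoutTsdb contains only { type: 'double' }
```

With tsdb disabled, the resulting mapping carries no time series parameters, which is the case that matters for apm in this PR.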

// install apm 8.10.4 - downloaded from kibana 8.10.4 bundled packages
curl -XPOST -H 'content-type: application/zip' -H 'kbn-xsrf: true' http://localhost:5601/julia/api/fleet/epm/packages -u elastic:changeme --data-binary @apm-8.10.4.zip

// index doc
POST metrics-apm.internal-default/_doc
{"metricset":{"name":"app","samples":[{"name":"jvm.memory.non_heap.pool.committed","value":3.407872e+06},{"value":1.073741824e+09,"name":"jvm.memory.non_heap.pool.max"},{"name":"jvm.memory.non_heap.pool.used","value":3.12092e+06}]},"process":{"parent":{"pid":24713},"pid":24715,"title":"/usr/lib/jvm/java-11-openjdk-amd64/bin/java"},"@timestamp":"2023-11-21T16:04:51.071Z","data_stream":{"type":"metrics","dataset":"apm.internal","namespace":"default"},"host":{"os":{"platform":"Linux"},"architecture":"amd64","hostname":"carson-elastic","ip":["127.0.0.1"],"name":"carson-elastic"},"service":{"runtime":{"name":"Java","version":"11.0.20.1"},"language":{"name":"Java","version":"11.0.20.1"},"name":"hello_world","node":{"name":"carson-elastic"}},"agent":{"activation_method":"javaagent-flag","ephemeral_id":"cd44472e-a95d-402b-8c68-ccfa7e37ff93","name":"java","version":"1.43.1-SNAPSHOT.3e2ec51"},"labels":{"name":"Compressed Class Space"},"observer":{"hostname":"carson-elastic","type":"apm-server","version":"8.10.4"}}

// check runtime fields in mappings
GET metrics-apm.internal-default/_mapping

{...
      "runtime": {
        "jvm.memory.non_heap.pool.committed": {
          "type": "double"
        },
        "jvm.memory.non_heap.pool.max": {
          "type": "double"
        },
        "jvm.memory.non_heap.pool.used": {
          "type": "double"
        }
      },
...}

// upgrade to apm 8.11
curl -XPOST -H 'content-type: application/zip' -H 'kbn-xsrf: true' http://localhost:5601/julia/api/fleet/epm/packages -u elastic:changeme --data-binary @apm-8.11.0.zip
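
The reproduction above depends on the write index already holding runtime fields with the same names as the newly mapped fields. The following TypeScript sketch shows one way to spot such overlaps from a `_mapping` response before an upgrade. It is a diagnostic illustration only and not something this PR adds: the `IndexMappings` shape and the `findShadowedRuntimeFields` helper are hypothetical, and the list of incoming field paths is assumed to be supplied by the caller as flat dotted names.

```typescript
// Hypothetical, reduced view of the "mappings" object returned by GET <index>/_mapping.
interface IndexMappings {
  runtime?: Record<string, { type: string }>;
}

// Return the incoming field paths that already exist as runtime fields
// in the current mappings.
function findShadowedRuntimeFields(
  current: IndexMappings,
  incomingFieldPaths: string[]
): string[] {
  const runtime = current.runtime ?? {};
  return incomingFieldPaths.filter((path) =>
    Object.prototype.hasOwnProperty.call(runtime, path)
  );
}

// Example using the runtime fields shown in the mapping output above.
const currentMappings: IndexMappings = {
  runtime: {
    'jvm.memory.non_heap.pool.committed': { type: 'double' },
    'jvm.memory.non_heap.pool.max': { type: 'double' },
    'jvm.memory.non_heap.pool.used': { type: 'double' },
  },
};
const overlapping = findShadowedRuntimeFields(currentMappings, [
  'jvm.memory.non_heap.pool.committed',
  'service.name',
]);
// overlapping contains only 'jvm.memory.non_heap.pool.committed'
```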

@juliaElastic juliaElastic self-assigned this Nov 22, 2023
@juliaElastic juliaElastic requested a review from a team as a code owner November 22, 2023 08:42
@botelastic botelastic bot added the Team:Fleet label Nov 22, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic
Contributor Author

@martijnvg Hey, any idea how this error could be prevented on the elasticsearch side?
Related ES validation: elastic/elasticsearch#79757

Member

@carsonip carsonip left a comment

lgtm at a high level

@juliaElastic
Contributor Author

@elasticmachine run elasticsearch-ci/docs

@nchaulet nchaulet self-requested a review November 22, 2023 13:11
@nchaulet
Member

It is not a change that should be fixed at the ES level; it looks like we explicitly made the change to not conditionally render time_series_metric in #157047?

Member

@nchaulet nchaulet left a comment

One question, but if it's the direction we want to go code LGTM 🚀

@kibana-ci

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] x-pack/test_serverless/functional/test_suites/observability/config.ts / serverless observability UI Rules list should enable all selection

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

@juliaElastic
Contributor Author

It is not a change that should be fixed at the ES level; it looks like we explicitly made the change to not conditionally render time_series_metric in #157047?

Yes, when we removed the condition, it was based on the assumption that setting time_series_metric without tsdb is a no-op. However, this doesn't seem to be the case for apm, so I think it makes sense to add the condition back.
cc @kpollich

@whyyouwannaknow

whyyouwannaknow commented Nov 22, 2023

Hello,

We are facing this error after updating our cluster to 8.11.1.
We have successfully upgraded our Elastic Agent to this version, but we cannot upgrade our APM integration from 8.10.4 to 8.11.1. We get the following errors in kibana.log:

[2023-11-22T14:44:48.404+01:00][ERROR][plugins.fleet] An error occurred executing "packagePolicyUpdate" callback: Error: security_exception
        Root causes:
                security_exception: action [indices:data/read/search] is unauthorized for user [kibana_system] with effective roles [kibana_system] on indices [default-metrics-apm], this action is granted by the index privileges [read,all]
[2023-11-22T14:44:48.404+01:00][ERROR][plugins.fleet] Error: security_exception
        Root causes:
                security_exception: action [indices:data/read/search] is unauthorized for user [kibana_system] with effective roles [kibana_system] on indices [default-metrics-apm], this action is granted by the index privileges [read,all]
    at /usr/share/kibana/node_modules/@kbn/observability-plugin/common/utils/unwrap_es_response.js:48:11
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at getConfigsAppliedToAgentsThroughFleet (/usr/share/kibana/node_modules/@kbn/apm-plugin/server/routes/settings/agent_configuration/get_config_applied_to_agent_through_fleet.js:39:20)
    at async Promise.all (index 1)
    at listConfigurations (/usr/share/kibana/node_modules/@kbn/apm-plugin/server/routes/settings/agent_configuration/list_configurations.js:22:62)
    at async Promise.all (index 0)
    at decoratePackagePolicyWithAgentConfigAndSourceMap (/usr/share/kibana/node_modules/@kbn/apm-plugin/server/routes/fleet/merge_package_policy_with_apm.js:24:8)
    at PackagePolicyClientImpl.runExternalCallbacks (/usr/share/kibana/node_modules/@kbn/fleet-plugin/server/services/package_policy.js:1207:26)
    at PackagePolicyClientImpl.update (/usr/share/kibana/node_modules/@kbn/fleet-plugin/server/services/package_policy.js:490:31)
    at PackagePolicyClientImpl.doUpgrade (/usr/share/kibana/node_modules/@kbn/fleet-plugin/server/services/package_policy.js:1032:5)
    at PackagePolicyClientImpl.upgrade (/usr/share/kibana/node_modules/@kbn/fleet-plugin/server/services/package_policy.js:1006:9)
    at upgradePackagePolicyHandler (/usr/share/kibana/node_modules/@kbn/fleet-plugin/server/routes/package_policy/handlers.js:422:18)
    at /usr/share/kibana/node_modules/@kbn/core-http-router-server-internal/src/versioned_router/core_versioned_route.js:106:24
    at Router.handle (/usr/share/kibana/node_modules/@kbn/core-http-router-server-internal/src/router.js:154:30)
    at handler (/usr/share/kibana/node_modules/@kbn/core-http-router-server-internal/src/router.js:113:50)
    at exports.Manager.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:281:9)
Response: {
  error: {
    root_cause: [
      {
        type: 'security_exception',
        reason: 'action [indices:data/read/search] is unauthorized for user [kibana_system] with effective roles [kibana_system] on indices [default-metrics-apm], this action is granted by the index privileges [read,all]'
      }
    ],
    type: 'security_exception',
    reason: 'action [indices:data/read/search] is unauthorized for user [kibana_system] with effective roles [kibana_system] on indices [default-metrics-apm], this action is granted by the index privileges [read,all]'
  },
  status: 403
}

[2023-11-22T14:44:53.840+01:00][ERROR][plugins.fleet] '404 Not Found' error response from package registry at https://epr.elastic.co/package/apm/8.11.1/img/logo_apm.svg

We also have this error in the logs of the Elastic Agents :

{"log.level":"error","@timestamp":"2023-11-22T15:00:01.212+0100","message":"failed to index document in 'metrics-apm.internal-default' (fail_processor_exception): Document produced by APM Server v8.11.1, which is newer than the installed APM integration (v8.10.3-preview-1695284222). The APM integration must be upgraded.","component":{"binary":"apm-server","dataset":"elastic_agent.apm_server","id":"apm-default","type":"apm"},"log":{"source":"apm-default"},"log.origin":{"file.line":312,"file.name":"go-docappender@v0.2.1-0.20230829163624-c69a1cf8ce35/appender.go"},"service.name":"apm-server","ecs.version":"1.6.0","ecs.version":"1.6.0"}

Could this be related to the issue described here?
Thank you!

@juliaElastic
Contributor Author

Hello @whyyouwannaknow,

This looks like an authorization issue. Could you raise a support ticket or file a bug report in the kibana repo? I don't think this pr is the best place to discuss it.

@gsoldevila
Contributor

@elasticmachine run elasticsearch-ci/docs

@juliaElastic
Contributor Author

@elasticmachine run elasticsearch-ci/docs

@juliaElastic juliaElastic enabled auto-merge (squash) November 23, 2023 08:11
@juliaElastic juliaElastic merged commit 7384321 into elastic:main Nov 23, 2023
@kibanamachine
Contributor

💔 All backports failed

Status Branch Result
8.11 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 171712

Questions ?

Please refer to the Backport tool documentation

@carsonip
Member

✅ Tested with 8.10.4-SNAPSHOT to 8.11.2-SNAPSHOT

Steps:

  1. Start 8.10.4 snapshot deployment
  2. Send new jvm metrics
  3. Confirm that runtime field is created
  4. Upgrade to 8.11.2 snapshot
  5. Data stream is rolled over and new backing index does not have the runtime field. The concerned field is indexed and can be queried successfully.

✅ Tested with 8.10.4-SNAPSHOT to 8.11.1-SNAPSHOT to 8.11.2-SNAPSHOT

Steps:

  1. Start 8.10.4 snapshot deployment
  2. Send new jvm metrics
  3. Confirm that runtime field is created
  4. Upgrade to 8.11.1 snapshot
  5. Data stream is NOT rolled over and NO new backing index is created
  6. Upgrade to 8.11.2 snapshot
  7. Data stream is rolled over and new backing index does not have the runtime field. The concerned field is indexed and can be queried successfully.

Labels

release_note:fix, Team:Fleet, v8.11.2, v8.12.0

8 participants