[apm-data] Set event.dataset if empty for logs#129074
[apm-data] Set event.dataset if empty for logs#129074carsonip merged 16 commits intoelastic:mainfrom
Conversation
011910c to
71b8459
Compare
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
Hi @carsonip, I've created a changelog YAML for you. |
Testing notesTested with adding the processor to default pipeline. Output: {
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": ".ds-logs-apm.app.foo-bar-2025.06.09-000001",
"_id": "_WCQVZcBbmWL1cSKlTSi",
"_score": 1,
"_source": {
"@timestamp": 1,
"data_stream": {
"namespace": "bar",
"type": "logs",
"dataset": "apm.app.foo"
},
"event": {
"ingested": "2025-06-09T16:40:28Z",
"dataset": "apm.app.foo"
}
}
},
{
"_index": ".ds-logs-apm.app.foo-bar-2025.06.09-000001",
"_id": "_mCQVZcBbmWL1cSKoTRm",
"_score": 1,
"_source": {
"@timestamp": 1,
"data_stream": {
"namespace": "bar",
"type": "logs",
"dataset": "apm.app.foo"
},
"event": {
"ingested": "2025-06-09T16:40:31Z",
"dataset": "ds"
}
}
}
]
}
} |
3a8135a to
7add32d
Compare
axw
left a comment
There was a problem hiding this comment.
Looks good, just a question about sending docs to logs-apm.* without explicitly setting data_stream.dataset. Not sure if that's a thing we need to cater for.
| - set: | ||
| if: "ctx.data_stream?.dataset != null" | ||
| field: event.dataset | ||
| value: "{{{data_stream.dataset}}}" |
There was a problem hiding this comment.
Nit: should we use copy_from and ignore_empty_value, and remove the if and value?
(See https://www.elastic.co/docs/reference/enrich-processor/set-processor)
There was a problem hiding this comment.
Will data_stream.dataset be automatically set for the logs-apm.* data stream documents, if they're not specified in _source? If that's the case, I'm not sure they will be in ctx.*, and we may need to have a fallback.
Can you add test cases for that?
There was a problem hiding this comment.
Will data_stream.dataset be automatically set for the logs-apm.* data stream documents, if they're not specified in _source?
No, at least not visible to the log ingest pipeline.
If that's the case, I'm not sure they will be in ctx.*
Correct, they will not be in ctx.
we may need to have a fallback.
Do you mean a fallback to a non-null event.dataset in the case where both data_stream.dataset and event.dataset is missing? I'm inclined to not perform any special case handling to make things up because it sounds more like a misconfiguration issue. Unless the user explicitly remove the DS fields in an intermediate ingest pipeline, DS fields should always be present in docs from apm-server (even if reroute processor is used).
In any case, I've added test in 957d9ef to confirm this behavior.
There was a problem hiding this comment.
Do you mean a fallback to a non-null event.dataset in the case where both data_stream.dataset and event.dataset is missing? I'm inclined to not perform any special case handling to make things up because it sounds more like a misconfiguration issue.
I'm OK with this. I don't think it's a real thing we need to handle, even if it's technically possible.
|
Is there a reason why you switched away from Can this be considered a breaking change in behavior? |
The new |
|
Ahh okay, that makes sense. I missed the call out to the original |
axw
left a comment
There was a problem hiding this comment.
LGTM, thanks, and sorry for the delay
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
For APM logs, set event.dataset to data_stream.dataset if event.dataset is empty, to satisfy Anomaly Detection's requirement to have event.dataset in every logs-* data stream. (cherry picked from commit 466afba) # Conflicts: # test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
For APM logs, set event.dataset to data_stream.dataset if event.dataset is empty, to satisfy Anomaly Detection's requirement to have event.dataset in every logs-* data stream. (cherry picked from commit 466afba) # Conflicts: # test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
|
Tested manually with steps outlined in #129074 (comment) With version 9.0.3With version 9.1.0By inspecting the outputs above, version |
For APM logs, set
event.datasettodata_stream.datasetifevent.datasetis empty, to satisfy Anomaly Detection's requirement to haveevent.datasetin everylogs-*data stream.Fixes elastic/apm-server#16397