
Fetch the tracked alerts without depending on the task state#235253

Merged
ersin-erdal merged 14 commits intoelastic:mainfrom
ersin-erdal:190376-version-conflict
Sep 22, 2025

Conversation

@ersin-erdal
Contributor

@ersin-erdal ersin-erdal commented Sep 16, 2025

Resolves: #190376

Rule execution fails after persisting alerts, so the alerts and the execution UUIDs in the task state cannot be updated.
On the next execution, the same alert is reported, but because the last execution UUID was never added to the task state, the alert document is not returned among the tracked alerts. It is therefore treated as a new alert, and since it was already persisted in the previous execution, it gets a version-conflict error.

This PR fixes the problem by fetching the tracked alerts without depending on the task state.

The new query groups the alerts of the running rule by execution UUID and fetches as many executions back as the flapping lookback window. Each execution-UUID group returns all of the alerts belonging to it under `inner_hits`.
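As a rough illustration, the collapse-plus-inner_hits request described above might be built like this. This is a hedged sketch, not the PR's actual code: the field-name constants mirror Kibana's alerts-as-data fields, and the function name and sizes are assumptions.

```typescript
// Hedged sketch of the tracked-alerts query: one collapsed top-level
// hit per execution UUID, with that execution's alerts under
// inner_hits. Names and sizes here are assumptions for illustration.
const ALERT_RULE_UUID = 'kibana.alert.rule.uuid';
const ALERT_RULE_EXECUTION_UUID = 'kibana.alert.rule.execution.uuid';

function buildTrackedAlertsQuery(
  ruleId: string,
  lookBackWindow: number,
  maxAlertsPerExecution: number
) {
  return {
    // One top-level hit per execution, so size = the number of
    // executions to look back over (the flapping lookback window).
    size: lookBackWindow,
    query: {
      bool: {
        must: [{ term: { [ALERT_RULE_UUID]: ruleId } }],
      },
    },
    collapse: {
      field: ALERT_RULE_EXECUTION_UUID,
      inner_hits: {
        name: 'alerts',
        size: maxAlertsPerExecution, // all alerts of that execution
      },
    },
    // Newest executions first, so the window covers the latest runs.
    sort: [{ '@timestamp': { order: 'desc' } }],
  };
}
```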

To verify:

  1. Create an always-firing Elasticsearch Query rule with a 1 hour run interval.
  2. Let the rule run and create an alert.
  3. Apply the diff below:

```
diff --git a/x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts b/x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
index a5dcfa0d5f5..bb60c761740 100644
--- a/x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
+++ b/x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
@@ -438,6 +438,8 @@ export class TaskRunner<
       recoveredAlertsToReturn = alerts.rawRecoveredAlerts;
     }
 
+    throw new Error('fail');
+
     return {
       metrics: ruleRunMetricsStore.getMetrics(),
       state: {
```

  4. Wait for Kibana to restart.
  5. Run the rule in the UI using "Run rule".
  6. Observe the error message in the terminal.
  7. Remove the change above and wait for Kibana to restart.
  8. Run the rule in the UI using "Run rule".

The rule should run without any error and update both the alert and the task state.

The same scenario should fail on main.

@ersin-erdal ersin-erdal added the backport:skip (This PR does not require backporting), Team:ResponseOps (Platform ResponseOps team, formerly the Cases and Alerting teams), and v9.2.0 labels Sep 16, 2025
@ersin-erdal ersin-erdal changed the title from Fetch the tracked alerts without using anything from the task state to Fetch the tracked alerts without depending on the task state Sep 17, 2025
@ersin-erdal ersin-erdal marked this pull request as ready for review September 17, 2025 13:41
@ersin-erdal ersin-erdal requested a review from a team as a code owner September 17, 2025 13:41
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

const result = await this.search({
size: (opts.maxAlerts || DEFAULT_MAX_ALERTS) * 2,
seq_no_primary_term: true,
size: opts.flappingSettings.lookBackWindow,
Contributor

Do you think we need the last 20 executions worth of alerts? Alerts from the most recent execution should each carry along their own flapping history and we update "ongoing recovered" alerts for flapping with the latest execution UUID. I'm worried in the worst case, we'll be returning 20 x 1000 alerts whereas previously we'd be returning 2 * 1000. Or am I misunderstanding the query?

Contributor Author

Yeah, I also realized that, but a smaller size could cause us to miss some of the alerts of the last execution.
And yes, in the worst-case scenario - if the rule generates 1000 new alerts on each execution - the query returns 20,000 alerts. Under normal circumstances, even if it generates 1000 alerts, they remain ongoing and only the last execution carries them.

Actually, this is the main difference between this query and the old getTrackedAlertsByExecutionUuids. Both return the alerts of the last 20 executions, but the old one has a limit of 2000 for all the alerts combined.

Contributor

As discussed offline, we should try to find a way to avoid the worst case scenario, where we return 1000 alerts from 20 previous executions.

Possible options:

  • splitting the query into 2 - first a collapse query to get the last 20 execution UUIDs (without the inner hits clause) and then a second query to use the execution UUIDs to get alerts (limits the number of alerts that can be returned)
  • seeing if there's a way to limit the total size of inner hits returned within the single query
  • setting a flag on the ongoing recovered alerts to indicate that they shouldn't be returned for summary alerts queries while still updating the execution UUID to the latest.
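The first option above could be sketched roughly as two requests instead of one. This is an illustrative sketch only: the function names, the `maxAlerts * 2` cap, and the field constants are assumptions, not the PR's actual helpers.

```typescript
// Illustrative two-step variant of the query, as proposed above.
// Step 1 collapses WITHOUT inner_hits (cheap: one hit per execution
// UUID); step 2 fetches alerts for those executions under a hard
// size cap, bounding the 20 x 1000 worst case. Names are assumptions.
const EXEC_UUID_FIELD = 'kibana.alert.rule.execution.uuid';
const RULE_UUID_FIELD = 'kibana.alert.rule.uuid';

function buildExecutionUuidsQuery(ruleId: string, lookBackWindow: number) {
  return {
    size: lookBackWindow,
    _source: [EXEC_UUID_FIELD],
    query: { bool: { must: [{ term: { [RULE_UUID_FIELD]: ruleId } }] } },
    collapse: { field: EXEC_UUID_FIELD }, // no inner_hits clause
    sort: [{ '@timestamp': { order: 'desc' } }],
  };
}

function buildAlertsByExecutionUuidsQuery(
  executionUuids: string[],
  maxAlerts: number
) {
  return {
    // Hard cap on total alerts returned, similar to the old
    // 2 * maxAlerts (2000) limit of getTrackedAlertsByExecutionUuids.
    size: maxAlerts * 2,
    query: {
      bool: { filter: [{ terms: { [EXEC_UUID_FIELD]: executionUuids } }] },
    },
  };
}
```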
{ terms: { [ALERT_UUID]: uuidsToFetch } },
],
must: [{ term: { [ALERT_RULE_UUID]: this.options.rule.id } }],
filter: [{ terms: { [ALERT_RULE_EXECUTION_UUID]: executionUuids } }],
Contributor

Do you think we need to exclude status: untracked in this query? We didn't before, but I think that might have been an oversight.

Contributor Author

Oh! Thanks for pointing that out - I had overlooked it.
Actually, the filter should be here; having it in the other query could cause us to skip an execution.
Done.
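Excluding untracked alerts could be sketched as an extra clause on the bool query. The status field name, the `untracked` value, and the helper below are assumptions based on the discussion, not the PR's actual change.

```typescript
// Hypothetical sketch: exclude untracked alerts from the query, as
// discussed above. The field name and value are assumptions.
const ALERT_STATUS_FIELD = 'kibana.alert.status';

function excludeUntracked<T extends { must_not?: object[] }>(boolClause: T) {
  return {
    ...boolClause,
    // Appends a must_not term so untracked alerts are filtered out
    // while any existing must_not clauses are preserved.
    must_not: [
      ...(boolClause.must_not ?? []),
      { term: { [ALERT_STATUS_FIELD]: 'untracked' } },
    ],
  };
}
```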

const result = await this.search({
size: uuidsToFetch.length,
const alerts = await this.search({
size: (opts.maxAlerts || DEFAULT_MAX_ALERTS) * 2,
Contributor

should we sort by @timestamp here too?

Contributor Author

It would be useless IMO; we just need all the alerts from the last 20 executions, and the order doesn't matter.

Contributor

@ymao1 ymao1 left a comment

LGTM. Left a small nit.

Verified creating a rule on main that creates active alerts, and switching to this branch. Rule continues running and getting alerts correctly. Verified throwing an error as described in verification instructions. Rule runs in next execution with no error. Verified downgrading back to main and running rule, rule runs correctly, using alert UUIDs from task state to continue getting alerts.

Approving but would love to get @doakalexi to take a look at the flapping logic to ensure the "ongoing recovered" alerts are queried correctly for flapping purposes.

if (uuidsToFetch.length <= 0) {
return [];
}
const executionUuids = executions.hits
Contributor

Not sure if anything can go wrong with the query results but to be safe, could we add some optional accessors here and default to []? Like (execution?.hits ?? [])
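That defensive access could look like the following minimal sketch; the result shape and the function name are illustrative assumptions, not the PR's actual types.

```typescript
// Defensive extraction of execution UUIDs from a search result, as
// suggested above. The response shape here is an assumption.
interface ExecutionsResult {
  hits?: Array<{ fields?: Record<string, string[]> }>;
}

const EXEC_UUID = 'kibana.alert.rule.execution.uuid';

function extractExecutionUuids(executions?: ExecutionsResult): string[] {
  // Optional accessors with an [] default: a missing or malformed
  // response yields an empty list instead of a TypeError.
  return (executions?.hits ?? [])
    .map((hit) => hit.fields?.[EXEC_UUID]?.[0])
    .filter((uuid): uuid is string => typeof uuid === 'string');
}
```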

Contributor

@doakalexi doakalexi left a comment

> Approving but would love to get @doakalexi to take a look at the flapping logic to ensure the "ongoing recovered" alerts are queried correctly for flapping purposes.

Tested locally with flapping alerts, and LGTM! The alerts behaved as expected.

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #114 / Cloud Security Posture Test adding Cloud Security Posture Integrations CSPM AWS CIS_AWS Single Manual Temporary Keys CIS_AWS Single Manual Temporary Keys Workflow

Metrics [docs]

✅ unchanged

History

@ersin-erdal ersin-erdal merged commit 753d1cd into elastic:main Sep 22, 2025
12 checks passed
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Sep 24, 2025
niros1 pushed a commit that referenced this pull request Sep 30, 2025
rylnd pushed a commit to rylnd/kibana that referenced this pull request Oct 17, 2025
ersin-erdal added a commit to ersin-erdal/kibana that referenced this pull request Nov 13, 2025

(cherry picked from commit 753d1cd)

# Conflicts:
#	x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.test.ts
#	x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
@ersin-erdal
Contributor Author

💚 All backports created successfully

Status Branch Result
9.1
8.19

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

ersin-erdal added a commit to ersin-erdal/kibana that referenced this pull request Nov 13, 2025

(cherry picked from commit 753d1cd)

# Conflicts:
#	x-pack/platform/plugins/shared/alerting/server/alerts_client/alerts_client.test.ts
#	x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.test.ts
#	x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
ersin-erdal added a commit that referenced this pull request Nov 14, 2025
…235253) (#242967)

# Backport

This will backport the following commits from `main` to `8.19`:
- [Fetch the tracked alerts without depending on the task state
(#235253)](#235253)

<!--- Backport version: 10.1.0 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

ersin-erdal added a commit that referenced this pull request Nov 14, 2025
…235253) (#242965)

# Backport

This will backport the following commits from `main` to `9.1`:
- [Fetch the tracked alerts without depending on the task state
(#235253)](#235253)

<!--- Backport version: 10.1.0 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Labels

backport:skip (This PR does not require backporting), release_note:fix, Team:ResponseOps (Platform ResponseOps team, formerly the Cases and Alerting teams), v8.19.8, v9.1.8, v9.2.0

5 participants