Fetch the tracked alerts without depending on the task state#235253
ersin-erdal merged 14 commits into elastic:main
Conversation
Pinging @elastic/response-ops (Team:ResponseOps)
    const result = await this.search({
      size: (opts.maxAlerts || DEFAULT_MAX_ALERTS) * 2,
      seq_no_primary_term: true,
      size: opts.flappingSettings.lookBackWindow,
Do you think we need the last 20 executions worth of alerts? Alerts from the most recent execution should each carry along their own flapping history and we update "ongoing recovered" alerts for flapping with the latest execution UUID. I'm worried in the worst case, we'll be returning 20 x 1000 alerts whereas previously we'd be returning 2 * 1000. Or am I misunderstanding the query?
Yeah, I also realized that, but a smaller size may cause us to miss some of the alerts of the last execution.
Yeah, in the worst-case scenario, if the rule generates 1000 new alerts on each execution, the query returns 20,000 alerts. Under normal circumstances, even if it generates 1000 alerts, they remain ongoing and only the last execution would carry them.
Actually, this is the main difference between this query and the old getTrackedAlertsByExecutionUuids. Both return the alerts of the last 20 executions, but the old one has a limit of 2000 for all the alerts.
As discussed offline, we should try to find a way to avoid the worst-case scenario, where we return 1000 alerts from each of the 20 previous executions.
Possible options:
- splitting the query in two: first a collapse query (without the inner_hits clause) to get the last 20 execution UUIDs, and then a second query that uses those execution UUIDs to fetch alerts (limiting the number of alerts that can be returned)
- seeing if there's a way to limit the total size of inner_hits returned within the single query
- setting a flag on the ongoing recovered alerts to indicate that they shouldn't be returned for summary alert queries, while still updating the execution UUID to the latest
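The first option could be sketched roughly like this. This is only an illustration of the split-query idea under discussion: the builder functions, field names, and result shapes are assumptions, not the actual alerts client code.

```typescript
// Hypothetical sketch of the split-query approach discussed above.
// Field names and function names are assumptions for illustration.
const EXECUTION_UUID_FIELD = 'kibana.alert.rule.execution.uuid';

// Step 1: collapse query (no inner_hits) that only returns the last
// `lookBackWindow` execution UUIDs for the rule.
function buildExecutionUuidsQuery(ruleId: string, lookBackWindow: number) {
  return {
    size: lookBackWindow,
    _source: false,
    fields: [EXECUTION_UUID_FIELD],
    query: { bool: { must: [{ term: { 'kibana.alert.rule.uuid': ruleId } }] } },
    collapse: { field: EXECUTION_UUID_FIELD },
    sort: [{ '@timestamp': { order: 'desc' } }], // newest executions first
  };
}

// Step 2: fetch the alerts for those executions with a hard size cap,
// so the worst case is bounded by maxAlerts rather than 20 x 1000.
function buildAlertsByExecutionQuery(executionUuids: string[], maxAlerts: number) {
  return {
    size: maxAlerts,
    query: {
      bool: { filter: [{ terms: { [EXECUTION_UUID_FIELD]: executionUuids } }] },
    },
  };
}
```

The second query's `size` is what bounds the total number of returned alerts, which the single collapse query with per-group `inner_hits` cannot do.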
x-pack/platform/plugins/shared/alerting/server/alerts_client/alerts_client.ts
          { terms: { [ALERT_UUID]: uuidsToFetch } },
        ],
        must: [{ term: { [ALERT_RULE_UUID]: this.options.rule.id } }],
        filter: [{ terms: { [ALERT_RULE_EXECUTION_UUID]: executionUuids } }],
Do you think we need to exclude status: untracked in this query? We didn't before, but I think that might have been an oversight.
Oh! Thanks for pointing that out, I overlooked it.
Actually, the filter should be here; having it in the other one may cause us to skip an execution.
Done.
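The exclusion agreed on in this thread could look something like the sketch below. The status field name and value are assumptions based on the discussion, not the exact filter that was committed.

```typescript
// Hypothetical helper adding an "exclude untracked alerts" clause to a
// bool query. The field name 'kibana.alert.status' and the value
// 'untracked' are assumptions for illustration.
function withUntrackedExcluded(filter: Array<Record<string, unknown>>) {
  return {
    bool: {
      filter, // existing clauses (rule id, execution UUIDs, ...)
      must_not: [{ term: { 'kibana.alert.status': 'untracked' } }],
    },
  };
}
```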
    const result = await this.search({
      size: uuidsToFetch.length,
    const alerts = await this.search({
      size: (opts.maxAlerts || DEFAULT_MAX_ALERTS) * 2,
Should we sort by @timestamp here too?
It would be useless IMO; we just need all the alerts from the last 20 executions, and order doesn't matter.
ymao1
left a comment
LGTM. Left a small nit.
Verified creating a rule on main that creates active alerts, and switching to this branch. Rule continues running and getting alerts correctly. Verified throwing an error as described in verification instructions. Rule runs in next execution with no error. Verified downgrading back to main and running rule, rule runs correctly, using alert UUIDs from task state to continue getting alerts.
Approving but would love to get @doakalexi to take a look at the flapping logic to ensure the "ongoing recovered" alerts are queried correctly for flapping purposes.
    if (uuidsToFetch.length <= 0) {
      return [];
    }
    const executionUuids = executions.hits
Not sure if anything can go wrong with the query results, but to be safe, could we add some optional accessors here and default to []? Like (executions?.hits ?? [])
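The suggested defensive access might look like the following sketch; the result shape and field name are assumptions for illustration, not the real search response types.

```typescript
// Sketch of defensive access over possibly-missing query results.
// The ExecutionsResult shape and the execution-UUID field name are
// assumptions, not the actual alerts client types.
interface ExecutionsResult {
  hits?: Array<{ fields?: Record<string, string[]> }>;
}

function extractExecutionUuids(executions?: ExecutionsResult): string[] {
  // Optional accessors with a [] default make a missing or malformed
  // result yield an empty list instead of throwing.
  return (executions?.hits ?? [])
    .map((hit) => hit.fields?.['kibana.alert.rule.execution.uuid']?.[0])
    .filter((uuid): uuid is string => uuid != null);
}
```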
doakalexi
left a comment
Tested locally with flapping alerts, and LGTM! The alerts behaved as expected.
💛 Build succeeded, but was flaky
Resolves: elastic#190376

Rule execution fails after persisting alerts; therefore the alerts and the execution UUIDs in the task state cannot be updated. On the next execution, the same alert is reported, but since the last execution UUID wasn't added to the task state, the alert doc doesn't come back in the tracked alerts. It is therefore considered a new alert, but as it was already persisted in the previous execution, it gets a conflict error.

This PR solves this problem by fetching the tracked alerts without depending on the task state.

The new query groups the alerts of the running rule by execution UUID and fetches as many executions as the flapping lookback window. Each execution-UUID group returns all the alerts that belong to it under `inner_hits`.

### To verify:
1. Create an always-firing Elasticsearch Query rule with a `1 hour` run interval.
2. Let the rule run and create an alert.
3. Apply the diff below:
```
diff --git a/x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts b/x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
index a5dcfa0..bb60c761740 100644
--- a/x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
+++ b/x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
@@ -438,6 +438,8 @@ export class TaskRunner<
     recoveredAlertsToReturn = alerts.rawRecoveredAlerts;
   }

+  throw new Error('fail');
+
   return {
     metrics: ruleRunMetricsStore.getMetrics(),
     state: {
```
4. Wait for Kibana to restart.
5. Run the rule on the UI by using "Run rule".
6. Observe the error message in the terminal.
7. Remove the above change and wait for Kibana to restart.
8. Run the rule on the UI by using "Run rule".

The rule should run without any error and update the alert and the task state.

The same scenario should fail on main.
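The grouping described in this PR can be sketched as an Elasticsearch query body. The builder function, field names, and sizes below are illustrative assumptions based on the description, not the exact alerts client implementation.

```typescript
// Hypothetical sketch of the tracked-alerts query: collapse alert docs
// by execution UUID so each recent execution returns its own alerts
// under inner_hits. Names and sizes are assumptions for illustration.
const ALERT_RULE_UUID = 'kibana.alert.rule.uuid';
const ALERT_RULE_EXECUTION_UUID = 'kibana.alert.rule.execution.uuid';

interface TrackedAlertsQueryOpts {
  ruleId: string;
  lookBackWindow: number; // number of recent executions to return
  maxAlertsPerExecution: number;
}

function buildTrackedAlertsQuery(opts: TrackedAlertsQueryOpts) {
  return {
    size: opts.lookBackWindow, // one collapsed hit per execution
    query: {
      bool: {
        must: [{ term: { [ALERT_RULE_UUID]: opts.ruleId } }],
      },
    },
    collapse: {
      field: ALERT_RULE_EXECUTION_UUID,
      inner_hits: {
        name: 'alerts_by_execution',
        size: opts.maxAlertsPerExecution,
      },
    },
    // Sorting newest-first makes the collapsed groups correspond to the
    // most recent executions within the lookback window.
    sort: [{ '@timestamp': { order: 'desc' } }],
  };
}
```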
(cherry picked from commit 753d1cd)

# Conflicts:
# x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.test.ts
# x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
(cherry picked from commit 753d1cd)

# Conflicts:
# x-pack/platform/plugins/shared/alerting/server/alerts_client/alerts_client.test.ts
# x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.test.ts
# x-pack/platform/plugins/shared/alerting/server/task_runner/task_runner.ts
…235253) (#242967)

# Backport

This will backport the following commits from `main` to `8.19`:
- [Fetch the tracked alerts without depending on the task state (#235253)](#235253)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)
…235253) (#242965)

# Backport

This will backport the following commits from `main` to `9.1`:
- [Fetch the tracked alerts without depending on the task state (#235253)](#235253)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)