Optimize bulk actions endpoint & update gaps by denar50 · Pull Request #222158 · elastic/kibana

denar50 · 2025-06-02T09:08:05Z

Summary

These optimizations aim to improve the performance of the bulk actions endpoint.

The first optimization concerns how we resolve rules before executing the bulk actions. Before this PR, rules were resolved sequentially, one at a time; this PR adds a bulkGet method to the rules action client so they are resolved in bulk instead.
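As a rough illustration of why resolving in bulk helps, the sketch below counts simulated round trips for both approaches. `FakeRulesClient`, `get`, `bulkGet`, `resolveSequentially`, and `resolveInBulk` are hypothetical names written for this example and are not the Kibana rules client API. Modeling the bulk path as exactly one round trip is a simplifying assumption, and real calls would be asynchronous.

```typescript
// Hypothetical stand-ins for illustration; NOT the real Kibana rules client.
// Real calls would be asynchronous; they are synchronous here to keep the sketch short.
interface Rule {
  id: string;
  name: string;
}

class FakeRulesClient {
  requestCount = 0;
  private readonly store: Map<string, Rule>;

  constructor(rules: Rule[]) {
    this.store = new Map(rules.map((rule): [string, Rule] => [rule.id, rule]));
  }

  // One simulated round trip per rule.
  get(id: string): Rule | undefined {
    this.requestCount += 1;
    return this.store.get(id);
  }

  // One simulated round trip for the whole batch (an assumption of this sketch).
  bulkGet(ids: string[]): Rule[] {
    this.requestCount += 1;
    const found: Rule[] = [];
    for (const id of ids) {
      const rule = this.store.get(id);
      if (rule !== undefined) found.push(rule);
    }
    return found;
  }
}

// Before: one request per rule id.
function resolveSequentially(client: FakeRulesClient, ids: string[]): Rule[] {
  const found: Rule[] = [];
  for (const id of ids) {
    const rule = client.get(id);
    if (rule !== undefined) found.push(rule);
  }
  return found;
}

// After: one request for all rule ids.
function resolveInBulk(client: FakeRulesClient, ids: string[]): Rule[] {
  return client.bulkGet(ids);
}
```

With N rule ids, the sequential path issues N simulated requests while the bulk path issues one, which is the effect the first optimization is after.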

The second optimization greatly improves the update gaps routine that occurs after a backfill is scheduled (see the before and after screenshots below).

The following screenshots were taken under these conditions:
I triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over a period of 90 days.

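Before: https://github.com/user-attachments/assets/44afc653-690c-4b04-b333-7b84faa19c25

After: https://github.com/user-attachments/assets/31cec3f3-dd21-4e1e-a5e3-a957bbcbba03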

How to test?

Use this tool (https://github.com/elastic/security-documents-generator) to create 1 rule (5m) with 1000 gaps.

yarn start rules --rules 1 -g 1000 -c -i"5m"

Then do a manual run on it for a period of 90 days.

elasticmachine · 2025-06-02T09:09:49Z

Pinging @elastic/security-detection-engine (Team:Detection Engine)

ymao1

Code review only primarily focused on the rules client bulk get function.

x-pack/platform/plugins/shared/alerting/server/rules_client/rules_client.ts

...ugins/shared/alerting/server/application/rule/methods/get/schemas/get_rules_params_schema.ts

ymao1 · 2025-06-06T13:31:14Z

...ugins/shared/alerting/server/application/rule/methods/get/schemas/get_rules_params_schema.ts

+import { schema } from '@kbn/config-schema';
+
+export const getRulesParamsSchema = schema.object({
+  ids: schema.arrayOf(schema.string()),


Suggested change

-  ids: schema.arrayOf(schema.string()),
+  ids: schema.arrayOf(schema.string({ minLength: 1 }), { defaultValue: [] })

Having minLength of 1 and a default value of an empty array seems a bit conflicting to me. I have instead enforced minLength of 1 and throw an error if that is not the case as you suggested in the comment below. Would this be alright?

The minLength applies to schema.string(), while defaultValue: [] applies to schema.arrayOf, so the two options do not conflict.

ymao1 · 2025-06-06T13:32:04Z

x-pack/platform/plugins/shared/alerting/server/application/rule/methods/get/get_rules.ts

+  } catch (error) {
+    throw Boom.badRequest(`Error validating get rules params - ${error.message}`);
+  }
+


we should check that params.ids.length > 0 and throw an error if it is an empty array.
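The two checks discussed in this thread (each id must be a non-empty string, and the array itself must not be empty) can be sketched as a plain TypeScript function. This is only an illustration: the PR uses `@kbn/config-schema`, and the function name `validateRuleIds` and the error messages below are hypothetical.

```typescript
// Hypothetical validator for illustration; the PR itself uses @kbn/config-schema.
function validateRuleIds(ids: unknown): string[] {
  if (!Array.isArray(ids)) {
    throw new Error('ids must be an array');
  }
  // Array-level check: an empty array is rejected instead of being silently accepted.
  if (ids.length === 0) {
    throw new Error('ids must contain at least one rule id');
  }
  // Element-level check: each id must be a string of length 1 or more.
  ids.forEach((id, index) => {
    if (typeof id !== 'string' || id.length < 1) {
      throw new Error(`ids[${index}] must be a non-empty string`);
    }
  });
  return ids as string[];
}
```

Keeping the array-level and element-level checks separate mirrors the distinction between options passed to the array schema and options passed to the string schema.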

ymao1 · 2025-06-06T14:30:54Z

x-pack/platform/plugins/shared/alerting/server/application/rule/methods/get/get_rules.ts

+
+  const savedObjects: Awaited<ReturnType<typeof bulkGetRulesSo>>['saved_objects'] = [];
+
+  await pMap(


our other bulk functions use this function checkAuthorizationAndGetTotal to get the rules and perform the auth check, i wonder if we could reuse this function to perform all the auth checks and then do the bulkGetSo call afterward. i like having that logic in a centralized place used by all the bulk functions so if something about the auth check changes, we only have to update one spot.

the only difference being if any rule type is not authorized, the entire function would throw, vs this implementation where it would return partial results. however since all the detection rule types are covered under the same feature privilege, I imagine this is not a big concern for you all?

auth check function: https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/alerting/server/rules_client/lib/check_authorization_and_get_total.ts

example usage in bulk enable:

kibana/x-pack/platform/plugins/shared/alerting/server/application/rule/methods/bulk_enable/bulk_enable_rules.ts

Line 100 in 4bc129e

const { total } = await checkAuthorizationAndGetTotal(context, {

Done. I pushed a commit where I use the function that you suggested instead.
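The trade-off raised in this thread (throwing when any rule type is unauthorized versus returning partial results) can be sketched as follows. All names here (`RuleRef`, `IsAuthorized`, `authorizeAllOrThrow`, `authorizePartially`) are hypothetical and do not reflect how checkAuthorizationAndGetTotal is implemented; the sketch only contrasts the two behaviours.

```typescript
// Hypothetical types and helpers for illustration; not the Kibana authorization API.
interface RuleRef {
  id: string;
  ruleTypeId: string;
}

type IsAuthorized = (ruleTypeId: string) => boolean;

// All-or-nothing: a single unauthorized rule type fails the whole request.
function authorizeAllOrThrow(rules: RuleRef[], isAuthorized: IsAuthorized): RuleRef[] {
  const denied = rules.find((rule) => !isAuthorized(rule.ruleTypeId));
  if (denied !== undefined) {
    throw new Error(`Unauthorized to get rule type "${denied.ruleTypeId}"`);
  }
  return rules;
}

// Partial results: unauthorized rules are reported alongside the authorized ones.
function authorizePartially(
  rules: RuleRef[],
  isAuthorized: IsAuthorized
): { authorized: RuleRef[]; unauthorized: RuleRef[] } {
  const authorized: RuleRef[] = [];
  const unauthorized: RuleRef[] = [];
  for (const rule of rules) {
    if (isAuthorized(rule.ruleTypeId)) {
      authorized.push(rule);
    } else {
      unauthorized.push(rule);
    }
  }
  return { authorized, unauthorized };
}
```

When every rule in a request falls under the same privilege, both variants behave the same for an authorized user, which is why the all-or-nothing behaviour was considered acceptable above.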

x-pack/platform/plugins/shared/alerting/server/lib/rule_gaps/update/utils.ts

ymao1 · 2025-06-06T14:39:38Z

x-pack/platform/plugins/shared/alerting/server/lib/rule_gaps/gap/interval_utils.ts

+  return { gte: clipped.start, lte: clipped.end };
+};
+
+export const clipDateIntervals = (


do we need any unit tests for this?

Unit test added 🙏
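For readers unfamiliar with interval clipping, here is a minimal sketch of the general idea: restrict each interval to a boundary and drop the ones that fall outside it. The `DateInterval` shape and the functions `clipInterval` and `clipIntervals` are assumptions made for this example; the actual code in interval_utils.ts may differ, for instance in how it treats zero-length results.

```typescript
// Hypothetical sketch of interval clipping; the actual interval_utils.ts may differ.
interface DateInterval {
  start: Date;
  end: Date;
}

// Returns the part of `interval` that lies inside `boundary`,
// or null when the two do not overlap (zero-length overlaps are also dropped).
function clipInterval(interval: DateInterval, boundary: DateInterval): DateInterval | null {
  const start = new Date(Math.max(interval.start.getTime(), boundary.start.getTime()));
  const end = new Date(Math.min(interval.end.getTime(), boundary.end.getTime()));
  return start.getTime() < end.getTime() ? { start, end } : null;
}

// Clips every interval and keeps only the ones that overlap the boundary.
function clipIntervals(intervals: DateInterval[], boundary: DateInterval): DateInterval[] {
  const clipped: DateInterval[] = [];
  for (const interval of intervals) {
    const result = clipInterval(interval, boundary);
    if (result !== null) clipped.push(result);
  }
  return clipped;
}
```

A unit test for this kind of helper would typically cover three cases: an interval partly outside the boundary, one fully inside, and one fully outside.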

dplumlee

Rule management changes look pretty good to me @denar50. It looks like the bulk delete tests are still failing, and I want to confirm the timeout signal question before approving.

dplumlee · 2025-06-09T18:49:52Z

...r/lib/detection_engine/rule_management/api/rules/bulk_actions/fetch_rules_by_query_or_ids.ts

+      })),
+      errors: errors.map(({ id, error }) => {
+        let message = 'Error resolving the rule';
+        if (error.statusCode === 404) {


I think this is why the delete_rules_bulk.ts test files are failing

The tests were failing due to the bulkGetRules method in the rules client not returning a summary of failed rules but rather throwing an error in some cases. I spoke to Ying and we agreed to keep this behaviour for consistency and instead catch the error in the place where it is called. I have implemented that in this commit.
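A minimal sketch of the approach described here, where the bulk getter is allowed to throw and the caller converts the failure into a per-rule error summary. The names `FetchOutcome`, `readStatusCode`, and `fetchRulesSafely` are hypothetical, the 'Rule not found' message is an assumption, and the real call would be awaited inside the try block.

```typescript
// Hypothetical shapes for illustration; not the actual Kibana types.
interface FetchError {
  id: string;
  message: string;
}

interface FetchOutcome<T> {
  rules: T[];
  errors: FetchError[];
}

// Reads a numeric statusCode from an unknown thrown value, if present.
function readStatusCode(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'statusCode' in error) {
    const value = (error as { statusCode: unknown }).statusCode;
    return typeof value === 'number' ? value : undefined;
  }
  return undefined;
}

// Wraps a bulk getter that throws, converting a failure into one error per requested id.
function fetchRulesSafely<T>(ids: string[], bulkGet: (ids: string[]) => T[]): FetchOutcome<T> {
  try {
    return { rules: bulkGet(ids), errors: [] };
  } catch (error) {
    const message =
      readStatusCode(error) === 404 ? 'Rule not found' : 'Error resolving the rule';
    return { rules: [], errors: ids.map((id) => ({ id, message })) };
  }
}
```

This keeps the client method consistent with the other bulk functions while still letting the route return a summary of failed rules.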

dplumlee · 2025-06-09T18:51:32Z

...r/lib/detection_engine/rule_management/api/rules/bulk_actions/fetch_rules_by_query_or_ids.ts

        }
-        return rule;
-      },
-      abortSignal,


~~xcrzx~~ (Got the answer and don't need to ping Dmitrii anymore) I seem to remember you doing work on bulk action requests timing out, do you think removing the abortSignal here is problematic?

The abort signal was used only in the case where we fetched the rules one by one using initPromisePool. In this new implementation I stopped using initPromisePool and instead added a bulkGetRules method in the rules client.

Yeah, I remember we had issues with the route timing out in the past, but I guess using the rulesClient path directly should avoid those issues 👍

ymao1

LGTM! Left a few small comments.

ymao1 · 2025-06-10T14:38:05Z

...latform/plugins/shared/alerting/server/rules_client/lib/check_authorization_and_get_total.ts

      RuleAuditAction: RuleAuditAction.DISABLE,
    },
+    GET: {
+      WriteOperation: ReadOperations.Get,


Can we add a BulkGet operation and audit action?

ymao1 · 2025-06-11T14:54:01Z

x-pack/platform/plugins/shared/alerting/server/lib/rule_gaps/update/update_gaps.ts

  }
 ) => {
-  const hasFailedBackfillTask = backfillSchedule?.some(
+  const hasFailedBackfillTask = scheduledItems?.some(


nit: the ?. (optional chaining) is unnecessary here

ymao1 · 2025-06-11T14:57:18Z

x-pack/platform/plugins/shared/alerting/server/lib/rule_gaps/update/update_gaps.ts

  }

-  if (hasFailedBackfillTask || !backfillSchedule || shouldRefetchAllBackfills) {
+  if (hasFailedBackfillTask || !scheduledItems || shouldRefetchAllBackfills) {


since scheduledItems should always be defined now, should this check scheduledItems.length === 0?

Agree. Fixed!
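The reason for this suggestion is that a negation check never fires for an array, even an empty one, so emptiness has to be tested explicitly. A simplified sketch, assuming a hypothetical `ScheduledItem` shape with a `status` field and a hypothetical function name `shouldRefetchBackfills`:

```typescript
// Hypothetical, simplified version of the condition discussed above.
interface ScheduledItem {
  status: 'pending' | 'complete' | 'error';
}

function shouldRefetchBackfills(
  scheduledItems: ScheduledItem[],
  shouldRefetchAllBackfills: boolean
): boolean {
  const hasFailedBackfillTask = scheduledItems.some((item) => item.status === 'error');
  // `!scheduledItems` is always false for an array, including an empty one,
  // so the emptiness check must compare the length explicitly.
  return hasFailedBackfillTask || scheduledItems.length === 0 || shouldRefetchAllBackfills;
}
```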

ymao1 · 2025-06-11T15:00:24Z

x-pack/platform/plugins/shared/alerting/server/lib/rule_gaps/update/calculate_gaps_state.ts

    if ('error' in backfill) {
      continue;
    }
+    const scheduledItems = backfill?.schedule.map(toScheduledItem) ?? [];


do you need to catch any errors from toScheduledItem here?

Good catch! I have just added exception handling logic for this 🙏
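One possible way to guard a mapping step like this is to convert entries one by one, report the failures, and keep the entries that converted cleanly. The sketch below is an assumption about the general pattern, not the handling added in the PR: `RawEntry`, `toScheduledItemExample`, and `toScheduledItemsSafely` are hypothetical, and skipping invalid entries is only one of several reasonable policies.

```typescript
// Hypothetical shapes; the real toScheduledItem and its inputs may differ.
interface RawEntry {
  runAt: string;
  status: string;
}

interface ScheduledItem {
  runAt: Date;
  status: string;
}

// Example converter that throws on malformed input.
function toScheduledItemExample(entry: RawEntry): ScheduledItem {
  const runAt = new Date(entry.runAt);
  if (isNaN(runAt.getTime())) {
    throw new Error(`Invalid runAt: ${entry.runAt}`);
  }
  return { runAt, status: entry.status };
}

// Converts each entry, reporting failures through `onError` and skipping the bad entries.
function toScheduledItemsSafely(
  entries: RawEntry[],
  onError: (error: unknown) => void
): ScheduledItem[] {
  const items: ScheduledItem[] = [];
  for (const entry of entries) {
    try {
      items.push(toScheduledItemExample(entry));
    } catch (error) {
      onError(error);
    }
  }
  return items;
}
```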

dplumlee

LGTM @denar50, did some stress tests for the bulk enable route using the ids param with 10k rule ids and wasn't able to see any issues, noticeable update in speed 🚀

azasypkin

The change LGTM. It looks like you still need to update a few security-related tests to make CI happy, but approving to unblock you.

denar50 · 2025-06-12T21:06:48Z

/ci

elasticmachine · 2025-06-12T22:49:38Z

💚 Build Succeeded

Buildkite Build
Commit: d03deb2

Metrics [docs]

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id	before	after	diff
`alerting`	53	54	+1

Unknown metric groups

ESLint disabled line counts

id	before	after	diff
`alerting`	96	97	+1

Total ESLint disabled count

id	before	after	diff
`alerting`	103	104	+1

History

cc @denar50

kibanamachine · 2025-06-13T05:23:33Z

Starting backport for target branches: 8.19

https://github.com/elastic/kibana/actions/runs/15627112699

kibanamachine · 2025-06-13T05:29:44Z

💔 All backports failed

Status	Branch	Result
❌	8.19	Backport failed because of merge conflicts. You might need to backport the following PRs to 8.19: Bulk operations support `gap_range` (#221078)

Manual backport

To create the backport manually run:

node scripts/backport --pr 222158

Questions ?

Please refer to the Backport tool documentation

kibanamachine · 2025-06-17T05:48:37Z

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create backports automatically, add a backport:* label; to prevent reminders, add the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 222158 locally.
cc: @denar50

nkhristinin · 2025-06-20T10:31:03Z

💚 All backports created successfully

Status	Branch	Result
✅	8.19

Note: Successful backport PRs will be merged automatically after passing CI.

Backport

This will backport the following commits from `main` to `8.19`:
- Optimize bulk actions endpoint & update gaps (#222158)

Questions? Please refer to the Backport tool documentation (https://github.com/sorenlouv/backport)

Co-authored-by: Edgar Santos <edgar.santos@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

denar50 requested review from a team as code owners June 2, 2025 09:08

denar50 requested a review from dplumlee June 2, 2025 09:08

denar50 added the release_note:skip (Skip the PR/issue when compiling release notes), Team:Detection Engine (Security Solution Detection Engine Area), backport:version (Backport to applied version labels), v9.1.0, and v8.19.0 labels Jun 2, 2025

denar50 self-assigned this Jun 2, 2025

denar50 changed the title ~~Security team 10688 optimize update gaps~~ Jun 2, 2025

denar50 force-pushed the security-team-10688-optimize-update-gaps branch from ed79113 to ae00d29 Compare June 2, 2025 10:03

denar50 mentioned this pull request Jun 2, 2025

Add an API endpoint to bulk fill rule gaps #220866

Merged

denar50 force-pushed the security-team-10688-optimize-update-gaps branch 5 times, most recently from 6d41b31 to a51e110 Compare June 4, 2025 13:00

denar50 changed the title ~~Optimize update gaps~~ Jun 5, 2025

denar50 force-pushed the security-team-10688-optimize-update-gaps branch from a51e110 to 5dd6de3 Compare June 5, 2025 18:04

ymao1 reviewed Jun 6, 2025

denar50 force-pushed the security-team-10688-optimize-update-gaps branch 3 times, most recently from f5cf537 to bea0358 Compare June 9, 2025 12:18

dplumlee reviewed Jun 9, 2025

denar50 force-pushed the security-team-10688-optimize-update-gaps branch from b7f699d to 02ec4e2 Compare June 11, 2025 12:19

ymao1 approved these changes Jun 11, 2025

denar50 force-pushed the security-team-10688-optimize-update-gaps branch from e1ceacd to 129fb14 Compare June 11, 2025 20:36

dplumlee approved these changes Jun 11, 2025

denar50 force-pushed the security-team-10688-optimize-update-gaps branch from 129fb14 to 46048be Compare June 12, 2025 05:33

kibanamachine and others added 3 commits June 12, 2025 11:33

[CI] Auto-commit changed files from 'node scripts/eslint --no-cache --fix'

58a4472

Add BulkGet read operation and audit action

77400d1

handle exceptions thrown by toScheduledItem

0255e7f

denar50 force-pushed the security-team-10688-optimize-update-gaps branch from 46048be to 0255e7f Compare June 12, 2025 09:33

denar50 requested a review from a team as a code owner June 12, 2025 13:10

add bulkGet to read operations

4a21401

denar50 force-pushed the security-team-10688-optimize-update-gaps branch from e180efd to 4a21401 Compare June 12, 2025 14:32

denar50 added 2 commits June 12, 2025 17:37

fix api integration tests

49c4925

fix integration tests

96ca64e

azasypkin approved these changes Jun 12, 2025

Merge branch 'main' into security-team-10688-optimize-update-gaps

d03deb2

denar50 merged commit 1109f52 into main Jun 13, 2025
10 checks passed

denar50 deleted the security-team-10688-optimize-update-gaps branch June 13, 2025 05:23

kibanamachine added the backport missing label (added to PRs automatically when they are determined to be missing a backport) Jun 17, 2025

This was referenced Jun 19, 2025

[Response Ops][Reporting] Scheduled Reports #221028

Merged

[Security Solution][Detection Engine] adds simplified bulk edit for alert suppression #223090

Merged

nkhristinin mentioned this pull request Jun 20, 2025

[8.19] Optimize bulk actions endpoint & update gaps (#222158) #224660

Merged

kibanamachine removed the backport missing label (added to PRs automatically when they are determined to be missing a backport) Jun 20, 2025
