Skip to content

Optimize bulk actions endpoint & update gaps#222158

Merged
denar50 merged 15 commits intomainfrom
security-team-10688-optimize-update-gaps
Jun 13, 2025
Merged

Optimize bulk actions endpoint & update gaps#222158
denar50 merged 15 commits intomainfrom
security-team-10688-optimize-update-gaps

Conversation

@denar50
Copy link
Contributor

@denar50 denar50 commented Jun 2, 2025

Summary

These optimizations aim to improve the performance of the bulk actions endpoint.

The first optimization has to do with how we resolve rules before executing the bulk actions. Before this PR we were doing this sequentially, but a bulkGet method has been added to the rules action client.

The second optimization greatly improves the update gaps routine that occurs after a backfill is scheduled (see the before and after screenshots below).

The following screenshots were taken under these conditions:
I triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over a period of 90 days.

Before:
image

After:
image

How to test?

Use this tool to create 1 rule (5m) with 1000 gaps.

yarn start rules --rules 1 -g 1000 -c -i"5m"

Then do a manual run on it for a period of 90 days.

@denar50 denar50 requested review from a team as code owners June 2, 2025 09:08
@denar50 denar50 requested a review from dplumlee June 2, 2025 09:08
@denar50 denar50 added release_note:skip Skip the PR/issue when compiling release notes Team:Detection Engine Security Solution Detection Engine Area backport:version Backport to applied version labels v9.1.0 v8.19.0 labels Jun 2, 2025
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detection-engine (Team:Detection Engine)

@denar50 denar50 self-assigned this Jun 2, 2025
@denar50 denar50 changed the title Security team 10688 optimize update gaps Jun 2, 2025
@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch from ed79113 to ae00d29 Compare June 2, 2025 10:03
@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch 5 times, most recently from 6d41b31 to a51e110 Compare June 4, 2025 13:00
@denar50 denar50 changed the title Optimize update gaps Jun 5, 2025
@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch from a51e110 to 5dd6de3 Compare June 5, 2025 18:04
Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review only primarily focused on the rules client bulk get function.

import { schema } from '@kbn/config-schema';

export const getRulesParamsSchema = schema.object({
ids: schema.arrayOf(schema.string()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ids: schema.arrayOf(schema.string()),
ids: schema.arrayOf(schema.string({ minLength: 1 }), { defaultValue: [] })
Copy link
Contributor Author

@denar50 denar50 Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having minLength of 1 and a default value of an empty array seems a bit conflicting to me. I have instead enforced minLength of 1 and throw an error if that is not the case as you suggested in the comment below. Would this be alright?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the minLength applies to the schema.string() while the defaultValue: [] applies to the schema.arrayOf

} catch (error) {
throw Boom.badRequest(`Error validating get rules params - ${error.message}`);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check that params.ids.length > 0 and throw an error if it is an empty array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


const savedObjects: Awaited<ReturnType<typeof bulkGetRulesSo>>['saved_objects'] = [];

await pMap(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our other bulk functions use this function checkAuthorizationAndGetTotal to get the rules and perform the auth check, i wonder if we could reuse this function to perform all the auth checks and then do the bulkGetSo call afterward. i like having that logic in a centralized place used by all the bulk functions so if something about the auth check changes, we only have to update one spot.

the only difference being if any rule type is not authorized, the entire function would throw, vs this implementation where it would return partial results. however since all the detection rule types are covered under the same feature privilege, I imagine this is not a big concern for you all?

auth check function: https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/alerting/server/rules_client/lib/check_authorization_and_get_total.ts

example usage in bulk enable:

const { total } = await checkAuthorizationAndGetTotal(context, {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I pushed a commit where I use the function that you suggested instead.

return { gte: clipped.start, lte: clipped.end };
};

export const clipDateIntervals = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need any unit tests for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added 🙏

@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch 3 times, most recently from f5cf537 to bea0358 Compare June 9, 2025 12:18
Copy link
Contributor

@dplumlee dplumlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule management changes look pretty good to me @denar50, looks like the bulk delete tests are failing still and I want to make sure of the timeout signal thing before approving

})),
errors: errors.map(({ id, error }) => {
let message = 'Error resolving the rule';
if (error.statusCode === 404) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is why the delete_rules_bulk.ts test files are failing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests were failing due to the bulkGetRules method in the rules client not returning a summary of failed rules but rather throwing an error in some cases. I spoke to Ying and we agreed to keep this behaviour for consistency and instead catch the error in the place where it is called. I have implemented that in this commit.

}
return rule;
},
abortSignal,
Copy link
Contributor

@dplumlee dplumlee Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xcrzx (Got the answer and don't need to ping Dmitrii anymore) I seem to remember you doing work on bulk action requests timing out, do you think removing the abortSignal here is problematic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abort signal was used only in the case where we fetched the rules one by one using initPromisePool. In this new implementation I stopped using initPromisePool and instead added a bulkGetRules method in the rules client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I remember we had issues with the route timing out in the past but I guess using the rulesClient path directly should subvert those issues 👍

@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch from b7f699d to 02ec4e2 Compare June 11, 2025 12:19
Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left a few small comments.

RuleAuditAction: RuleAuditAction.DISABLE,
},
GET: {
WriteOperation: ReadOperations.Get,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a BulkGet operation and audit action?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
) => {
const hasFailedBackfillTask = backfillSchedule?.some(
const hasFailedBackfillTask = scheduledItems?.some(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

if (hasFailedBackfillTask || !backfillSchedule || shouldRefetchAllBackfills) {
if (hasFailedBackfillTask || !scheduledItems || shouldRefetchAllBackfills) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since scheduledItems should always be defined now, should this check scheduledItems.length === 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Fixed!

if ('error' in backfill) {
continue;
}
const scheduledItems = backfill?.schedule.map(toScheduledItem) ?? [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to catch any errors from toScheduledItem here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I have just added exception handling logic for this 🙏

@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch from e1ceacd to 129fb14 Compare June 11, 2025 20:36
Copy link
Contributor

@dplumlee dplumlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @denar50, did some stress tests for the bulk enable route using the ids param with 10k rule ids and wasn't able to see any issues, noticeable update in speed 🚀

@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch from 129fb14 to 46048be Compare June 12, 2025 05:33
@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch from 46048be to 0255e7f Compare June 12, 2025 09:33
@denar50 denar50 requested a review from a team as a code owner June 12, 2025 13:10
@denar50 denar50 force-pushed the security-team-10688-optimize-update-gaps branch from e180efd to 4a21401 Compare June 12, 2025 14:32
Copy link
Contributor

@azasypkin azasypkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM. It looks like you still need to update a few security-related tests to make CI happy, but approving to unblock you.

@denar50
Copy link
Contributor Author

denar50 commented Jun 12, 2025

/ci

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
alerting 53 54 +1
Unknown metric groups

ESLint disabled line counts

id before after diff
alerting 96 97 +1

Total ESLint disabled count

id before after diff
alerting 103 104 +1

History

cc @denar50

@denar50 denar50 merged commit 1109f52 into main Jun 13, 2025
10 checks passed
@denar50 denar50 deleted the security-team-10688-optimize-update-gaps branch June 13, 2025 05:23
@kibanamachine
Copy link
Contributor

Starting backport for target branches: 8.19

https://github.com/elastic/kibana/actions/runs/15627112699

@kibanamachine
Copy link
Contributor

💔 All backports failed

Status Branch Result
8.19 Backport failed because of merge conflicts

You might need to backport the following PRs to 8.19:
- Bulk operations support gap_range (#221078)

Manual backport

To create the backport manually run:

node scripts/backport --pr 222158

Questions ?

Please refer to the Backport tool documentation

iblancof pushed a commit to iblancof/kibana that referenced this pull request Jun 16, 2025
## Summary
These optimizations aim to improve the performance of the bulk actions
endpoint.

The first optimization has to do with how we resolve rules before
executing the bulk actions. Before this PR we were doing this
sequentially, but a `bulkGet` method has been added to the rules action
client.

The second optimization greatly improves the update gaps routine that
occurs after a backfill is scheduled (see the before and after
screenshots below).

The following screenshots were taken under these conditions:
I triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over
a period of 90 days.

Before:

![image](https://github.com/user-attachments/assets/44afc653-690c-4b04-b333-7b84faa19c25)

After:

![image](https://github.com/user-attachments/assets/31cec3f3-dd21-4e1e-a5e3-a957bbcbba03)


## How to test?
Use [this tool](https://github.com/elastic/security-documents-generator)
to create 1 rule (5m) with 1000 gaps.
```
yarn start rules --rules 1 -g 1000 -c -i"5m"
```

Then do a manual run on it for a period of 90 days.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine kibanamachine added the backport missing Added to PRs automatically when the are determined to be missing a backport. label Jun 17, 2025
@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add a backport:* label or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 222158 locally
cc: @denar50

2 similar comments
@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add a backport:* label or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 222158 locally
cc: @denar50

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add a backport:* label or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 222158 locally
cc: @denar50

@kibanamachine
Copy link
Contributor

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create automatically backports add a backport:* label or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 222158 locally
cc: @denar50

@nkhristinin
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.19

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

nkhristinin pushed a commit to nkhristinin/kibana that referenced this pull request Jun 20, 2025
These optimizations aim to improve the performance of the bulk actions
endpoint.

The first optimization has to do with how we resolve rules before
executing the bulk actions. Before this PR we were doing this
sequentially, but a `bulkGet` method has been added to the rules action
client.

The second optimization greatly improves the update gaps routine that
occurs after a backfill is scheduled (see the before and after
screenshots below).

The following screenshots were taken under these conditions:
I triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over
a period of 90 days.

Before:

![image](https://github.com/user-attachments/assets/44afc653-690c-4b04-b333-7b84faa19c25)

After:

![image](https://github.com/user-attachments/assets/31cec3f3-dd21-4e1e-a5e3-a957bbcbba03)
Use [this tool](https://github.com/elastic/security-documents-generator)
to create 1 rule (5m) with 1000 gaps.
```
yarn start rules --rules 1 -g 1000 -c -i"5m"
```

Then do a manual run on it for a period of 90 days.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 1109f52)
nkhristinin added a commit that referenced this pull request Jun 20, 2025
# Backport

This will backport the following commits from `main` to `8.19`:
- [Optimize bulk actions endpoint & update gaps
(#222158)](#222158)

<!--- Backport version: 10.0.1 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

<!--BACKPORT [{"author":{"name":"Edgar
Santos","email":"edgar.santos@elastic.co"},"sourceCommit":{"committedDate":"2025-06-13T05:23:18Z","message":"Optimize
bulk actions endpoint & update gaps (#222158)\n\n## Summary\nThese
optimizations aim to improve the performance of the bulk
actions\nendpoint.\n\nThe first optimization has to do with how we
resolve rules before\nexecuting the bulk actions. Before this PR we were
doing this\nsequentially, but a `bulkGet` method has been added to the
rules action\nclient.\n\nThe second optimization greatly improves the
update gaps routine that\noccurs after a backfill is scheduled (see the
before and after\nscreenshots below).\n\nThe following screenshots were
taken under these conditions:\nI triggered a manual run bulk action on 1
rule (5m) with 1000 gaps over\na period of 90
days.\n\nBefore:\n\n![image](https://github.com/user-attachments/assets/44afc653-690c-4b04-b333-7b84faa19c25)\n\nAfter:\n\n![image](https://github.com/user-attachments/assets/31cec3f3-dd21-4e1e-a5e3-a957bbcbba03)\n\n\n##
How to test?\nUse [this
tool](https://github.com/elastic/security-documents-generator)\nto
create 1 rule (5m) with 1000 gaps.\n```\nyarn start rules --rules 1 -g
1000 -c -i\"5m\"\n```\n\nThen do a manual run on it for a period of 90
days.\n\n---------\n\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>","sha":"1109f5244cb988dd2c54db8845b2f938319052e4","branchLabelMapping":{"^v9.1.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","backport
missing","Team:Detection
Engine","backport:version","v9.1.0","v8.19.0"],"title":"Optimize bulk
actions endpoint & update
gaps","number":222158,"url":"https://github.com/elastic/kibana/pull/222158","mergeCommit":{"message":"Optimize
bulk actions endpoint & update gaps (#222158)\n\n## Summary\nThese
optimizations aim to improve the performance of the bulk
actions\nendpoint.\n\nThe first optimization has to do with how we
resolve rules before\nexecuting the bulk actions. Before this PR we were
doing this\nsequentially, but a `bulkGet` method has been added to the
rules action\nclient.\n\nThe second optimization greatly improves the
update gaps routine that\noccurs after a backfill is scheduled (see the
before and after\nscreenshots below).\n\nThe following screenshots were
taken under these conditions:\nI triggered a manual run bulk action on 1
rule (5m) with 1000 gaps over\na period of 90
days.\n\nBefore:\n\n![image](https://github.com/user-attachments/assets/44afc653-690c-4b04-b333-7b84faa19c25)\n\nAfter:\n\n![image](https://github.com/user-attachments/assets/31cec3f3-dd21-4e1e-a5e3-a957bbcbba03)\n\n\n##
How to test?\nUse [this
tool](https://github.com/elastic/security-documents-generator)\nto
create 1 rule (5m) with 1000 gaps.\n```\nyarn start rules --rules 1 -g
1000 -c -i\"5m\"\n```\n\nThen do a manual run on it for a period of 90
days.\n\n---------\n\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>","sha":"1109f5244cb988dd2c54db8845b2f938319052e4"}},"sourceBranch":"main","suggestedTargetBranches":["8.19"],"targetPullRequestStates":[{"branch":"main","label":"v9.1.0","branchLabelMappingKey":"^v9.1.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/222158","number":222158,"mergeCommit":{"message":"Optimize
bulk actions endpoint & update gaps (#222158)\n\n## Summary\nThese
optimizations aim to improve the performance of the bulk
actions\nendpoint.\n\nThe first optimization has to do with how we
resolve rules before\nexecuting the bulk actions. Before this PR we were
doing this\nsequentially, but a `bulkGet` method has been added to the
rules action\nclient.\n\nThe second optimization greatly improves the
update gaps routine that\noccurs after a backfill is scheduled (see the
before and after\nscreenshots below).\n\nThe following screenshots were
taken under these conditions:\nI triggered a manual run bulk action on 1
rule (5m) with 1000 gaps over\na period of 90
days.\n\nBefore:\n\n![image](https://github.com/user-attachments/assets/44afc653-690c-4b04-b333-7b84faa19c25)\n\nAfter:\n\n![image](https://github.com/user-attachments/assets/31cec3f3-dd21-4e1e-a5e3-a957bbcbba03)\n\n\n##
How to test?\nUse [this
tool](https://github.com/elastic/security-documents-generator)\nto
create 1 rule (5m) with 1000 gaps.\n```\nyarn start rules --rules 1 -g
1000 -c -i\"5m\"\n```\n\nThen do a manual run on it for a period of 90
days.\n\n---------\n\nCo-authored-by: kibanamachine
<42973632+kibanamachine@users.noreply.github.com>","sha":"1109f5244cb988dd2c54db8845b2f938319052e4"}},{"branch":"8.19","label":"v8.19.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Edgar Santos <edgar.santos@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
@kibanamachine kibanamachine removed the backport missing Added to PRs automatically when the are determined to be missing a backport. label Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:version Backport to applied version labels release_note:skip Skip the PR/issue when compiling release notes Team:Detection Engine Security Solution Detection Engine Area v8.19.0 v9.1.0

7 participants