Optimize bulk actions endpoint & update gaps#222158
Conversation
|
Pinging @elastic/security-detection-engine (Team:Detection Engine) |
ed79113 to
ae00d29
Compare
6d41b31 to
a51e110
Compare
a51e110 to
5dd6de3
Compare
ymao1
left a comment
There was a problem hiding this comment.
Code review only primarily focused on the rules client bulk get function.
x-pack/platform/plugins/shared/alerting/server/rules_client/rules_client.ts
Outdated
Show resolved
Hide resolved
...ugins/shared/alerting/server/application/rule/methods/get/schemas/get_rules_params_schema.ts
Show resolved
Hide resolved
| import { schema } from '@kbn/config-schema'; | ||
|
|
||
| export const getRulesParamsSchema = schema.object({ | ||
| ids: schema.arrayOf(schema.string()), |
There was a problem hiding this comment.
| ids: schema.arrayOf(schema.string()), | |
| ids: schema.arrayOf(schema.string({ minLength: 1 }), { defaultValue: [] }) |
There was a problem hiding this comment.
Having minLength of 1 and a default value of an empty array seems a bit conflicting to me. I have instead enforced minLength of 1 and throw an error if that is not the case as you suggested in the comment below. Would this be alright?
There was a problem hiding this comment.
the minLength applies to the schema.string() while the defaultValue: [] applies to the schema.arrayOf
| } catch (error) { | ||
| throw Boom.badRequest(`Error validating get rules params - ${error.message}`); | ||
| } | ||
|
|
There was a problem hiding this comment.
we should check that params.ids.length > 0 and throw an error if it is an empty array.
|
|
||
| const savedObjects: Awaited<ReturnType<typeof bulkGetRulesSo>>['saved_objects'] = []; | ||
|
|
||
| await pMap( |
There was a problem hiding this comment.
our other bulk functions use this function checkAuthorizationAndGetTotal to get the rules and perform the auth check, i wonder if we could reuse this function to perform all the auth checks and then do the bulkGetSo call afterward. i like having that logic in a centralized place used by all the bulk functions so if something about the auth check changes, we only have to update one spot.
the only difference being if any rule type is not authorized, the entire function would throw, vs this implementation where it would return partial results. however since all the detection rule types are covered under the same feature privilege, I imagine this is not a big concern for you all?
auth check function: https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/alerting/server/rules_client/lib/check_authorization_and_get_total.ts
example usage in bulk enable:
There was a problem hiding this comment.
Done. I pushed a commit where I use the function that you suggested instead.
x-pack/platform/plugins/shared/alerting/server/lib/rule_gaps/update/utils.ts
Show resolved
Hide resolved
| return { gte: clipped.start, lte: clipped.end }; | ||
| }; | ||
|
|
||
| export const clipDateIntervals = ( |
There was a problem hiding this comment.
do we need any unit tests for this?
f5cf537 to
bea0358
Compare
| })), | ||
| errors: errors.map(({ id, error }) => { | ||
| let message = 'Error resolving the rule'; | ||
| if (error.statusCode === 404) { |
There was a problem hiding this comment.
I think this is why the delete_rules_bulk.ts test files are failing
There was a problem hiding this comment.
The tests were failing due to the bulkGetRules method in the rules client not returning a summary of failed rules but rather throwing an error in some cases. I spoke to Ying and we agreed to keep this behaviour for consistency and instead catch the error in the place where it is called. I have implemented that in this commit.
| } | ||
| return rule; | ||
| }, | ||
| abortSignal, |
There was a problem hiding this comment.
xcrzx (Got the answer and don't need to ping Dmitrii anymore) I seem to remember you doing work on bulk action requests timing out, do you think removing the abortSignal here is problematic?
There was a problem hiding this comment.
The abort signal was used only in the case where we fetched the rules one by one using initPromisePool. In this new implementation I stopped using initPromisePool and instead added a bulkGetRules method in the rules client.
There was a problem hiding this comment.
Yeah, I remember we had issues with the route timing out in the past but I guess using the rulesClient path directly should subvert those issues 👍
b7f699d to
02ec4e2
Compare
ymao1
left a comment
There was a problem hiding this comment.
LGTM! Left a few small comments.
| RuleAuditAction: RuleAuditAction.DISABLE, | ||
| }, | ||
| GET: { | ||
| WriteOperation: ReadOperations.Get, |
There was a problem hiding this comment.
Can we add a BulkGet operation and audit action?
| } | ||
| ) => { | ||
| const hasFailedBackfillTask = backfillSchedule?.some( | ||
| const hasFailedBackfillTask = scheduledItems?.some( |
| } | ||
|
|
||
| if (hasFailedBackfillTask || !backfillSchedule || shouldRefetchAllBackfills) { | ||
| if (hasFailedBackfillTask || !scheduledItems || shouldRefetchAllBackfills) { |
There was a problem hiding this comment.
since scheduledItems should always be defined now, should this check scheduledItems.length === 0?
| if ('error' in backfill) { | ||
| continue; | ||
| } | ||
| const scheduledItems = backfill?.schedule.map(toScheduledItem) ?? []; |
There was a problem hiding this comment.
do you need to catch any errors from toScheduledItem here?
There was a problem hiding this comment.
Good catch! I have just added exception handling logic for this 🙏
e1ceacd to
129fb14
Compare
129fb14 to
46048be
Compare
46048be to
0255e7f
Compare
e180efd to
4a21401
Compare
azasypkin
left a comment
There was a problem hiding this comment.
The change LGTM. It looks like you still need to update a few security-related tests to make CI happy, but approving to unblock you.
|
/ci |
💚 Build Succeeded
Metrics [docs]Public APIs missing exports
Unknown metric groupsESLint disabled line counts
Total ESLint disabled count
History
cc @denar50 |
|
Starting backport for target branches: 8.19 |
💔 All backports failed
Manual backportTo create the backport manually run: Questions ?Please refer to the Backport tool documentation |
## Summary These optimizations aim to improve the performance of the bulk actions endpoint. The first optimization has to do with how we resolve rules before executing the bulk actions. Before this PR we were doing this sequentially, but a `bulkGet` method has been added to the rules action client. The second optimization greatly improves the update gaps routine that occurs after a backfill is scheduled (see the before and after screenshots below). The following screenshots were taken under these conditions: I triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over a period of 90 days. Before:  After:  ## How to test? Use [this tool](https://github.com/elastic/security-documents-generator) to create 1 rule (5m) with 1000 gaps. ``` yarn start rules --rules 1 -g 1000 -c -i"5m" ``` Then do a manual run on it for a period of 90 days. --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
|
Friendly reminder: Looks like this PR hasn’t been backported yet. |
2 similar comments
|
Friendly reminder: Looks like this PR hasn’t been backported yet. |
|
Friendly reminder: Looks like this PR hasn’t been backported yet. |
|
Friendly reminder: Looks like this PR hasn’t been backported yet. |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
These optimizations aim to improve the performance of the bulk actions endpoint. The first optimization has to do with how we resolve rules before executing the bulk actions. Before this PR we were doing this sequentially, but a `bulkGet` method has been added to the rules action client. The second optimization greatly improves the update gaps routine that occurs after a backfill is scheduled (see the before and after screenshots below). The following screenshots were taken under these conditions: I triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over a period of 90 days. Before:  After:  Use [this tool](https://github.com/elastic/security-documents-generator) to create 1 rule (5m) with 1000 gaps. ``` yarn start rules --rules 1 -g 1000 -c -i"5m" ``` Then do a manual run on it for a period of 90 days. --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> (cherry picked from commit 1109f52)
# Backport This will backport the following commits from `main` to `8.19`: - [Optimize bulk actions endpoint & update gaps (#222158)](#222158) <!--- Backport version: 10.0.1 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport) <!--BACKPORT [{"author":{"name":"Edgar Santos","email":"edgar.santos@elastic.co"},"sourceCommit":{"committedDate":"2025-06-13T05:23:18Z","message":"Optimize bulk actions endpoint & update gaps (#222158)\n\n## Summary\nThese optimizations aim to improve the performance of the bulk actions\nendpoint.\n\nThe first optimization has to do with how we resolve rules before\nexecuting the bulk actions. Before this PR we were doing this\nsequentially, but a `bulkGet` method has been added to the rules action\nclient.\n\nThe second optimization greatly improves the update gaps routine that\noccurs after a backfill is scheduled (see the before and after\nscreenshots below).\n\nThe following screenshots were taken under these conditions:\nI triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over\na period of 90 days.\n\nBefore:\n\n\n\nAfter:\n\n\n\n\n## How to test?\nUse [this tool](https://github.com/elastic/security-documents-generator)\nto create 1 rule (5m) with 1000 gaps.\n```\nyarn start rules --rules 1 -g 1000 -c -i\"5m\"\n```\n\nThen do a manual run on it for a period of 90 days.\n\n---------\n\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"1109f5244cb988dd2c54db8845b2f938319052e4","branchLabelMapping":{"^v9.1.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","backport missing","Team:Detection Engine","backport:version","v9.1.0","v8.19.0"],"title":"Optimize bulk actions endpoint & update gaps","number":222158,"url":"https://github.com/elastic/kibana/pull/222158","mergeCommit":{"message":"Optimize bulk actions endpoint & update gaps (#222158)\n\n## Summary\nThese optimizations aim to improve the performance of the bulk actions\nendpoint.\n\nThe first optimization has to do with how we resolve rules before\nexecuting the bulk actions. Before this PR we were doing this\nsequentially, but a `bulkGet` method has been added to the rules action\nclient.\n\nThe second optimization greatly improves the update gaps routine that\noccurs after a backfill is scheduled (see the before and after\nscreenshots below).\n\nThe following screenshots were taken under these conditions:\nI triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over\na period of 90 days.\n\nBefore:\n\n\n\nAfter:\n\n\n\n\n## How to test?\nUse [this tool](https://github.com/elastic/security-documents-generator)\nto create 1 rule (5m) with 1000 gaps.\n```\nyarn start rules --rules 1 -g 1000 -c -i\"5m\"\n```\n\nThen do a manual run on it for a period of 90 days.\n\n---------\n\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"1109f5244cb988dd2c54db8845b2f938319052e4"}},"sourceBranch":"main","suggestedTargetBranches":["8.19"],"targetPullRequestStates":[{"branch":"main","label":"v9.1.0","branchLabelMappingKey":"^v9.1.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/222158","number":222158,"mergeCommit":{"message":"Optimize bulk actions endpoint & update gaps (#222158)\n\n## Summary\nThese optimizations aim to improve the performance of the bulk actions\nendpoint.\n\nThe first optimization has to do with how we resolve rules before\nexecuting the bulk actions. Before this PR we were doing this\nsequentially, but a `bulkGet` method has been added to the rules action\nclient.\n\nThe second optimization greatly improves the update gaps routine that\noccurs after a backfill is scheduled (see the before and after\nscreenshots below).\n\nThe following screenshots were taken under these conditions:\nI triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over\na period of 90 days.\n\nBefore:\n\n\n\nAfter:\n\n\n\n\n## How to test?\nUse [this tool](https://github.com/elastic/security-documents-generator)\nto create 1 rule (5m) with 1000 gaps.\n```\nyarn start rules --rules 1 -g 1000 -c -i\"5m\"\n```\n\nThen do a manual run on it for a period of 90 days.\n\n---------\n\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"1109f5244cb988dd2c54db8845b2f938319052e4"}},{"branch":"8.19","label":"v8.19.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT--> Co-authored-by: Edgar Santos <edgar.santos@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Summary
These optimizations aim to improve the performance of the bulk actions endpoint.
The first optimization has to do with how we resolve rules before executing the bulk actions. Before this PR we were doing this sequentially, but a
bulkGetmethod has been added to the rules action client.The second optimization greatly improves the update gaps routine that occurs after a backfill is scheduled (see the before and after screenshots below).
The following screenshots were taken under these conditions:
I triggered a manual run bulk action on 1 rule (5m) with 1000 gaps over a period of 90 days.
Before:

After:

How to test?
Use this tool to create 1 rule (5m) with 1000 gaps.
Then do a manual run on it for a period of 90 days.