[Security Solution] add policy_response_failure defend insight type#231908

Merged
joeypoon merged 10 commits into elastic:main from
joeypoon:feature/defend-insights-policy-response-failures
Aug 29, 2025

Conversation

@joeypoon
Member

@joeypoon joeypoon commented Aug 15, 2025

Summary

Adds a new Defend Insight (a.k.a. Automatic Troubleshooting) type, policy_response_failure. This Defend Insight type checks endpoint policy responses for warnings and failures and provides remediation suggestions.

In order to provide better responses for policy response failures, this PR also introduces static KB assets for Defend Insights. policy_response_failure type requests are enriched with relevant KB assets.

The new policy_response_failure Defend Insight type is feature flagged under defendInsightsPolicyResponseFailure.

The anonymized_events_retriever and get_anonymized_events directories are renamed to events_retriever and get_events due to the max path length restriction.

This PR only contains the API changes for this feature.

Corresponding PR to update Security AI Prompt package.

Checklist

@joeypoon joeypoon added the backport:skip, Team:Defend Workflows, and release_note:feature labels Aug 15, 2025
@joeypoon joeypoon force-pushed the feature/defend-insights-policy-response-failures branch 2 times, most recently from c116eef to a34117f Compare August 15, 2025 10:05
@joeypoon joeypoon marked this pull request as ready for review August 15, 2025 13:31
@joeypoon joeypoon requested review from a team as code owners August 15, 2025 13:31
@elasticmachine
Contributor

Pinging @elastic/security-defend-workflows (Team:Defend Workflows)

* macOS: `sudo /Library/Elastic/Endpoint/elastic-endpoint test output`
* Windows: `C:\\Program Files\\Elastic\\Endpoint\\elastic-endpoint.exe test output`

If network connectivity is the problem and the output doesn't clarify the issue, consider using a tool like curl for further diagnosis. If incorrect proxy information is displayed, review the proxy configuration, noting that Defend advanced options can override these settings. For certificate issues, check the Fleet Server configuration and explore using one of the `advanced.artifacts.user.*` Defend advanced settings.
Contributor

Can we include links to online webpages in these snippets?

Member Author

yeah, I think we can make it work.

Contributor

@szwarckonrad szwarckonrad left a comment

A few questions, but otherwise the code looks good to me. Two wishes though 😉:

  1. Could you add a script/tool to hydrate events with the policy response failure ones? It’ll make future development easier so we don’t have to generate them manually each time.
  2. Could you include usage examples - i.e., sample request and response? A quick reference for UI implementation phase
/**
 * The suggested remediation for the insight
 */
remediation: z.object({}).catchall(z.unknown()).optional(),
Contributor

Q: Are we sure we can't tighten this?

Member Author

I'm intentionally leaving this more open since we're not sure what future remediation objects might look like.

},
index: number
): string {
return `${event['actions.name'][index]}${splitKey}${event['actions.message'][index]}${splitKey}${event['host.os.name'][0]}`;
Contributor

Could you leave a comment with an example of a returned value?
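
As an illustration of the kind of value the key builder above returns, here is a minimal sketch. The `'|'` separator, the `buildActionKey` name, the event interface, and the sample action names and messages are all assumptions made for this example; the real `splitKey` value and event type are not shown in this thread.

```typescript
// Hypothetical separator: the actual `splitKey` value is not visible in the diff.
const splitKey = '|';

// Assumed shape: a flattened event where every field is an array,
// matching the `[index]` / `[0]` access in the reviewed snippet.
interface FlattenedPolicyResponseEvent {
  'actions.name': string[];
  'actions.message': string[];
  'host.os.name': string[];
}

// Same template as the reviewed line, wrapped in a hypothetical function name.
function buildActionKey(event: FlattenedPolicyResponseEvent, index: number): string {
  return `${event['actions.name'][index]}${splitKey}${event['actions.message'][index]}${splitKey}${event['host.os.name'][0]}`;
}

// Invented sample data, for illustration only.
const sampleEvent: FlattenedPolicyResponseEvent = {
  'actions.name': ['configure_malware', 'download_user_artifacts'],
  'actions.message': ['Successfully configured malware', 'Failed to download artifacts'],
  'host.os.name': ['Windows'],
};

const sampleKey = buildActionKey(sampleEvent, 1);
// sampleKey === 'download_user_artifacts|Failed to download artifacts|Windows'
```

With these assumptions, the key joins the action name, the action message, and the first OS name with the separator.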

Comment on lines +114 to +122
.filter((bucket) => {
  const actions = bucket.latest_event.hits.hits[0]._source.Endpoint.policy.applied.actions;
  return actions.some((action) => action.status === 'failure' || action.status === 'warning');
})
.map((bucket) => {
  const latestPolicyResponse = bucket.latest_event.hits.hits[0];
  const failedActions = latestPolicyResponse._source.Endpoint.policy.applied.actions.filter(
    (action) => action.status === 'failure' || action.status === 'warning'
  );
Contributor

We check for failure || warning actions in both filter and map; is there a need for that?

Member Author

This is so that we don't have nulls in the returned array.
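
For reference, one possible alternative (not what the PR does) is to evaluate the predicate once per bucket and skip buckets with no matches in a single `reduce` pass, which also avoids nulls in the result. The `Bucket`, `PolicyAction`, and `FailureSummary` types below are simplified, hypothetical stand-ins for the real aggregation response.

```typescript
type ActionStatus = 'success' | 'failure' | 'warning';

interface PolicyAction {
  name: string;
  status: ActionStatus;
  message: string;
}

// Simplified stand-in for an aggregation bucket (hypothetical shape).
interface Bucket {
  agentId: string;
  actions: PolicyAction[];
}

interface FailureSummary {
  agentId: string;
  failedActions: PolicyAction[];
}

const isFailureOrWarning = (action: PolicyAction): boolean =>
  action.status === 'failure' || action.status === 'warning';

// Single pass: filter each bucket's actions once and keep the bucket
// only when at least one action failed or warned.
function collectFailures(buckets: Bucket[]): FailureSummary[] {
  return buckets.reduce<FailureSummary[]>((acc, bucket) => {
    const failedActions = bucket.actions.filter(isFailureOrWarning);
    if (failedActions.length > 0) {
      acc.push({ agentId: bucket.agentId, failedActions });
    }
    return acc;
  }, []);
}

const summaries = collectFailures([
  { agentId: 'a1', actions: [{ name: 'x', status: 'success', message: 'ok' }] },
  {
    agentId: 'a2',
    actions: [
      { name: 'y', status: 'failure', message: 'failed' },
      { name: 'z', status: 'warning', message: 'degraded' },
      { name: 'w', status: 'success', message: 'ok' },
    ],
  },
]);
```

The filter-then-map form in the PR is arguably easier to read; the trade-off is that the status predicate runs twice for buckets that pass the filter.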

Comment on lines +39 to +50
promptGroupId: promptGroupId.defendInsights.policyResponseFailure,
promptIds: [
  promptDictionary.defendInsightsPolicyResponseFailureDefault,
  promptDictionary.defendInsightsPolicyResponseFailureRefine,
  promptDictionary.defendInsightsPolicyResponseFailureContinue,
  promptDictionary.defendInsightsPolicyResponseFailureGroup,
  promptDictionary.defendInsightsPolicyResponseFailureEvents,
  promptDictionary.defendInsightsPolicyResponseFailureEventsId,
  promptDictionary.defendInsightsPolicyResponseFailureEventsEndpointId,
  promptDictionary.defendInsightsPolicyResponseFailureEventsValue,
  promptDictionary.defendInsightsPolicyResponseFailureRemediation,
  promptDictionary.defendInsightsPolicyResponseFailureRemediationMessage,
Contributor

Q: Isn't there a mechanism to fetch all by promptGroupId?

Member Author

I don't think so. I only see getPrompt and getPromptsByGroupId in the exports, and the promptIds arg is required.

- return getDefendInsightsIncompatibleVirusGenerationSchema(prompts);
+ switch (type) {
+   case DefendInsightType.Enum.incompatible_antivirus:
+     return getDefendInsightsIncompatibleVirusGenerationSchema(prompts);
Contributor

getDefendInsightsIncompatibleVirusGenerationSchema: can we stick to AntiVirus? "Virus" alone might be confusing to someone not familiar with this part of Kibana :D

Member Author

Ah yeah, good catch.

  .describe(prompts.events),
remediation: z
  .object({
    message: z.string().describe(prompts.remediationMessage ?? ''),
Contributor

I see we expect a message field there. Maybe we can build the schema above step by step and start with message instead of leaving it open-ended for now?

Member Author

@joeypoon joeypoon Aug 19, 2025

I might be misinterpreting your suggestion here but I think you're suggesting that we remove remediation and just have message? I did it this way:

  1. to make it clear it's a remediation message, not just a generic message
  2. to keep the schema for insights more consistent as we might have different remediation shapes in the future
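
The trade-off discussed here (an open-ended `remediation` object at the API level, a `message` field for this insight type) can be sketched in plain TypeScript without the schema library. The type and function names below are hypothetical and only illustrate the idea of narrowing an open object to one that carries a string `message`.

```typescript
// Open-ended shape: any keys allowed, similar in spirit to the permissive API schema.
type OpenRemediation = Record<string, unknown>;

// Narrower shape for this insight type: a required message, other keys tolerated.
interface MessageRemediation extends Record<string, unknown> {
  message: string;
}

// Type guard a consumer could use before rendering the remediation message.
function hasRemediationMessage(
  remediation: OpenRemediation | undefined
): remediation is MessageRemediation {
  return remediation !== undefined && typeof remediation['message'] === 'string';
}

const withMessage: OpenRemediation = { message: 'Review the proxy configuration.', link: 'n/a' };
const withoutMessage: OpenRemediation = { steps: ['a', 'b'] };

const canRenderFirst = hasRemediationMessage(withMessage);
const canRenderSecond = hasRemediationMessage(withoutMessage);
```

Keeping the outer object open lets future insight types add other remediation shapes, while consumers that need `message` check for it explicitly.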
Comment on lines +32 to +33
case DefendInsightType.Enum.policy_response_failure:
  return buildPolicyResponseFailureWorkflowInsights(params);
Contributor

Should we put it behind a feature flag?

Member Author

This is flagged at the API level. Agree that we'll want a flag at the UI level as well, but that will be in a separate PR when we add the API call for this insight type.
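
A minimal sketch of what an API-level gate for this insight type could look like. The flag name `defendInsightsPolicyResponseFailure` and the two insight type values come from this PR; the `ExperimentalFeatures` interface and the `isInsightTypeEnabled` helper are hypothetical and do not reflect the actual route code.

```typescript
type DefendInsightType = 'incompatible_antivirus' | 'policy_response_failure';

// Hypothetical container for feature flags; only the flag from this PR is modeled.
interface ExperimentalFeatures {
  defendInsightsPolicyResponseFailure: boolean;
}

// Reject the new insight type at the API boundary unless its flag is on.
// Other insight types are not gated in this sketch.
function isInsightTypeEnabled(type: DefendInsightType, features: ExperimentalFeatures): boolean {
  if (type === 'policy_response_failure') {
    return features.defendInsightsPolicyResponseFailure;
  }
  return true;
}

const flagOff: ExperimentalFeatures = { defendInsightsPolicyResponseFailure: false };
const flagOn: ExperimentalFeatures = { defendInsightsPolicyResponseFailure: true };

const blocked = isInsightTypeEnabled('policy_response_failure', flagOff);
const allowed = isInsightTypeEnabled('policy_response_failure', flagOn);
```

Gating at the request boundary means downstream builders such as the workflow insight switch never see the new type while the flag is off.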

@joeypoon
Member Author

joeypoon commented Aug 19, 2025

Thanks for taking a look @szwarckonrad 🙇.

  1. Could you add a script/tool to hydrate events with the policy response failure ones? It’ll make future development easier so we don’t have to generate them manually each time.

I believe the scripts/endpoint/resolver_generator script already randomly adds policy response failures. You can generate a handful of endpoints for more failure type coverage.

  2. Could you include usage examples - i.e., sample request and response? A quick reference for UI implementation phase

This is kind of chunky; I'll share it with you on Slack.

@joeypoon joeypoon force-pushed the feature/defend-insights-policy-response-failures branch 2 times, most recently from 2988783 to 7a46a7d Compare August 19, 2025 11:57
@joeypoon joeypoon force-pushed the feature/defend-insights-policy-response-failures branch from 7a46a7d to d06f7f7 Compare August 19, 2025 14:14
Adds a new Defend Insight type, `policy_response_failure`. This Defend
Insight type checks the endpoint policy responses for warnings and
failures and provides remediation suggestions.
@joeypoon joeypoon force-pushed the feature/defend-insights-policy-response-failures branch from d06f7f7 to c132340 Compare August 20, 2025 12:21
Comment on lines 55 to 62
  if (!ctx.licensing.license.hasAtLeast('enterprise')) {
    return response.forbidden({
      body: {
        message:
-         'Your license does not support Defend Workflows. Please upgrade your license.',
+         'Your license does not support Automatic Troubleshooting. Please upgrade your license.',
      },
    });
  }
Member

FYI, there's a helper utility for performing license, authenticated user and FF checks you can use:

// Perform license, authenticated user and evaluation FF checks
const checkResponse = await performChecks({
  capability: 'assistantModelEvaluation',
  context: ctx,
  request,
  response,
});

Member Author

Wasn't aware of this, thanks for the tip. Since performChecks explicitly checks the license requirement for AI Assistant, I'm going to leave this as is for now: we want Defend Insights to maintain a separate license requirement (even though it's technically the same level as AI Assistant right now).
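
To illustrate the design choice of keeping a separate license requirement, here is a small self-contained sketch. It does not use the Kibana licensing API; the `LICENSE_RANK` ordering, `hasAtLeast` helper, and `checkDefendInsightsLicense` function are simplified assumptions for this example. Only the `'enterprise'` minimum and the error message come from the diff above.

```typescript
// Assumed ordering of license levels, for illustration only.
const LICENSE_RANK = {
  basic: 0,
  standard: 1,
  gold: 2,
  platinum: 3,
  enterprise: 4,
  trial: 5,
} as const;

type LicenseType = keyof typeof LICENSE_RANK;

// Simplified stand-in for a "has at least this level" check.
const hasAtLeast = (current: LicenseType, minimum: LicenseType): boolean =>
  LICENSE_RANK[current] >= LICENSE_RANK[minimum];

// Kept as its own constant so it can diverge from other features' requirements later.
const DEFEND_INSIGHTS_MIN_LICENSE: LicenseType = 'enterprise';

type LicenseCheck = { ok: true } | { ok: false; status: 403; message: string };

function checkDefendInsightsLicense(current: LicenseType): LicenseCheck {
  if (hasAtLeast(current, DEFEND_INSIGHTS_MIN_LICENSE)) {
    return { ok: true };
  }
  return {
    ok: false,
    status: 403,
    message: 'Your license does not support Automatic Troubleshooting. Please upgrade your license.',
  };
}

const platinumResult = checkDefendInsightsLicense('platinum');
const enterpriseResult = checkDefendInsightsLicense('enterprise');
```

Because the minimum level lives in one dedicated constant, changing the Defend Insights requirement would not touch any shared AI Assistant check.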

@joeypoon joeypoon requested a review from a team as a code owner August 22, 2025 10:28
Contributor

@natasha-moore-elastic natasha-moore-elastic left a comment

API doc changes LGTM

Member

@spong spong left a comment

Checked out, tested KB features locally with FF off, and code reviewed relevant GenAI changes -- LGTM! 👍

@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 10.2MB | 10.2MB | +108.0B |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| elasticAssistant | 273.8KB | 273.8KB | +39.0B |
| securitySolution | 96.0KB | 96.1KB | +39.0B |
| total | | | +78.0B |

Unknown metric groups

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 677 | 678 | +1 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 777 | 778 | +1 |

History

@joeypoon joeypoon merged commit f6e2d22 into elastic:main Aug 29, 2025
13 checks passed
ymao1 pushed a commit to ymao1/kibana that referenced this pull request Aug 29, 2025
…lastic#231908)

## Summary

Adds a new Defend Insight (a.k.a. Automatic Troubleshooting) type,
`policy_response_failure`. This Defend Insight type checks the endpoint
policy responses for warnings and failures and provides remediation
suggestions.

In order to provide better responses for policy response failures, this
PR also introduces static KB assets for Defend Insights.
`policy_response_failure` type requests are enriched with relevant KB
assets.

The new `policy_response_failure` Defend Insight type is feature flagged
under `defendInsightsPolicyResponseFailure`.

The `anonymized_events_retriever` and `get_anonymized_events` directories
are renamed to `events_retriever` and `get_events` due to the max path
length restriction.

This PR only contains the API changes for this feature.

Corresponding [PR](elastic/integrations#14946)
to update Security AI Prompt package.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Garrett Spong <spong@users.noreply.github.com>
- combine duplicate insights into the same 'group' (e.g. AVG + AVG Free + AVG Hub + AVG Antivirus)
- remove insights with no events
`,
CONTINUE: `Continue exactly where you left off in the JSON output below, generating only the additional JSON output when it's required to complete your work. The additional JSON output MUST ALWAYS follow these rules:
Contributor

Hey, I'm updating the prompts for something else and I think you may have forgotten to update the integration when you made these changes. My PR will include your changes, so no action is needed, but please remember to update the integration in the future.


Labels

backport:skip (This PR does not require backporting)
release_note:feature (Makes this part of the condensed release notes)
Team:Defend Workflows (“EDR Workflows” sub-team of Security Solution)
v9.2.0

8 participants