
[Agent Builder] Adds Integration Knowledge platform tool#245259

Merged
spong merged 9 commits into elastic:main from spong:integration-knowledge-tool
Dec 11, 2025

Conversation

@spong
Member

@spong spong commented Dec 4, 2025

Summary

Adds an integration knowledge tool to Agent Builder that retrieves documentation from Fleet-installed integrations using semantic search on the .integration_knowledge index. The tool uses the conditional availability pattern and is only available when the integration knowledge index exists.

Changes

  • Added platform.core.integration_knowledge builtin tool to agent_builder_platform that searches Fleet integration documentation
  • Tool is registered in plugin setup() with conditional availability using the availability configuration pattern
  • Availability is checked at runtime via ES search on .integration_knowledge index (using size: 0 query)
  • Returns structured resource results with package name, version, filename, and content
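The runtime availability check listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: the `SearchClient` interface, the function name `checkIntegrationKnowledgeAvailable`, and the `{ status, reason }` return shape are hypothetical. Only the index name and the `size: 0` probe come from the description.

```typescript
// Hypothetical, minimal stand-in for the part of the Elasticsearch client used here.
interface SearchClient {
  search(params: { index: string; size: number }): Promise<unknown>;
}

// Assumed result shape for an availability handler.
type Availability =
  | { status: 'available' }
  | { status: 'unavailable'; reason: string };

const INTEGRATION_KNOWLEDGE_INDEX = '.integration_knowledge';

// Probe the index with a size: 0 search: no documents are fetched,
// and the call only succeeds if the index exists and is searchable.
async function checkIntegrationKnowledgeAvailable(
  client: SearchClient
): Promise<Availability> {
  try {
    await client.search({ index: INTEGRATION_KNOWLEDGE_INDEX, size: 0 });
    return { status: 'available' };
  } catch (error) {
    // Any failure (for example a missing index) hides the tool.
    const reason = error instanceof Error ? error.message : String(error);
    return { status: 'unavailable', reason };
  }
}
```

Treating every search failure as "unavailable" keeps the tool hidden whenever the index cannot be queried, which is a simplification; a real handler might want to distinguish a missing index from a transient error.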

Technical Details

  • Tool registration added to registerTools() in plugin setup() phase, following the same pattern as productDocumentationTool
  • Uses availability configuration with cacheMode: 'space' to conditionally show/hide the tool based on index availability
  • Searches using Elasticsearch semantic search on the content field
  • esClient.asInternalUser is used for both handler execution and availability checking (index permissions require internal user)
  • Results include reference URLs to integration detail pages (/app/integrations/detail/{package_name})
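To make the search and result shape above more concrete, here is a minimal sketch. The helper names `buildSearchRequest` and `toResult` and the default `size` are assumptions for illustration; the use of the Elasticsearch `semantic` query is also an assumption based on the statement that `content` is searched semantically. The URL pattern and title format follow the description and the sample tool response in this PR.

```typescript
// Fields returned for each document, as listed in this PR.
interface IntegrationKnowledgeDoc {
  package_name: string;
  filename: string;
  version: string;
  content: string;
}

const KNOWLEDGE_INDEX = '.integration_knowledge';

// Build a semantic search request against the `content` field.
// The default size of 5 is an arbitrary choice for this sketch.
function buildSearchRequest(query: string, size = 5) {
  return {
    index: KNOWLEDGE_INDEX,
    size,
    query: { semantic: { field: 'content', query } },
    _source: ['package_name', 'filename', 'version', 'content'],
  };
}

// Convert one document into a result that carries a reference
// to the integration detail page.
function toResult(doc: IntegrationKnowledgeDoc) {
  return {
    reference: {
      url: `/app/integrations/detail/${doc.package_name}`,
      title: `${doc.package_name} integration (v${doc.version}) - ${doc.filename}`,
    },
    content: {
      package_name: doc.package_name,
      filename: doc.filename,
      version: doc.version,
      content: doc.content,
    },
  };
}
```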

Considerations


Testing

Note

You must enable the xpack.fleet.enableExperimental: ["installIntegrationsKnowledge"] feature flag until the PR that enables it by default (#245080) is merged.

  1. Upload this sample system-2.3.3-NEXT.zip package via Integrations > Create new integration
    • The test package just copies the existing docs/README.md to docs/knowledge_base/README.md so that Fleet ingests it into .integration_knowledge
  2. Create a new agent with the new Integration Knowledge tool and ask questions related to the system integration, such as:
    • How can I collect CPU and memory data for my windows host?
    • What OS can I run the system integration on?
    • What does the system integration do?
  3. Observe that the responses returned contain relevant information that is cited from the system integration.

PR developed with Cursor + Opus 4.5

@spong spong self-assigned this Dec 4, 2025
@spong spong requested a review from a team as a code owner December 4, 2025 15:23
@spong spong added the release_note:enhancement, backport:skip, Team: SecuritySolution, and v9.3.0 labels Dec 4, 2025
@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

Contributor

@pgayvallet pgayvallet left a comment

LGTM - just one question about the availability caching

const baseTool: BuiltinToolDefinition<typeof integrationKnowledgeSchema> = {
  id: platformCoreTools.integrationKnowledge,
  type: ToolType.builtin,
  description: `Search and retrieve knowledge from Fleet-installed integrations. This includes information on how to configure and use integrations for data ingestion into Elasticsearch.`,
Member

@sorenlouv sorenlouv Dec 10, 2025

The LLM may not know whether a question relates to a fleet integration without more context.

I've thought about this in other contexts, and it would be nice to have a dynamic tool description. In this case, it would be great if we could do something like:

description: async () => {
  const fleetIntegrationNames = await getFleetIntegrationNames();
  return `Search and retrieve knowledge from Fleet-installed integrations: ${fleetIntegrationNames.slice(0, 10).join(', ')}. This includes information on how to configure and use integrations for data ingestion into Elasticsearch.`;
}
Member Author

Yeah, I don't think we've had enough usage/evals to see how well this tool works, but I agree, this should be a good improvement. Let me see what I can do here 🙂

Member Author

@pgayvallet, are you open to adding support for dynamic tool descriptions? And if so, a preferred implementation? Add a new field to the builtin schema for this, or re-type description to be string | function and then call in builtin/converter?

I can ignore this for now and/or just do it with the existing description, so it'd be locked in on startup. Not ideal as integrations get installed/uninstalled, but might improve a little?

Up to you, let me know!
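For discussion, the "string | function" option could be sketched as below. This is only an illustration of the idea, not an existing Agent Builder API: `ToolDescription`, `resolveDescription`, and `describeIntegrationKnowledge` are hypothetical names, and the cap of 10 names mirrors the suggestion earlier in this thread.

```typescript
// A description that is either fixed or computed on demand.
type ToolDescription = string | (() => string | Promise<string>);

// Resolve the description at the point where the tool definition is converted,
// so a function-valued description is evaluated per request rather than at startup.
async function resolveDescription(description: ToolDescription): Promise<string> {
  return typeof description === 'function' ? await description() : description;
}

// Hypothetical helper: build the description from installed integration names,
// capped so the description stays short.
function describeIntegrationKnowledge(names: string[], limit = 10): string {
  const base = 'Search and retrieve knowledge from Fleet-installed integrations';
  const listed = names.slice(0, limit).join(', ');
  return listed.length > 0 ? `${base}: ${listed}.` : `${base}.`;
}
```

With this shape, existing tools that pass a plain string keep working unchanged, and only the converter needs to await the result.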

Member Author

Perhaps we leave this one alone for now unless Pierre wants to use this as an example for introducing support for dynamic descriptions? This would be a good candidate to test when evals are added. I'm also curious about space-aware fleet packages and how they contribute to the .integration_knowledge index and that sorta stuff. We can follow up with the fleet folks here.

package_name: source.package_name,
filename: source.filename,
version: source.version,
content: source.content,
Member Author

@spong spong Dec 10, 2025

So there's no content trimming, and the current query returns the full content, not highlighted chunks or anything, and it seems some of these have a ton of content. 304k tokens for just the System Integration... 😔

This was the behavior of the previous tool (I'm just porting it here), but let me see about trimming/tuning the query to only return relevant chunks. We don't have evals yet for this functionality, so will have to go by feels for now.

Member

Ouch! That will eat through the context quickly!

Member Author

Alrighty, 3b9e229 adds highlighting and we are now returning the 5 most relevant chunks per document, and as a fallback it'll trim to the first 4000 chars if no highlights are available.

Down to 16.7k with this change, and the query response LG(enough)TM 👍 🎉

I'm sure there's plenty to tune here (especially depending on the chunking sizes of the source content), but this at least makes the tool a reasonable context-providing citizen till we/the fleet folks can get some evals in place.
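The trimming behavior described above (up to 5 highlighted chunks per document, falling back to the first 4000 characters) can be sketched as follows. This is not the code from 3b9e229: `selectContent` and the constant names are hypothetical, the `---` separator is inferred from the sample tool response in this thread, and the rule for setting `partial` in the fallback case is an assumption.

```typescript
const MAX_HIGHLIGHTS = 5;
const FALLBACK_MAX_CHARS = 4000;
// Separator inferred from the sample response, where chunks are divided by "---".
const CHUNK_SEPARATOR = '\n\n---\n\n';

interface TrimmedContent {
  content: string;
  partial: boolean;
}

// Prefer highlighted chunks when the search returned any;
// otherwise fall back to a fixed-length prefix of the full content.
function selectContent(fullContent: string, highlights?: string[]): TrimmedContent {
  if (highlights && highlights.length > 0) {
    return {
      content: highlights.slice(0, MAX_HIGHLIGHTS).join(CHUNK_SEPARATOR),
      partial: true,
    };
  }
  const trimmed = fullContent.slice(0, FALLBACK_MAX_CHARS);
  // Assumption: mark the result partial only when something was cut off.
  return { content: trimmed, partial: trimmed.length < fullContent.length };
}
```

Character-based trimming is a rough proxy for token count, so the 4000-character fallback bounds the context cost only approximately.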

Full tool response JSON

[
  {
    "reference": {
      "url": "/app/integrations/detail/system",
      "title": "system integration (v2.3.3-next) - README.md"
    },
    "partial": true,
    "content": {
      "package_name": "system",
      "filename": "README.md",
      "version": "2.3.3-next",
      "content": "| event.module | Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | constant_keyword |  |\n| host.containerized | If the host is a container. | boolean |  |\n| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host. | keyword |  |\n| host.os.build | OS build information. | keyword |  |\n| host.os.codename | OS codename, if any. | keyword |  |\n| system.load.1 | Load average for the last minute. | scaled_float | gauge |\n| system.load.15 | Load average for the last 15 minutes. | scaled_float | gauge |\n| system.load.5 | Load average for the last 5 minutes. | scaled_float | gauge |\n| system.load.cores | The number of CPU cores present on the host. | long | gauge |\n| system.load.norm.1 | Load for the last minute divided by the number of cores. | scaled_float | gauge |\n| system.load.norm.15 | Load for the last 15 minutes divided by the number of cores. | scaled_float | gauge |\n| system.load.norm.5 | Load for the last 5 minutes divided by the number of cores. | scaled_float | gauge |\n\n\n### Memory\n\nThe System `memory` data stream provides memory statistics.\n> Note: For retrieving Linux-specific memory metrics, use the [Linux](https://docs.elastic.co/integrations/linux) integration.\n\n#### Supported operating systems\n\n- FreeBSD\n- Linux\n- macOS\n- OpenBSD\n- Windows\n\n#### Permissions\n\nThis data should be available without elevated permissions.\n\n**ECS Field Reference**\n\n\n\n---\n\n| system.cpu.cores | The number of CPU cores present on the host. The non-normalized percentages will have a maximum value of `100% \\* cores`. 
The normalized percentages already take this value into account and have a maximum value of 100%. | long |  | gauge |\n| system.cpu.idle.norm.pct | The percentage of CPU time spent idle. | scaled_float | percent | gauge |\n| system.cpu.idle.pct | The percentage of CPU time spent idle. | scaled_float | percent | gauge |\n| system.cpu.idle.ticks | The amount of CPU time spent idle. | long |  | counter |\n| system.cpu.iowait.norm.pct | The percentage of CPU time spent in wait (on disk). | scaled_float | percent | gauge |\n| system.cpu.iowait.pct | The percentage of CPU time spent in wait (on disk). | scaled_float | percent | gauge |\n| system.cpu.iowait.ticks | The amount of CPU time spent in wait (on disk). | long |  | counter |\n| system.cpu.irq.norm.pct | The percentage of CPU time spent servicing and handling hardware interrupts. | scaled_float | percent | gauge |\n| system.cpu.irq.pct | The percentage of CPU time spent servicing and handling hardware interrupts. | scaled_float | percent | gauge |\n| system.cpu.irq.ticks | The amount of CPU time spent servicing and handling hardware interrupts. | long |  | counter |\n| system.cpu.nice.norm.pct | The percentage of CPU time spent on low-priority processes. | scaled_float | percent | gauge |\n| system.cpu.nice.pct | The percentage of CPU time spent on low-priority processes. | scaled_float | percent | gauge |\n| system.cpu.nice.ticks | The amount of CPU time spent on low-priority processes. | long |  | counter |\n| system.cpu.softirq.norm.pct | The percentage of CPU time spent servicing and handling software interrupts. | scaled_float | percent | gauge |\n| system.cpu.softirq.pct | The percentage of CPU time spent servicing and handling software interrupts. | scaled_float | percent | gauge |\n| system.cpu.softirq.ticks | The amount of CPU time spent servicing and handling software interrupts. 
| long |  | counter |\n\n\n---\n\nIf your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | constant_keyword |  |  |\n| host.containerized | If the host is a container. | boolean |  |  |\n| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host. | keyword |  |  |\n| host.os.build | OS build information. | keyword |  |  |\n| host.os.codename | OS codename, if any. | keyword |  |  |\n| system.memory.actual.free | Actual free memory in bytes. It is calculated based on the OS. On Linux this value will be MemAvailable from /proc/meminfo,  or calculated from free memory plus caches and buffers if /proc/meminfo is not available. On OSX it is a sum of free memory and the inactive memory. On Windows, it is equal to `system.memory.free`. | long | byte | gauge |\n| system.memory.actual.used.bytes | Actual used memory in bytes. It represents the difference between the total and the available memory. The available memory depends on the OS. For more details, please check `system.actual.free`. | long | byte | gauge |\n| system.memory.actual.used.pct | The percentage of actual used memory. | scaled_float | percent | gauge |\n| system.memory.free | The total amount of free memory in bytes. This value does not include memory consumed by system caches and buffers (see system.memory.actual.free). | long | byte | gauge |\n| system.memory.swap.free | Available swap memory. | long | byte | gauge |\n| system.memory.swap.total | Total swap memory. | long | byte | gauge |\n| system.memory.swap.used.bytes | Used swap memory. | long | byte | gauge |\n| system.memory.swap.used.pct | The percentage of used swap memory. 
| scaled_float | percent | gauge |\n\n\n---\n\nIf your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | constant_keyword |  |  |\n| host.containerized | If the host is a container. | boolean |  |  |\n| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host. | keyword |  |  |\n| host.os.build | OS build information. | keyword |  |  |\n| host.os.codename | OS codename, if any. | keyword |  |  |\n| system.core.id | CPU Core number. | keyword |  |  |\n| system.core.idle.pct | The percentage of CPU time spent idle. | scaled_float | percent | gauge |\n| system.core.idle.ticks | The amount of CPU time spent idle. | long |  | counter |\n| system.core.iowait.pct | The percentage of CPU time spent in wait (on disk). | scaled_float | percent | gauge |\n| system.core.iowait.ticks | The amount of CPU time spent in wait (on disk). | long |  | counter |\n| system.core.irq.pct | The percentage of CPU time spent servicing and handling hardware interrupts. | scaled_float | percent | gauge |\n| system.core.irq.ticks | The amount of CPU time spent servicing and handling hardware interrupts. | long |  | counter |\n| system.core.nice.pct | The percentage of CPU time spent on low-priority processes. | scaled_float | percent | gauge |\n| system.core.nice.ticks | The amount of CPU time spent on low-priority processes. | long |  | counter |\n| system.core.softirq.pct | The percentage of CPU time spent servicing and handling software interrupts. | scaled_float | percent | gauge |\n| system.core.softirq.ticks | The amount of CPU time spent servicing and handling software interrupts. 
| long |  | counter |\n| system.core.steal.pct | The percentage of CPU time spent in involuntary wait by the virtual CPU while the hypervisor was servicing another processor. \n\n---\n\n| system.process.memory.size | The total virtual memory the process has. On Windows this represents the Commit Charge (the total amount of memory that the memory manager has committed for a running process) value in bytes for this process. | long | byte | gauge |\n| system.process.num_threads | Number of threads in the process | integer |  |  |\n| system.process.state | The process state. For example: \"running\". | keyword |  |  |\n\n\n### Process summary\n\nThe `process_summary` data stream collects high level statistics about the running\nprocesses.\n\n#### Supported operating systems\n\n- FreeBSD\n- Linux\n- macOS\n- Windows\n\n#### Permissions\n\nGeneral process summary data should be available without elevated permissions.\nIf the process data belongs to the other users, it will be counted as unknown value.\n\n**ECS Field Reference**\n\nPlease refer to the following [document](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html) for detailed information on ECS fields.\n\n**Exported fields**\n\n| Field | Description | Type | Metric Type |\n|---|---|---|---|\n| @timestamp | Date/time when the event originated. This is the date/time extracted from the event, typically representing when the event was generated by the source. If the event source has no original timestamp, this value is typically populated by the first time the event was received by the pipeline. Required field for all events. | date |  |\n| agent.id | Unique identifier of this agent (if one exists). Example: For Beats this would be beat.id. | keyword |  |\n| cloud.account.id | The cloud account or organization id used to identify different entities in a multi-tenant environment. Examples: AWS account id, Google Cloud ORG Id, or other unique identifier. | keyword |  |\n"
    }
  },
  {
    "reference": {
      "url": "/app/integrations/detail/endpoint",
      "title": "endpoint integration (v9.2.0) - README.md"
    },
    "partial": true,
    "content": {
      "package_name": "endpoint",
      "filename": "README.md",
      "version": "9.2.0",
      "content": "If the OS you're dealing with is not listed as an expected value, the field should not be populated. Please let us know by opening an issue with ECS, to propose its addition. | keyword |\n| host.os.version | Operating system version as a raw string. | keyword |\n| host.type | Type of host. For Cloud providers this can be the machine type like `t2.medium`. If vm, this could be the container, for example, or other information meaningful in your environment. | keyword |\n| host.uptime | Seconds the host has been up. | long |\n\n\n### metrics\n\nMetrics documents contain performance information about the endpoint executable and the host it is running on.\n\n#### Exported fields\n\n| Field | Description | Type |\n|---|---|---|\n| @timestamp | Date/time when the event originated. This is the date/time extracted from the event, typically representing when the event was generated by the source. If the event source has no original timestamp, this value is typically populated by the first time the event was received by the pipeline. Required field for all events. | date |\n| Endpoint.metrics | Metrics fields hold the endpoint and system's performance metrics | object |\n| Endpoint.metrics.cpu | CPU statistics | object |\n| Endpoint.metrics.cpu.endpoint | CPU metrics for the endpoint | object |\n| Endpoint.metrics.cpu.endpoint.histogram | This field defines an elasticsearch histogram field (https://www.elastic.co/guide/en/elasticsearch/reference/current/histogram.html#histogram) The values field includes 20 buckets (each bucket is 5%) representing the cpu usage The counts field includes 20 buckets of how many times the endpoint's cpu usage fell into each bucket | histogram |\n| Endpoint.metrics.cpu.endpoint.latest | Average CPU over the last sample interval | half_float |\n\n\n---\n\n| host.domain | Name of the domain of which the host is a member. For example, on Windows this could be the host's Active Directory domain or NetBIOS domain name. 
For Linux this could be the domain of the host's LDAP provider. | keyword |\n| host.hostname | Hostname of the host. It normally contains what the `hostname` command returns on the host machine. | keyword |\n| host.id | Unique host id. As hostname is not always unique, use values that are meaningful in your environment. Example: The current usage of `beat.name`. | keyword |\n| host.ip | Host ip addresses. | ip |\n| host.mac | Host MAC addresses. The notation format from RFC 7042 is suggested: Each octet (that is, 8-bit byte) is represented by two [uppercase] hexadecimal digits giving the value of the octet as an unsigned integer. Successive octets are separated by a hyphen. | keyword |\n| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host. | keyword |\n| host.os.Ext | Object for all custom defined fields to live in. | object |\n| host.os.Ext.variant | A string value or phrase that further aid to classify or qualify the operating system (OS).  For example the distribution for a Linux OS will be entered in this field. | keyword |\n| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |\n| host.os.full | Operating system name, including the version or code name. | keyword |\n| host.os.kernel | Operating system kernel version as a raw string. | keyword |\n\n\n---\n\n| host.domain | Name of the domain of which the host is a member. For example, on Windows this could be the host's Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host's LDAP provider. | keyword |\n| host.hostname | Hostname of the host. It normally contains what the `hostname` command returns on the host machine. | keyword |\n| host.id | Unique host id. As hostname is not always unique, use values that are meaningful in your environment. 
Example: The current usage of `beat.name`. | keyword |\n| host.ip | Host ip addresses. | ip |\n| host.mac | Host MAC addresses. The notation format from RFC 7042 is suggested: Each octet (that is, 8-bit byte) is represented by two [uppercase] hexadecimal digits giving the value of the octet as an unsigned integer. Successive octets are separated by a hyphen. | keyword |\n| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host. | keyword |\n| host.os.Ext | Object for all custom defined fields to live in. | object |\n| host.os.Ext.variant | A string value or phrase that further aid to classify or qualify the operating system (OS).  For example the distribution for a Linux OS will be entered in this field. | keyword |\n| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |\n| host.os.full | Operating system name, including the version or code name. | keyword |\n| host.os.kernel | Operating system kernel version as a raw string. | keyword |\n\n\n---\n\n| host.architecture | Operating system architecture. | keyword |\n| host.domain | Name of the domain of which the host is a member. For example, on Windows this could be the host's Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host's LDAP provider. | keyword |\n| host.hostname | Hostname of the host. It normally contains what the `hostname` command returns on the host machine. | keyword |\n| host.id | Unique host id. As hostname is not always unique, use values that are meaningful in your environment. Example: The current usage of `beat.name`. | keyword |\n| host.ip | Host ip addresses. | ip |\n| host.mac | Host MAC addresses. 
The notation format from RFC 7042 is suggested: Each octet (that is, 8-bit byte) is represented by two [uppercase] hexadecimal digits giving the value of the octet as an unsigned integer. Successive octets are separated by a hyphen. | keyword |\n| host.name | Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host. | keyword |\n| host.os.Ext | Object for all custom defined fields to live in. | object |\n| host.os.Ext.variant | A string value or phrase that further aid to classify or qualify the operating system (OS).  For example the distribution for a Linux OS will be entered in this field. | keyword |\n| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |\n| host.os.full | Operating system name, including the version or code name. | keyword |\n| host.os.kernel | Operating system kernel version as a raw string. | keyword |\n\n\n---\n\n| host.os.family | OS family (such as redhat, debian, freebsd, windows). | keyword |\n| host.os.full | Operating system name, including the version or code name. | keyword |\n| host.os.kernel | Operating system kernel version as a raw string. | keyword |\n| host.os.name | Operating system name, without the version. | keyword |\n| host.os.platform | Operating system platform (such centos, ubuntu, windows). | keyword |\n| host.os.type | Use the `os.type` field to categorize the operating system into one of the broad commercial families. If the OS you're dealing with is not listed as an expected value, the field should not be populated. Please let us know by opening an issue with ECS, to propose its addition. | keyword |\n| host.os.version | Operating system version as a raw string. | keyword |\n| host.type | Type of host. For Cloud providers this can be the machine type like `t2.medium`. 
If vm, this could be the container, for example, or other information meaningful in your environment. | keyword |\n| host.uptime | Seconds the host has been up. | long |\n| message | For log events the message field contains the log message, optimized for viewing in a log viewer. For structured logs without an original message field, other fields can be concatenated to form a human-readable summary of the event. If multiple messages exist, they can be combined into one message. | match_only_text |\n| process.Ext | Object for all custom defined fields to live in. | object |\n| process.Ext.ancestry | An array of entity_ids indicating the ancestors for this event | keyword |\n| process.Ext.code_signature | Nested version of ECS code_signature fieldset. | nested |\n| process.Ext.code_signature.exists | Boolean to capture if a signature is present. | boolean |\n| process.Ext.code_signature.status | Additional information about the certificate status. "
    }
  },
  {
    "reference": {
      "url": "/app/integrations/detail/synthetics",
      "title": "synthetics integration (v1.4.2) - README.md"
    },
    "partial": true,
    "content": {
      "package_name": "synthetics",
      "filename": "README.md",
      "version": "1.4.2",
      "content": "# Elastic Synthetics\nThe system uses the Elastic Synthetics integration in the background to provide access to [Synthetics private locations](/app/synthetics/settings/private-locations). It is installed by default and you do not need to edit or remove this integration manually.\n\nIf you still have monitors set up with this integration, you have to [migrate them to the Synthetics app](https://www.elastic.co/guide/en/observability/current/synthetics-migrate-from-integration.html).\n\nFor more information on setting up and managing monitors using the new Synthetics app, check the [documentation](https://www.elastic.co/guide/en/observability/current/monitor-uptime-synthetics.html)."
    }
  },
  {
    "reference": {
      "url": "/app/integrations/detail/security_detection_engine",
      "title": "security_detection_engine integration (v9.2.4) - README.md"
    },
    "partial": true,
    "content": {
      "package_name": "security_detection_engine",
      "filename": "README.md",
      "version": "9.2.4",
      "content": "# Prebuilt Security Detection Rules\n\nThe detection rules package stores the prebuilt security rules for the Elastic Security [detection engine](https://www.elastic.co/guide/en/security/7.13/detection-engine-overview.html).\n\nTo download or update the rules, click **Settings** > **Install Prebuilt Security Detection Rules assets**.\nThen [import](https://www.elastic.co/guide/en/security/current/rules-ui-management.html#load-prebuilt-rules)\nthe rules into the Detection engine.\n\n## License Notice\n\n"
    }
  },
  {
    "reference": {
      "url": "/app/integrations/detail/security_ai_prompts",
      "title": "security_ai_prompts integration (v1.0.12) - README.md"
    },
    "partial": true,
    "content": {
      "package_name": "security_ai_prompts",
      "filename": "README.md",
      "version": "1.0.12",
      "content": "# Security AI Prompts Integration (Beta)\n\n## Overview\n\nThe **Security AI Prompts** integration provides pre-configured AI-driven security prompts that enhance automated threat detection and response in Elastic Security. These prompts help security analysts generate AI-assisted insights and streamline their investigative workflows.\n\nThis integration is in **beta** and subject to changes. Feedback and contributions are welcome.\n\n## Requirements\n\n- Elastic Stack **8.19.x**, **9.1.x**, or later.\n- Kibana with the **Elastic Assistant** plugin enabled.\n\n## Installation\n\nThis integration is automatically installed when users visit the **Security Solution** in Kibana. No manual setup is required.\n\n## Usage\n\n1. Navigate to **Security Solution** in Kibana.\n2. AI-generated security prompts will be used in AI Assistant, Attack Discovery, and other security AI features to assist in investigations and threat analysis.\n\n## Known Issues & Limitations\nThis integration is currently in beta and subject to change.\nFuture versions may include automatic prompt synchronization.\n\n## Contributing\nContributions are welcome! If you encounter issues or have suggestions, please open an issue or submit a pull request.\n\n## License\nThis integration is subject to the Elastic License.\n"
    }
  }
]

  return {
    results: result.documents.map((doc: RetrieveDocumentationResultDoc) => ({
-     type: ToolResultType.resource,
+     type: ToolResultType.other,
Member Author

@pgayvallet, I noticed there was a type mis-match here as both these tools were ToolResultType.resource but returned url/title, so I changed them to ToolResultType.other. Would you prefer I introduce a new urlResource type instead?

@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
|---|---|---|---|
| onechat | 477.3KB | 477.4KB | +49.0B |

History

cc @spong

spong added a commit that referenced this pull request Dec 11, 2025
…ocumentation (#245749)

## Summary

This PR adds a new `Documentation` section to the `GenAI Settings` page
for managing the installation of documentation assets like the [Elastic
Documentation](https://www.elastic.co/docs/reference/kibana/configuration-reference/ai-assistant-settings)
used by the [Product Docs Platform
Tool](#242598) and the Security
Labs content used by the `security.security_labs_search` tool

Design issue: elastic/ai-enhancements#77

<p align="center">
<img width="700"
src="https://github.com/user-attachments/assets/46d7805e-5ef1-4213-8ec9-25875a4f039d"
/>
</p> 


> [!NOTE]
> Security Labs content is not yet served up by the product docs CDN, so
it is currently a disabled placeholder until I work on
#244946 next. We can hide this
item for now if that is preferred.


### Implementation notes

* The `product_docs` APIs are called directly from the client instead
of plumbing a new AB API. In support of this, the `llm_product_doc`
privilege was added to the `ONECHAT_FEATURE_ID`. If this is not desired,
we can remove this addition and plumb a dedicated API.
* This UI section should be conditionally visible based on whether the Agent
Builder experience is enabled/feature is available. ~I need to confirm,
but I believe this is coming in
#244532~ This functionality has
been added.
* Client hooks for managing product docs were added to
`/ai_infra/product_doc_base` instead of `gen_ai_settings`. I originally
had them in the setting public code, but figured they made more sense
alongside the product docs. Happy to change if there is a preference here.
* As previously discussed, we're only supporting installing ELSER
embeddings at the moment. ~We'll probably want to update the Platform
Docs tool to instruct the model to rewrite queries in English in
support of this.~ This is done in:
#245259
* This uses the same product docs API as the O11y/Security Assistants,
so there are no compatibility issues when switching the AB experience
on/off or using the old assistants.
* RBAC support provided such that documentation management actions are
disabled unless the user has `agentBuilder['all']` Kibana feature
privileges.


### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [X] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [X] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [X] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

_PR developed with Cursor + Opus 4.5_

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
@spong spong merged commit 0bb9239 into elastic:main Dec 11, 2025
13 checks passed
@spong spong deleted the integration-knowledge-tool branch December 11, 2025 22:40
seanrathier pushed a commit to seanrathier/kibana that referenced this pull request Dec 15, 2025
…ocumentation (elastic#245749)

seanrathier pushed a commit to seanrathier/kibana that referenced this pull request Dec 15, 2025
)

## Summary

Adds an integration knowledge tool to Agent Builder that retrieves
documentation from Fleet-installed integrations using semantic search on
the `.integration_knowledge` index. The tool uses the conditional
availability pattern and is only available when the integration
knowledge index exists.


<p align="center">
<img width="405"
src="https://github.com/user-attachments/assets/640d4f54-34cc-47e3-b731-b3913139e84e"
/> <img width="395"
src="https://github.com/user-attachments/assets/fd66c044-5536-4947-98d8-45e4b168b34c"
/>
</p> 


## Changes

* Added `platform.core.integration_knowledge` builtin tool to
`agent_builder_platform` that searches Fleet integration documentation
* Tool is registered in plugin `setup()` with conditional availability
using the `availability` configuration pattern
* Availability is checked at runtime via ES search on
`.integration_knowledge` index (using `size: 0` query)
* Returns structured resource results with package name, version,
filename, and content
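
The runtime availability check described in the list above can be sketched roughly as follows. This is a minimal illustration rather than the code from this PR: `SearchFn` and `isKnowledgeIndexAvailable` are hypothetical names, and the injected function stands in for a call such as `esClient.asInternalUser.search`. Treating every search failure as "unavailable" is also an assumption.

```typescript
// Hypothetical stand-in for an Elasticsearch search call; not the real client type.
type SearchFn = (request: { index: string; size: number }) => Promise<unknown>;

const INTEGRATION_KNOWLEDGE_INDEX = '.integration_knowledge';

// Resolves to true when a `size: 0` search against the index succeeds.
// Assumption: any error (for example, a missing index) means "unavailable".
async function isKnowledgeIndexAvailable(search: SearchFn): Promise<boolean> {
  try {
    // size: 0 asks for no hits, so the request only proves the index is searchable.
    await search({ index: INTEGRATION_KNOWLEDGE_INDEX, size: 0 });
    return true;
  } catch {
    return false;
  }
}
```

Injecting the search function keeps the sketch independent of any particular client and makes it easy to exercise with a stub.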

## Technical Details

* Tool registration added to `registerTools()` in plugin `setup()`
phase, following the same pattern as `productDocumentationTool`
* Uses `availability` configuration with `cacheMode: 'space'` to
conditionally show/hide the tool based on index availability
* Searches using Elasticsearch semantic search on the `content` field
* `esClient.asInternalUser` is used for both handler execution and
availability checking (index permissions require internal user)
* Results include reference URLs to integration detail pages
(`/app/integrations/detail/{package_name}`)
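
As a rough sketch of the search and result shaping described above: the snippet below builds a request with the Elasticsearch `semantic` query on the `content` field and attaches a reference URL to each result. The interfaces, function names, document field names, and default `size` are all hypothetical, and whether the tool issues this exact query form is an assumption.

```typescript
// Hypothetical document shape. The fields mirror the PR description
// (package name, version, filename, content), but the exact names are assumptions.
interface KnowledgeDoc {
  package_name: string;
  version: string;
  filename: string;
  content: string;
}

interface KnowledgeResult extends KnowledgeDoc {
  url: string;
}

// Builds a search request that runs a `semantic` query on the `content` field.
// Assumption: the default of 5 results is illustrative only.
function buildKnowledgeSearch(query: string, size: number = 5) {
  return {
    index: '.integration_knowledge',
    size,
    query: { semantic: { field: 'content', query } },
  };
}

// Adds a reference URL pointing at the integration detail page for the package.
function toKnowledgeResult(doc: KnowledgeDoc): KnowledgeResult {
  return {
    ...doc,
    url: `/app/integrations/detail/${encodeURIComponent(doc.package_name)}`,
  };
}
```

For example, a document for the `system` package would map to a result whose `url` is `/app/integrations/detail/system`.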

## Considerations

* Tool requires Fleet to have indexed integration knowledge into
`.integration_knowledge`
* Tool availability is checked per-space and cached for performance
* No Kibana restart required - tool appears/disappears dynamically based
on index availability
* This is the onechat/Agent Builder equivalent of the existing
`IntegrationKnowledgeTool` in Security Solution's Assistant
(elastic#236197) and Observability
Solution's Assistant (elastic#237085)
added in `9.2`.
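
To illustrate the per-space caching idea mentioned above, here is a minimal sketch of a cache keyed by space ID. It is not how `cacheMode: 'space'` is implemented; the class name, the TTL-based expiry, and the injectable clock are all assumptions made for the example.

```typescript
// Hypothetical per-space availability cache; the TTL and all names are assumptions.
class SpaceAvailabilityCache {
  private readonly entries = new Map<string, { value: boolean; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached availability for a space, or undefined if absent or expired.
  get(spaceId: string): boolean | undefined {
    const entry = this.entries.get(spaceId);
    if (!entry || entry.expiresAt <= this.now()) {
      this.entries.delete(spaceId);
      return undefined;
    }
    return entry.value;
  }

  // Stores the availability for a space until the TTL elapses.
  set(spaceId: string, value: boolean): void {
    this.entries.set(spaceId, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Because entries expire, a tool whose index is created or deleted later is picked up on the next check without a restart, which matches the dynamic behavior described in the list.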

---

## Testing

> [!NOTE]
> You must enable the `xpack.fleet.enableExperimental:
["installIntegrationsKnowledge"]` feature flag until this PR enabling it
by default is merged (elastic#245080).


1. Upload this sample
[system-2.3.3-NEXT.zip](https://github.com/user-attachments/files/22546766/system-2.3.3-NEXT.zip)
package via Integrations > Create new integration
    - The test package just copies the existing `docs/README.md` to
`docs/knowledge_base/README.md` so that Fleet ingests it into
`.integration_knowledge`.
2. Create a new Agent with the new Integration Knowledge tool and ask
questions related to system integrations, such as:
    - How can I collect CPU and memory data for my Windows host?
    - What OS can I run the system integration on?
    - What does the system integration do?
3. Observe that the responses returned contain relevant information that
is cited from the system integration.





_PR developed with Cursor + Opus 4.5_

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Labels

backport:skip This PR does not require backporting release_note:enhancement Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v9.3.0

5 participants