Add Highlighter for Semantic Text Fields #118064
Merged
jimczi merged 21 commits into elastic:main on Dec 6, 2024
This PR introduces a new highlighter, `semantic`, tailored for semantic text fields. It extracts the most relevant fragments by scoring nested chunks using the original semantic query. In this initial version, the highlighter returns only the original chunks computed during ingestion. However, this is an implementation detail, and future enhancements could combine multiple chunks to generate the fragments.
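To make the description above concrete, here is a minimal sketch of a search body that queries a semantic text field and requests fragments from the `semantic` highlighter. The field name `content`, the query text, and the helper `build_semantic_highlight_request` are hypothetical; `number_of_fragments` and `order` are standard highlight options, and it is an assumption here that the new highlighter honors them in the usual way.

```python
import json


def build_semantic_highlight_request(field, query_text, number_of_fragments=2):
    """Build a search body that runs a semantic query on `field` and asks
    for `semantic` highlights on the same field (hypothetical helper)."""
    return {
        # The semantic query supplies the text the chunks are scored against.
        "query": {"semantic": {"field": field, "query": query_text}},
        "highlight": {
            "fields": {
                field: {
                    # Select the highlighter introduced by this PR.
                    "type": "semantic",
                    # Assumed: caps how many chunks come back as fragments.
                    "number_of_fragments": number_of_fragments,
                    # Assumed: return the best-scoring chunks first.
                    "order": "score",
                }
            }
        },
    }


# "content" and the query text are placeholders, not names from the PR.
body = build_semantic_highlight_request("content", "How are chunks scored?")
print(json.dumps(body, indent=2))
```

The body would be sent to the index's `_search` endpoint. Since this initial version returns the original ingestion-time chunks, each fragment in the response should correspond to one whole chunk rather than a sub-span of it.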
Collaborator
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Collaborator
Hi @jimczi, I've created a changelog YAML for you.
Mikep86
reviewed
Dec 5, 2024
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Mikep86
reviewed
Dec 5, 2024
...rence/src/main/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighter.java
Mikep86
reviewed
Dec 5, 2024
Contributor
Mikep86
left a comment
Looks great overall! Could we also add highlighting YAML tests?
...rence/src/main/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighter.java
Outdated
...rence/src/main/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighter.java
Outdated
...rence/src/main/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighter.java
...rence/src/main/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighter.java
kderusso
approved these changes
Dec 5, 2024
.../src/test/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighterTests.java
Outdated
.../src/test/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighterTests.java
Outdated
...rence/src/main/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighter.java
Outdated
...rence/src/main/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighter.java
Outdated
Collaborator
💚 Backport successful
jimczi
added a commit
to jimczi/elasticsearch
that referenced
this pull request
Dec 6, 2024
elasticsearchmachine
pushed a commit
that referenced
this pull request
Dec 6, 2024
* Add Highlighter for Semantic Text Fields (#118064)
* Update x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighterTests.java
patrykkopycinski
added a commit
to elastic/kibana
that referenced
this pull request
Jan 10, 2025
… of inner_hits (#204962)
## Summary
Switch to use elastic/elasticsearch#118064 when retrieving Knowledge base Index entry docs.
Followed testing instructions from #198020.
Results: [five screenshots dated 2024-12-19]
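As a rough illustration of what moving from `inner_hits` to the highlighter means for a client, the sketch below reads fragments from the `highlight` section of each hit in a search response. The helper `collect_fragments`, the field name `text`, and the sample response are hypothetical and are not taken from the Kibana change; only the general `hits.hits[].highlight.<field>` layout of a highlighted response is assumed.

```python
def collect_fragments(response, field):
    """Return (doc_id, fragments) pairs for hits that carry highlights on `field`."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        # Highlights arrive as a list of strings keyed by field name.
        fragments = hit.get("highlight", {}).get(field, [])
        if fragments:
            results.append((hit["_id"], fragments))
    return results


# Hypothetical response, trimmed to the keys used above.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"text": ["first chunk", "second chunk"]}},
            {"_id": "2"},  # a hit without highlights is skipped
        ]
    }
}

print(collect_fragments(sample_response, "text"))
```

Hits without a `highlight` entry for the field are skipped rather than returned with an empty list, which keeps the caller from having to filter them again.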
kibanamachine
pushed a commit
to kibanamachine/kibana
that referenced
this pull request
Jan 10, 2025
… of inner_hits (elastic#204962)
(cherry picked from commit 5539000)
patrykkopycinski
added a commit
to elastic/kibana
that referenced
this pull request
Jan 14, 2025
…nstead of inner_hits (#204962) (#206509)
# Backport
This will backport the following commits from `main` to `8.x`:
- [[Security Assistant] Migrate semantic_text to use highlighter instead of inner_hits (#204962)](#204962)
### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
viduni94
pushed a commit
to viduni94/kibana
that referenced
this pull request
Jan 23, 2025
… of inner_hits (elastic#204962)
Does this semantic highlighter support dense vectors?