Allow including semantic field embeddings in _source #134717
jimczi merged 13 commits into elastic:main
Adds support for returning `_inference_fields` (embeddings for `semantic_text` fields) as part of `_source` when `_source.exclude_vectors` is explicitly set to `false`. This enables use cases like reindexing documents without recomputing embeddings. By default, embeddings remain excluded.
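As a rough illustration of the reindex use case, here is a minimal sketch of how a client might build the request body. The helper name `build_reindex_body` and the index names are hypothetical, and placing `_source.exclude_vectors` under the reindex `source` section is an assumption based on the test snippet discussed in this review, not a confirmed API reference.

```python
import json


def build_reindex_body(source_index, dest_index, preserve_embeddings=False):
    """Build a reindex request body (hypothetical helper, not part of this PR).

    Assumption: `_source.exclude_vectors` is set inside the `source` section,
    mirroring the YAML test snippet reviewed in this PR.
    """
    source = {"index": source_index}
    if preserve_embeddings:
        # Explicit opt-in: embeddings stay excluded unless this is set to false.
        source["_source"] = {"exclude_vectors": False}
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    # Index names are placeholders for illustration only.
    body = build_reindex_body("docs-v1", "docs-v2", preserve_embeddings=True)
    print(json.dumps(body, indent=2))
```

Leaving `preserve_embeddings` at its default omits the `_source` block entirely, which matches the stated default of excluding embeddings.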
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Hi @jimczi, I've created a changelog YAML for you.
Mikep86
left a comment
Looks good overall. I left a few comments about some testing changes. Also, can we add a test where we set `_source.exclude_vectors=false` using the legacy format and confirm that it has no effect?
modules/reindex/src/main/java/org/elasticsearch/reindex/AbstractAsyncBulkByScrollAction.java
...rence/src/yamlRestTest/resources/rest-api-spec/test/inference/30_semantic_text_inference.yml
    _source:
      exclude_vectors: false
This modifies the existing test to preserve embeddings through reindexing. Can we add a separate test (or scenario to this test) where we don't preserve embeddings and change the inference ID?
Sure, I added a test in f837041.
Let me know what you think.
Mikep86
left a comment
New tests look good, thanks for adding them! One thing that wasn't addressed though:
> can we add a test where we set `_source.exclude_vectors=false` using the legacy format and confirm that it has no effect
Mikep86
left a comment
LGTM, thank you for the test changes!
* upstream/main: (43 commits)
  Unmute testAckedIndexing to see if it still fails on main (elastic#134682)
  Silence time zone ID deprecation warning for JDK 25 due to log4j2 bug. (elastic#134719)
  Adding a getUnmodifiableSourceAndMetadata() method to IngestDocument (elastic#134816)
  Mark the create-index-from-source action as publicly available on Serverless (elastic#134953)
  ESQL: Rename command from INLINESTATS to INLINE STATS (elastic#134827)
  Document multi index query support for simplified retrievers (elastic#134980)
  [ML] Fix YAMl test to use correct query parameter type (elastic#134999)
  [Transform] Wait for PIT to close (elastic#134955)
  Add XPath to XmlUtils (elastic#134923)
  Fixing conditional processor mutability bugs (elastic#134936)
  [Transform] Lower loglevel of 3 transform-related error messages from ERROR to WARN (elastic#134985)
  Unmute pattern text tests. (elastic#134981)
  Integrate weights into simplified RRF retriever syntax (elastic#132680)
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:stats.CountDistinctWithConditions} elastic#134993
  Update periodic java-ea build to test java 26 pre-release (elastic#134983)
  Mute org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT test {csv-spec:stats.CountDistinctWithConditions} elastic#134984
  Fix and unmute testIndexSettingProviderPrivateSetting (elastic#134861)
  Add missing common cat params (elastic#134870)
  Support querying multiple indices with the simplified RRF retriever (elastic#134822)
  Allow including semantic field embeddings in _source (elastic#134717)
  ...
Adds support for returning `_inference_fields` (embeddings for `semantic_text` fields) as part of `_source` when `_source.exclude_vectors` is explicitly set to `false`. This enables use cases like reindexing documents without recomputing embeddings. By default, embeddings remain excluded.
Docs explaining the new feature.