Allow including semantic field embeddings in _source#134717

Merged
jimczi merged 13 commits into elastic:main from jimczi:exclude_inference_fields
Sep 18, 2025

Conversation

@jimczi (Contributor) commented Sep 15, 2025

Adds support for returning `_inference_fields` (embeddings for `semantic_text` fields) as part of `_source` when `_source.exclude_vectors` is explicitly set to `false`. This enables use cases like reindexing documents without recomputing embeddings. By default, embeddings remain excluded.

Docs explaining the new feature.
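As a minimal sketch of how the opt-in described above might look from a client, the Python helpers below build a search body and a reindex body that set `_source.exclude_vectors` to `false`. The helper names, the `match_all` query, and the index names `src-index` and `dest-index` are assumptions for illustration only; the code just builds JSON and does not talk to a cluster.

```python
import json


def search_body(include_embeddings: bool = False) -> dict:
    """Build a search request body (hypothetical helper, not part of the PR)."""
    body = {"query": {"match_all": {}}}
    if include_embeddings:
        # Explicit opt-in: embeddings stay excluded unless this is set to false.
        body["_source"] = {"exclude_vectors": False}
    return body


def reindex_body(source_index: str, dest_index: str, keep_embeddings: bool = True) -> dict:
    """Build a reindex request body (hypothetical helper, not part of the PR)."""
    source = {"index": source_index}
    if keep_embeddings:
        # Ask the source side to return the stored embeddings so the
        # destination does not have to recompute them.
        source["_source"] = {"exclude_vectors": False}
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    print(json.dumps(search_body(include_embeddings=True), indent=2))
    print(json.dumps(reindex_body("src-index", "dest-index"), indent=2))
```

Leaving `include_embeddings` or `keep_embeddings` off produces a body without the `_source` block, which matches the default of excluding embeddings.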

@jimczi requested a review from Mikep86 September 15, 2025 11:13
@elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Sep 15, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine (Collaborator)

Hi @jimczi, I've created a changelog YAML for you.

@github-actions bot (Contributor) commented Sep 15, 2025

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

@Mikep86 (Contributor) left a comment

Looks good overall. I left a few comments about some testing changes. Also, can we add a test where we set _source.exclude_vectors=false using the legacy format and confirm that it has no effect?

Comment on lines +368 to +369
    _source:
      exclude_vectors: false

This modifies the existing test to preserve embeddings through reindexing. Can we add a separate test (or scenario to this test) where we don't preserve embeddings and change the inference ID?
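A rough sketch of the scenario requested here, in Python: the reindex source leaves `_source.exclude_vectors` unset, so embeddings stay excluded (the default stated in the PR description) and the destination field would be expected to run inference again with its own endpoint. The index names, the field name `semantic_field`, the endpoint ID `other-endpoint`, and both helper functions are hypothetical; this is not the YAML test from the PR.

```python
import json


def semantic_text_mapping(field: str, inference_id: str) -> dict:
    """Index-creation body with one semantic_text field bound to an inference endpoint."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "semantic_text", "inference_id": inference_id}
            }
        }
    }


def reindex_without_embeddings(source_index: str, dest_index: str) -> dict:
    """Reindex body that leaves _source.exclude_vectors unset (embeddings excluded)."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


if __name__ == "__main__":
    # Destination index uses a different (hypothetical) inference endpoint.
    print(json.dumps(semantic_text_mapping("semantic_field", "other-endpoint"), indent=2))
    print(json.dumps(reindex_without_embeddings("src-index", "dest-index"), indent=2))
```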

@jimczi (Contributor Author)

Sure, I added a test in f837041.
Let me know what you think.

@Mikep86 (Contributor) left a comment

New tests look good, thanks for adding them! One thing that wasn't addressed though:

> can we add a test where we set _source.exclude_vectors=false using the legacy format and confirm that it has no effect

@jimczi (Contributor Author) commented Sep 17, 2025

@Mikep86 I added a test for the old format in 6bc776b, let me know what you think.

@Mikep86 (Contributor) left a comment

LGTM, thank you for the test changes!

@jimczi merged commit 8933c15 into elastic:main Sep 18, 2025
34 checks passed
@jimczi deleted the exclude_inference_fields branch September 18, 2025 07:41
gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 18, 2025
szybia added a commit to szybia/elasticsearch that referenced this pull request Sep 18, 2025
* upstream/main: (43 commits)
  Unmute testAckedIndexing to see if it still fails on main (elastic#134682)
  Silence time zone ID deprecation warning for JDK 25 due to log4j2 bug. (elastic#134719)
  Adding a getUnmodifiableSourceAndMetadata() method to IngestDocument (elastic#134816)
  Mark the create-index-from-source action as publicly available on Serverless (elastic#134953)
  ESQL: Rename command from INLINESTATS to INLINE STATS (elastic#134827)
  Document multi index query support for simplified retrievers (elastic#134980)
  [ML] Fix YAMl test to use correct query parameter type (elastic#134999)
  [Transform] Wait for PIT to close (elastic#134955)
  Add XPath to XmlUtils (elastic#134923)
  Fixing conditional processor mutability bugs (elastic#134936)
  [Transform] Lower loglevel of 3 transform-related error messages from ERROR to WARN (elastic#134985)
  Unmute pattern text tests. (elastic#134981)
  Integrate weights into simplified RRF retriever syntax (elastic#132680)
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:stats.CountDistinctWithConditions} elastic#134993
  Update periodic java-ea build to test java 26 pre-release (elastic#134983)
  Mute org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT test {csv-spec:stats.CountDistinctWithConditions} elastic#134984
  Fix and unmute testIndexSettingProviderPrivateSetting (elastic#134861)
  Add missing common cat params (elastic#134870)
  Support querying multiple indices with the simplified RRF retriever (elastic#134822)
  Allow including semantic field embeddings in _source (elastic#134717)
  ...
Labels

>enhancement
:Search Relevance/Vectors (Vector search)
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
v9.2.0

3 participants