Update sparse_vector field mapping to include default setting for token pruning #129089
Merged
markjhoy merged 64 commits into elastic:main on Jun 23, 2025
Collaborator
Hi @markjhoy, I've created a changelog YAML for you.
Contributor (Author)
Note: due to the scope of the changes (especially the transport and index version changes), this will require a manual backport to 8.19.
Collaborator
Pinging @elastic/search-eng (Team:SearchOrg)
Collaborator
Pinging @elastic/search-relevance (Team:Search - Relevance)
Collaborator
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Contributor (Author)
buildkite test this
Collaborator
💔 Backport failed
You can use sqren/backport to manually backport by running
markjhoy added a commit to markjhoy/elasticsearch that referenced this pull request on Jun 23, 2025
…en pruning (elastic#129089)

* Initial checkin of refactored index_options code
* [CI] Auto commit changes from spotless
* initial unit testing
* complete unit tests; add yaml tests
* [CI] Auto commit changes from spotless
* register test feature for sparse vector
* Update docs/changelog/129089.yaml
* update changelog
* add docs
* explicit set default index_options if null
* [CI] Auto commit changes from spotless
* update yaml tests; update docs
* fix yaml tests
* readd auth for teardown
* only serialize index options if not default
* [CI] Auto commit changes from spotless
* serialization refactor; pass index version around
* [CI] Auto commit changes from spotless
* fix transport versions merge
* fix up docs
* [CI] Auto commit changes from spotless
* fix docs; add include_defaults unit and yaml test
* [CI] Auto commit changes from spotless
* override getIndexReaderManager for SemanticQueryBuilderTests
* [CI] Auto commit changes from spotless
* cleanup mapper/builder/tests; index vers. in type still need to refactor / clean YAML tests
* [CI] Auto commit changes from spotless
* cleanups to mapper tests for clarity
* [CI] Auto commit changes from spotless
* move feature into mappers; fix yaml tests
* cleanups; add comments; remove redundant test
* [CI] Auto commit changes from spotless
* escape more periods in the YAML tests
* cleanup mapper and type tests
* [CI] Auto commit changes from spotless
* rename mapping for previous index test
* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
(cherry picked from commit a671505)

# Conflicts:
# docs/reference/elasticsearch/mapping-reference/sparse-vector.md
# server/src/main/java/org/elasticsearch/TransportVersions.java
# server/src/main/java/org/elasticsearch/index/IndexVersions.java
# server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
# server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapperTests.java
# x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilderTests.java
Contributor (Author)
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation.
markjhoy added a commit that referenced this pull request on Jun 24, 2025
…for token pruning (#129089) (#129890)

* Update sparse_vector field mapping to include default setting for token pruning (#129089)
  * Initial checkin of refactored index_options code
  * [CI] Auto commit changes from spotless
  * initial unit testing
  * complete unit tests; add yaml tests
  * [CI] Auto commit changes from spotless
  * register test feature for sparse vector
  * Update docs/changelog/129089.yaml
  * update changelog
  * add docs
  * explicit set default index_options if null
  * [CI] Auto commit changes from spotless
  * update yaml tests; update docs
  * fix yaml tests
  * readd auth for teardown
  * only serialize index options if not default
  * [CI] Auto commit changes from spotless
  * serialization refactor; pass index version around
  * [CI] Auto commit changes from spotless
  * fix transport versions merge
  * fix up docs
  * [CI] Auto commit changes from spotless
  * fix docs; add include_defaults unit and yaml test
  * [CI] Auto commit changes from spotless
  * override getIndexReaderManager for SemanticQueryBuilderTests
  * [CI] Auto commit changes from spotless
  * cleanup mapper/builder/tests; index vers. in type still need to refactor / clean YAML tests
  * [CI] Auto commit changes from spotless
  * cleanups to mapper tests for clarity
  * [CI] Auto commit changes from spotless
  * move feature into mappers; fix yaml tests
  * cleanups; add comments; remove redundant test
  * [CI] Auto commit changes from spotless
  * escape more periods in the YAML tests
  * cleanup mapper and type tests
  * [CI] Auto commit changes from spotless
  * rename mapping for previous index test
  * set explicit number of shards for yaml test

  ---------

  Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
  Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
  (cherry picked from commit a671505)

  # Conflicts:
  # docs/reference/elasticsearch/mapping-reference/sparse-vector.md
  # server/src/main/java/org/elasticsearch/TransportVersions.java
  # server/src/main/java/org/elasticsearch/index/IndexVersions.java
  # server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
  # server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapperTests.java
  # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilderTests.java
* Update changelog for version
* [CI] Auto commit changes from spotless
* Update docs to replace 9.1 with 8.19
* Rename 129089.yaml to 129890.yaml
* proper asciidocs; cleanups
* remove doc preview labels; cleanup test index ver.
* clean up docs
* add sparse vector token pruning tag

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request on Jun 25, 2025
…en pruning (elastic#129089)

* Initial checkin of refactored index_options code
* [CI] Auto commit changes from spotless
* initial unit testing
* complete unit tests; add yaml tests
* [CI] Auto commit changes from spotless
* register test feature for sparse vector
* Update docs/changelog/129089.yaml
* update changelog
* add docs
* explicit set default index_options if null
* [CI] Auto commit changes from spotless
* update yaml tests; update docs
* fix yaml tests
* readd auth for teardown
* only serialize index options if not default
* [CI] Auto commit changes from spotless
* serialization refactor; pass index version around
* [CI] Auto commit changes from spotless
* fix transport versions merge
* fix up docs
* [CI] Auto commit changes from spotless
* fix docs; add include_defaults unit and yaml test
* [CI] Auto commit changes from spotless
* override getIndexReaderManager for SemanticQueryBuilderTests
* [CI] Auto commit changes from spotless
* cleanup mapper/builder/tests; index vers. in type still need to refactor / clean YAML tests
* [CI] Auto commit changes from spotless
* cleanups to mapper tests for clarity
* [CI] Auto commit changes from spotless
* move feature into mappers; fix yaml tests
* cleanups; add comments; remove redundant test
* [CI] Auto commit changes from spotless
* escape more periods in the YAML tests
* cleanup mapper and type tests
* [CI] Auto commit changes from spotless
* rename mapping for previous index test
* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
Updates the SparseVectorFieldMapper type to include index options for pruning tokens, along with their associated configuration values. Before this update, token pruning for sparse_vector fields was only available at query time (see the pruning parameters of the sparse_vector query).
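To make the shape of the new mapping-level options concrete, here is a minimal Python model. It assumes the mapping reuses the query's parameter names (prune, pruning_config, tokens_freq_ratio_threshold, tokens_weight_threshold) and the query's documented defaults of 5 and 0.4. The class names, the value ranges, and the "pruning_config requires prune" rule are all assumptions for illustration. This is not the SparseVectorFieldMapper code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PruningConfig:
    """Hypothetical model of a pruning_config block."""

    # Assumed defaults, mirroring the sparse_vector query's pruning_config.
    tokens_freq_ratio_threshold: float = 5.0
    tokens_weight_threshold: float = 0.4

    def __post_init__(self) -> None:
        # Assumed valid ranges: [1, 100] for the ratio and [0, 1] for the weight.
        if not 1 <= self.tokens_freq_ratio_threshold <= 100:
            raise ValueError("tokens_freq_ratio_threshold must be between 1 and 100")
        if not 0 <= self.tokens_weight_threshold <= 1:
            raise ValueError("tokens_weight_threshold must be between 0 and 1")


@dataclass(frozen=True)
class SparseVectorIndexOptions:
    """Hypothetical model of a sparse_vector field's index_options."""

    prune: bool
    # None means "prune with the default thresholds".
    pruning_config: Optional[PruningConfig] = None

    def __post_init__(self) -> None:
        # Assumption: thresholds are only meaningful when pruning is enabled.
        if not self.prune and self.pruning_config is not None:
            raise ValueError("pruning_config requires prune to be true")

    @classmethod
    def from_mapping(cls, index_options: dict) -> "SparseVectorIndexOptions":
        """Build options from a mapping-style dict, e.g. {"prune": True}."""
        raw = index_options.get("pruning_config")
        config = PruningConfig(**raw) if raw is not None else None
        return cls(prune=bool(index_options["prune"]), pruning_config=config)

    def effective_config(self) -> Optional[PruningConfig]:
        """Thresholds actually applied, or None when pruning is disabled."""
        if not self.prune:
            return None
        if self.pruning_config is not None:
            return self.pruning_config
        return PruningConfig()
```

In this model, {"prune": True} with no pruning_config resolves to the default thresholds. Keeping "not set" (None) separate from explicit values also makes it easy to emit only non-default options, in the spirit of the "only serialize index options if not default" step in the commit log.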
With this PR, any new index with a sparse_vector field has token pruning turned on by default. Indices created before this update that already contain sparse_vector fields keep pruning turned off by default. Any sparse_vector query that sets explicit pruning options still overrides the index defaults.
Example:
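A small Python sketch of the precedence described above: a default that depends on the index creation version, overridden by explicit mapping options, which are in turn overridden by explicit query options. PRUNING_DEFAULT_VERSION is a hypothetical integer standing in for the IndexVersion gate, and the default thresholds (5 and 0.4) are assumed from the sparse_vector query. The sketch merges settings field by field, which is an assumption; the real mapper and query builder may resolve partially specified options differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical placeholder for the index version that introduced the new
# default; Elasticsearch gates this on an IndexVersion constant instead.
PRUNING_DEFAULT_VERSION = 2

# Assumed defaults: (tokens_freq_ratio_threshold, tokens_weight_threshold).
DEFAULT_THRESHOLDS: Tuple[float, float] = (5.0, 0.4)


@dataclass(frozen=True)
class PruningSettings:
    """One layer of pruning settings; None means 'not set at this layer'."""

    prune: Optional[bool] = None
    thresholds: Optional[Tuple[float, float]] = None


def resolve_pruning(
    index_created_version: int,
    mapping: PruningSettings = PruningSettings(),
    query: PruningSettings = PruningSettings(),
) -> Tuple[bool, Optional[Tuple[float, float]]]:
    """Return (prune, thresholds), applying query > mapping > version default."""
    # Version-based default: on for new indices, off for older ones.
    prune = index_created_version >= PRUNING_DEFAULT_VERSION
    thresholds = DEFAULT_THRESHOLDS
    # Later layers override earlier ones, but only where they set a value.
    for layer in (mapping, query):
        if layer.prune is not None:
            prune = layer.prune
        if layer.thresholds is not None:
            thresholds = layer.thresholds
    # Thresholds are irrelevant once pruning ends up disabled.
    return prune, (thresholds if prune else None)
```

For example, resolve_pruning(1, query=PruningSettings(prune=True)) models an older index whose query explicitly turns pruning on, so pruning runs with the default thresholds even though the index default is off.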