Skip to content

[Vector Search] BugFix wrong vector docvalue_fields#137862

Merged
iverase merged 8 commits intoelastic:mainfrom
weizijun:fix-dense-vector-cosine
Nov 19, 2025
Merged

[Vector Search] BugFix wrong vector docvalue_fields#137862
iverase merged 8 commits intoelastic:mainfrom
weizijun:fix-dense-vector-cosine

Conversation

@weizijun
Copy link
Contributor

The denormalizedCosineFloatVectorValues class is missing the ordToDoc method, which can lead to incorrect ordering and potentially incorrect values. Reproduce the error:

PUT test_cosine
{
  "mappings": {
    "properties": {
      "foo" : {
        "type" : "keyword"
      },
      "vec" : {
        "type": "dense_vector",
        "similarity": "cosine"
      }
    }
  }
}

PUT test_cosine/_doc/1
{
  "foo" : "bar"
}

PUT test_cosine/_doc/2
{
  "foo" : "bar"
}

PUT test_cosine/_doc/3
{
  "vec" : [1,2]
}

POST test_cosine/_refresh

PUT test_cosine/_doc/4
{
  "foo" : "bar"
}

POST test_cosine/_forcemerge?max_num_segments=1

GET test_cosine/_search
{
  "docvalue_fields": ["vec"]
}

the wrong value is :

"vec": [
  [
    0.4472136,
    0.8944272
  ]
]
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.3.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 11, 2025
@weizijun weizijun changed the title fix wrong vector docvalue_fields Nov 11, 2025
@weizijun weizijun changed the title [Vector] Fix wrong vector docvalue_fields Nov 11, 2025
* main: (135 commits)
  Mute org.elasticsearch.upgrades.IndexSortUpgradeIT testIndexSortForNumericTypes {upgradedNodes=1} elastic#138130
  Mute org.elasticsearch.upgrades.IndexSortUpgradeIT testIndexSortForNumericTypes {upgradedNodes=2} elastic#138129
  Mute org.elasticsearch.search.basic.SearchWithRandomDisconnectsIT testSearchWithRandomDisconnects elastic#138128
  [DiskBBQ] avoid EsAcceptDocs bug by calling cost before building iterator (elastic#138127)
  Log NOT_PREFERRED shard movements (elastic#138069)
  Improve bulk loading of binary doc values (elastic#137995)
  Add internal action for getting inference fields and inference results for those fields (elastic#137680)
  Address issue with DateFieldMapper#isFieldWithinQuery(...) (elastic#138032)
  WriteLoadConstraintDecider: Have separate rate limiting for canRemain and canAllocate decisions (elastic#138067)
  Adding NodeContext to TransportBroadcastByNodeAction (elastic#138057)
  Mute org.elasticsearch.simdvec.ESVectorUtilTests testSoarDistanceBulk elastic#138117
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#137909
  Backport batched_response_might_include_reduction_failure version to 8.19 (elastic#138046)
  Add summary metrics for tdigest fields (elastic#137982)
  Add gp-llm-v2 model ID and inference endpoint (elastic#138045)
  Various tracing fixes (elastic#137908)
  [ML] Fixing KDE evaluate() to return correct ValueAndMagnitude object (elastic#128602)
  Mute org.elasticsearch.xpack.shutdown.NodeShutdownIT testStalledShardMigrationProperlyDetected elastic#115697
  [ML] Fix Flaky Audit Message Assertion in testWithDatastream for RegressionIT and ClassificationIT (elastic#138065)
  [ML] Fix Non-Deterministic Training Set Selection in RegressionIT testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (elastic#138063)
  ...

# Conflicts:
#	rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.vectors/200_dense_vector_docvalue_fields.yml
@weizijun weizijun changed the title [Vector Search] Fix wrong vector docvalue_fields Nov 17, 2025
@weizijun
Copy link
Contributor Author

Hi, @benwtrent , can you help to review this PR?

@szybia szybia added :Search Relevance/Vectors Vector search and removed needs:triage Requires assignment of a team area label labels Nov 17, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Nov 17, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@iverase
Copy link
Contributor

iverase commented Nov 17, 2025

@elasticmachine test this please

1 similar comment
@iverase
Copy link
Contributor

iverase commented Nov 17, 2025

@elasticmachine test this please

@iverase
Copy link
Contributor

iverase commented Nov 17, 2025

@elasticmachine test this please

@iverase
Copy link
Contributor

iverase commented Nov 17, 2025

Your new test fails in a mixed cluster setting, we need to skip those situations:

./gradlew ":qa:mixed-cluster:v9.2.2#mixedClusterTest" -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=search.vectors/200_dense_vector_docvalue_fields/Dense vector cosine docvalue_fields with sparse documents after force merge}" -Dtests.seed=2FFDEDC536B94BAE -Dtests.bwc=true -Dtests.locale=no -Dtests.timezone=Asia/Kolkata -Druntime.java=25
@weizijun
Copy link
Contributor Author

Your new test fails in a mixed cluster setting, we need to skip those situations:

./gradlew ":qa:mixed-cluster:v9.2.2#mixedClusterTest" -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=search.vectors/200_dense_vector_docvalue_fields/Dense vector cosine docvalue_fields with sparse documents after force merge}" -Dtests.seed=2FFDEDC536B94BAE -Dtests.bwc=true -Dtests.locale=no -Dtests.timezone=Asia/Kolkata -Druntime.java=25

I fixed it by add a search capability flag.

@iverase
Copy link
Contributor

iverase commented Nov 19, 2025

@elasticmachine test this please

@iverase
Copy link
Contributor

iverase commented Nov 19, 2025

@elasticmachine test this please

@iverase
Copy link
Contributor

iverase commented Nov 19, 2025

@elasticmachine test this please

@benwtrent
Copy link
Member

Great find @weizijun !

Thank you for guiding it through @iverase

@iverase
Copy link
Contributor

iverase commented Nov 19, 2025

I fixed it by add a search capability flag.

@weizijun I hope you don't mind but I have moved it to a node test feature. Search capabilities have an operational cost and in this case we just want to skip some test configurations.

@weizijun
Copy link
Contributor Author

@weizijun I hope you don't mind by I have moved it to a node test feature. Search capabilities have an operational cost and in this case we just want to skip some test configurations.

It's ok, Thank you!

@iverase iverase added the auto-backport Automatically create backport pull requests when merged label Nov 19, 2025
@iverase iverase merged commit 9da387e into elastic:main Nov 19, 2025
35 checks passed
@iverase
Copy link
Contributor

iverase commented Nov 19, 2025

Thanks @weizijun!

@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.2 Commit could not be cherrypicked due to conflicts
8.19 Commit could not be cherrypicked due to conflicts
9.1 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 137862

@iverase
Copy link
Contributor

iverase commented Nov 20, 2025

I have backported manually to 9.2 and 9.1. It seems this is not needed in 8.19.

ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
The denormalizedCosineFloatVectorValues class is missing the ordToDoc method, which can lead to incorrect 
ordering and potentially incorrect values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged backport pending >bug external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.8 v9.2.2 v9.3.0

5 participants