Implement v_hamming#132959
Conversation
🔍 Preview links for changed docs |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Hi @svilen-mihaylov-elastic, I've created a changelog YAML for you. |
| } | ||
|
|
||
| public static float calculateSimilarity(float[] leftScratch, float[] rightScratch) { | ||
| byte[] a = new byte[leftScratch.length]; |
There was a problem hiding this comment.
Core of change. We assume here the floats as in range (0, 256), convert to byte vectors, and do the same as in ES815BitFlatVectorsFormat
There was a problem hiding this comment.
Per feedback, returning raw distance, not normalized between 0.0 and 1.0 as above.
leemthompo
left a comment
There was a problem hiding this comment.
Just a couple docs things: clearer release note text + need to fix applies_to :)
...ugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Hamming.java
Outdated
Show resolved
Hide resolved
| @Param( | ||
| name = "left", | ||
| type = { "dense_vector" }, | ||
| description = "first dense_vector to calculate hamming distance between" |
There was a problem hiding this comment.
should be "first dense_vector to calculate Hamming distance"
the same for the other param.
There was a problem hiding this comment.
Perhaps we could be even more concise here, considering the context that the inputs are for the Hamming distance calculation is already clear?
For example:
First input vector
Second input vector
Since the type already specifies dense_vector, maybe the param description doesn't need to repeat it either.
There was a problem hiding this comment.
I can update to the shortened version.
There was a problem hiding this comment.
@leemthompo Following a quick conversation with @ioanatia, it's probably better to update the descriptions of this and other similarity functions together for consistency's sake. Have you had a chance to look at for example L1_Norm and L2_Norm? This CR is simply following the same pattern. If that's fine with you, we can start a separate CR with your proposed changes across all similarity functions. How does this sound?
There was a problem hiding this comment.
| @FunctionInfo( | ||
| returnType = "double", | ||
| preview = true, | ||
| description = "Calculates the hamming distance between two dense_vectors.", |
There was a problem hiding this comment.
nit, but this should be Hamming
| for (int i = 0; i < leftScratch.length; i++) { | ||
| b[i] = (byte) rightScratch[i]; | ||
| } | ||
| return ((a.length * Byte.SIZE) - VectorUtil.xorBitCount(a, b)) / (float) (a.length * Byte.SIZE); |
There was a problem hiding this comment.
we need to return just VectorUtil.xorBitCount(a, b)
...ugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Hamming.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
…expression/function/vector/Hamming.java Co-authored-by: Liam Thompson <leemthompo@gmail.com>
…expression/function/vector/Hamming.java Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Implements #132056