GPU codec merge fails for vector data exceeding 16GB #141746

@mayya-sharipova

Elasticsearch Version

9.3

Installed Plugins

No response

Java Version

bundled

OS Version

Linux

Problem Description

During a Lucene segment merge, ES92GpuHnswVectorsWriter fails with a NullPointerException when the vector data file is large enough that MemorySegmentAccessInput.segmentSliceOrNull() returns null.

Root cause

DatasetUtilsImpl.createCuVSMatrix() calls input.segmentSliceOrNull(pos, len) to obtain a single contiguous MemorySegment covering the entire vector data. This method returns null when the requested range crosses an mmap chunk boundary (chunks are up to 16GB) or is itself larger than 16GB.

 MemorySegment ms = input.segmentSliceOrNull(pos, len);
 assert ms != null; // TODO: this can be null if larger than 16GB or ...

Every other caller of segmentSliceOrNull in the codebase (e.g. FloatVectorScorer, Int7SQVectorScorerSupplier) handles the null case with a fallback, but the GPU codec does not.
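To illustrate the null-handling pattern those callers follow, here is a minimal, self-contained sketch. It is not the Lucene or Elasticsearch code: `ChunkedInput`, `sliceOrNull`, `readBytes` and `sliceOrCopy` are hypothetical names, `ByteBuffer` stands in for `MemorySegment`, and the chunk size is tiny so the boundary case is easy to trigger. An actual fix for the GPU codec may look quite different (for example, transferring the data in batches rather than copying it on the heap).

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a chunked mmap input; NOT the Lucene class. */
final class ChunkedInput {
    private final ByteBuffer[] chunks;
    private final int chunkSize;

    ChunkedInput(byte[] data, int chunkSize) {
        this.chunkSize = chunkSize;
        int n = (data.length + chunkSize - 1) / chunkSize;
        chunks = new ByteBuffer[n];
        for (int i = 0; i < n; i++) {
            int start = i * chunkSize;
            int len = Math.min(chunkSize, data.length - start);
            chunks[i] = ByteBuffer.wrap(data, start, len).slice();
        }
    }

    /** Zero-copy view if [pos, pos + len) lies inside one chunk, else null. Expects len > 0. */
    ByteBuffer sliceOrNull(long pos, int len) {
        int first = (int) (pos / chunkSize);
        int last = (int) ((pos + len - 1) / chunkSize);
        if (first != last) {
            return null; // range spans a chunk boundary
        }
        return chunks[first].slice((int) (pos % chunkSize), len);
    }

    /** Slow path: copies the range, walking across chunk boundaries as needed. */
    void readBytes(long pos, byte[] dst, int off, int len) {
        while (len > 0) {
            int chunk = (int) (pos / chunkSize);
            int inChunk = (int) (pos % chunkSize);
            int n = Math.min(len, chunks[chunk].capacity() - inChunk);
            chunks[chunk].get(inChunk, dst, off, n);
            pos += n;
            off += n;
            len -= n;
        }
    }
}

final class SliceFallbackDemo {
    /** Tries the zero-copy slice first and falls back to a copy when it is null. */
    static ByteBuffer sliceOrCopy(ChunkedInput in, long pos, int len) {
        ByteBuffer view = in.sliceOrNull(pos, len);
        if (view != null) {
            return view;
        }
        byte[] copy = new byte[len];
        in.readBytes(pos, copy, 0, len);
        return ByteBuffer.wrap(copy);
    }
}
```

The point of the sketch is only the shape of the call site: the null return is treated as an expected outcome with a slower path, rather than asserted away.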

Steps to Reproduce

Index enough float vectors into a GPU-accelerated index so that the merged vector data file approaches or exceeds the 16GB mmap chunk boundary.
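As a rough guide to how many vectors that is, the following back-of-envelope sketch estimates the count needed to reach one chunk. It assumes raw float32 storage (4 bytes per component, no per-vector overhead) and reads "16GB" as 16 GiB; the class name and the dimension used in the comment are illustrative, not taken from the report.

```java
final class VectorCountEstimate {
    // Assumption: "16GB" in the report means 16 GiB (2^34 bytes).
    static final long CHUNK_BYTES = 16L * 1024 * 1024 * 1024;

    /** Number of float32 vectors of the given dimension needed to fill at least {@code bytes}. */
    static long vectorsToReach(long bytes, int dims) {
        long bytesPerVector = (long) dims * Float.BYTES;
        return (bytes + bytesPerVector - 1) / bytesPerVector; // ceiling division
    }

    public static void main(String[] args) {
        // With 1024 dimensions each vector takes 4096 bytes.
        System.out.println(vectorsToReach(CHUNK_BYTES, 1024));
    }
}
```

Under these assumptions, 1024-dimensional vectors reach the boundary at about 4.2 million vectors in a single merged segment.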

Logs (if relevant)

  org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Failed to merge GPU index:
  Caused by: java.io.IOException: Failed to merge GPU index:
      at org.elasticsearch.gpu.codec.ES92GpuHnswVectorsWriter.mergeOneField(ES92GpuHnswVectorsWriter.java:533)
  Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.foreign.MemorySegment.byteSize()" because "ms" is null
      at org.elasticsearch.gpu.codec.DatasetUtilsImpl.createCuVSMatrix(DatasetUtilsImpl.java:112)
      at org.elasticsearch.gpu.codec.DatasetUtilsImpl.fromInput(DatasetUtilsImpl.java:74)
      at org.elasticsearch.gpu.codec.ES92GpuHnswVectorsWriter.mergeFloatVectorField(ES92GpuHnswVectorsWriter.java:664)
