Extensible Completion Postings Formats#111494
Conversation
|
Pinging @elastic/es-search (Team:Search) |
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
| Mapper mapper = mapperService.mappingLookup().getMapper(field); | ||
| if (mapper instanceof CompletionFieldMapper) { | ||
| return CompletionFieldMapper.postingsFormat(); |
There was a problem hiding this comment.
@rjernst asked me to clean this up to work similarly to getKnnVectorsFormatForField below. Now, the CompletionFieldMapper is solely responsible for exposing it's own postings format.
There was a problem hiding this comment.
I find that this is indeed easier to follow. I have one minor suggestion for improvement: would it make sense to move the loading of the codec name etc. also to this class? Given it's just static code, I wonder if we can have it all in a single place. The main reason why I would do so is that we hardcode Competion912 , and I am anxious that changes may happen in Lucene and we may forget to update this string. It is probably easier to miss if it's directly in PerFieldFormatSupplier. We can also make this change later, it's not a big deal. This is a pre-existing problem and the code looks better with your change already.
There was a problem hiding this comment.
+1 to following up with moving the contents of CompletionFieldMapper.postingsFormat() into here.
There was a problem hiding this comment.
I suppose I have some confusion as the the purview of each class with respect to who ultimately is the source of truth for the format. To me it seems like the mapper should own it, though I see mixed examples of ownership here. I however agree it's cleaner to just move it here, so I'll do that.
server/src/main/java/org/elasticsearch/index/mapper/CompletionFieldMapper.java
Outdated
Show resolved
Hide resolved
…ent/make-fst-on-heap-configurable
|
Hi @JVerwolf, I've created a changelog YAML for you. |
|
@elasticmachine run elasticsearch-ci/validate-changelogs |
…ent/make-fst-on-heap-configurable
benwtrent
left a comment
There was a problem hiding this comment.
Do we have a rolling upgrade test that will exercise this code ?
We need something that indexes with old format.
Indexes with a mixed cluster and merges some segments.
Indexes with new and merges some segments.
|
Thanks for the review @benwtrent ! Good points, though IIUC, I think this applies more to code that will use this PR (that will provided another codec), rather than to this code, per se. Given that I do have a PR (in a private repo) that uses this, I'll work out how to add rolling upgrade tests there. WDYT? |
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
javanna
left a comment
There was a problem hiding this comment.
This looks good, I left a couple of small comments.
| Mapper mapper = mapperService.mappingLookup().getMapper(field); | ||
| if (mapper instanceof CompletionFieldMapper) { | ||
| return CompletionFieldMapper.postingsFormat(); |
There was a problem hiding this comment.
I find that this is indeed easier to follow. I have one minor suggestion for improvement: would it make sense to move the loading of the codec name etc. also to this class? Given it's just static code, I wonder if we can have it all in a single place. The main reason why I would do so is that we hardcode Competion912 , and I am anxious that changes may happen in Lucene and we may forget to update this string. It is probably easier to miss if it's directly in PerFieldFormatSupplier. We can also make this change later, it's not a big deal. This is a pre-existing problem and the code looks better with your change already.
server/src/main/java/org/elasticsearch/internal/CompletionsPostingsFormatExtension.java
Show resolved
Hide resolved
| org.elasticsearch.serverless.constants; | ||
| org.elasticsearch.serverless.constants, | ||
| org.elasticsearch.serverless.codec, | ||
| org.elasticsearch.stateless; |
There was a problem hiding this comment.
could you help me understand why we need to add org.elasticsearch.serverless.codec here as well and below, and why we need to add org.elasticsearch.stateless here too?
There was a problem hiding this comment.
Yup, org.elasticsearch.serverless.codec and org.elasticsearch.stateless are separate java modules. Each requires access to the CompletionsPostingsFormatExtension found in the org.elasticsearch.internal java module.
@rjernst mind checking this as well to ensure this is correct? Thanks!
There was a problem hiding this comment.
I believe org.elasticsearch.stateless should not be needed here, but codec is so that it can implement the internal spi.
There was a problem hiding this comment.
Thanks @rjernst. I'll remove the export to stateless now that we are no longer using the Feature (for which this was previously required).
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
| org.elasticsearch.serverless.constants; | ||
| org.elasticsearch.serverless.constants, | ||
| org.elasticsearch.serverless.codec, | ||
| org.elasticsearch.stateless; |
There was a problem hiding this comment.
I believe org.elasticsearch.stateless should not be needed here, but codec is so that it can implement the internal spi.
| Mapper mapper = mapperService.mappingLookup().getMapper(field); | ||
| if (mapper instanceof CompletionFieldMapper) { | ||
| return CompletionFieldMapper.postingsFormat(); |
There was a problem hiding this comment.
+1 to following up with moving the contents of CompletionFieldMapper.postingsFormat() into here.
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
…ent/make-fst-on-heap-configurable
|
@elasticmachine run elasticsearch-ci/part-3 |
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See #111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
This PR allows the completion posting format loaded by ES to be extensible.
Lucene uses
PostingsFormat.forName(String)to load a postings format identified by the String arg. This PR provides an extension that allows alternative names to be provided at the point we load postings format in ES at write-time. This name is then written to the shard info. When the shard is read at search-time, Lucene looks for postings formats with the same type on the classpath (as declared though module info via aprovidesdirective), then picks the one with the same name.