Use existing DocumentMapper when creating new MapperService#138489
Use existing DocumentMapper when creating new MapperService#138489nielsbauman merged 7 commits intoelastic:mainfrom
DocumentMapper when creating new MapperService#138489Conversation
When we create a new `MapperService` from an existing index for validation purposes, we often merge the mappings of the existing index into the temporary mapper service right after. This requires parsing the mappings and "merging" them (there's not much merging to do with an empty service, but we still need to traverse the mappings object). This can take up considerable time for larger mappings. By passing the existing `DocumentMapper` instance of the index (if available) to the new `MapperService`, the aforementioned mapping merge becomes a no-op and saves all that redundant processing.
|
Hi @nielsbauman, I've created a changelog YAML for you. |
|
These changes are related to #138367, but can be merged/viewed separately. I performed similar benchmarks for this PR as well. See #138367 (comment) for a brief explanation of what experiments I performed. Here is the same flamegraph on And here is the flamegraph on this PR: The rightmost merge/parse path completely disappeared, saving a lot of processing. |
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
| .map(IndexService::mapperService) | ||
| .map(MapperService::documentMapper) | ||
| .orElse(null); | ||
| return indexModule.newIndexMapperService(clusterService, parserConfig, mapperRegistry, scriptService, documentMapper); |
There was a problem hiding this comment.
Thinking how this could be tested, I played around a little but my IDE seems not to like me too much today. Would a test like this (in IndicesServiceTests make sense to you?
public void testCheckIndicesServiceMapper() throws IOException {
IndicesService indicesService = getIndicesService();
IndexMetadata indexMetadata = new IndexMetadata.Builder("test").settings(
indexSettings(1, 1).put("index.version.created", IndexVersions.V_8_10_0)
.put(SETTING_INDEX_UUID, randomUUID())
).build();
IndexService indexService = indicesService.createIndex(indexMetadata, List.of(), true);
IndexMetadata newIndexMetadata = IndexMetadata.builder(indexMetadata)
.numberOfReplicas(2)
.mappingVersion(indexMetadata.getMappingVersion()+1)
.putMapping("""
{
"_doc":{
"properties": {
"@timestamp": {
"type": "date"
}
}
}
}""")
.build();
indexService.updateMapping(indexMetadata, newIndexMetadata);
DocumentMapper documentMapper = indicesService.createIndexMapperServiceForValidation(indexMetadata).documentMapper();
DocumentMapper cachedDocumentMapper = indicesService.createIndexMapperServiceForValidation(indexMetadata).documentMapper();
assertNotNull(documentMapper);
assertSame(documentMapper, cachedDocumentMapper);
}There was a problem hiding this comment.
Thanks for the suggestion! I pushed b91c6a6 with some minor changes to your suggestion. Let me know if that works for you.
server/src/test/java/org/elasticsearch/indices/IndicesServiceTests.java
Outdated
Show resolved
Hide resolved
|
Thanks a lot for the review, @spinscale! This will help a great deal in reducing the performance impact of mapping updates. |
We observed a race condition - introduced in elastic#138489 - for usages of `IndicesService#createIndexMapperServiceForValidation` between `ClusterService#state()` and the `DocumentMapper` in an index' `MapperService`, as both are updated non-atomically. This race condition manifested in the `PUT _mapping` API in the form of mapping conflicts because we merge the mapping of the `IndexMetadata` (retrieved from the `ClusterService`) into the temporary `MapperService` for validation (which uses the mappings from the existing `MapperService`). Since reusing the `DocumentMapper` is merely an optimization, we can simply skip it in these rare situations where the race condition manifests.
We observed a race condition - introduced in #138489 - for usages of `IndicesService#createIndexMapperServiceForValidation` between `ClusterService#state()` and the `DocumentMapper` in an index' `MapperService`, as both are updated non-atomically. This race condition manifested in the `PUT _mapping` API in the form of mapping conflicts because we merge the mapping of the `IndexMetadata` (retrieved from the `ClusterService`) into the temporary `MapperService` for validation (which uses the mappings from the existing `MapperService`). Since reusing the `DocumentMapper` is merely an optimization, we can simply skip it in these rare situations where the race condition manifests.


When we create a new
MapperServicefrom an existing index for validation purposes, we often merge the mappings of the existing index into the temporary mapper service right after. This requires parsing the mappings and "merging" them (there's not much merging to do with an empty service, but we still need to traverse the mappings object). This can take up considerable time for larger mappings.By passing the existing
DocumentMapperinstance of the index (if available) to the newMapperService, the aforementioned mapping merge becomes a no-op and saves all that redundant processing.