Improve no-op check in PUT _mapping API#138367
Conversation
|
Hi @nielsbauman, I've created a changelog YAML for you. |
|
I did some local benchmarking on a single-node ES cluster (with assertions disabled) where I created 1000 single-index data streams, all with the same (considerably sized) mappings (sourced here). I then ran two different experiments on both 1) No-op updatesI called 1) No-op updates -
|
|
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
|
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Pull Request Overview
This PR enhances the no-op detection in the PUT _mapping API by performing a more accurate comparison that accounts for default values added or removed during mapping merges. The improved check prevents unnecessary cluster state update tasks when mapping updates would not result in actual changes.
Key changes:
- Implement a pre-merge no-op check in
MetadataMappingService.isWholeRequestNoop()that performs actual mapping merges to detect effective no-ops - Move mapping request processing to the
MANAGEMENTthread pool to handle expensive parsing/merging operations - Remove redundant "dry run" mapping merge that was previously used solely for validation
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| MetadataMappingService.java | Adds isWholeRequestNoop() method with merge-based comparison; refactors applyRequest() to remove duplicate merge and improve efficiency |
| TransportPutMappingAction.java | Changes executor from DIRECT_EXECUTOR_SERVICE to MANAGEMENT thread pool |
| TransportAutoPutMappingAction.java | Changes executor from DIRECT_EXECUTOR_SERVICE to MANAGEMENT thread pool |
| docs/changelog/138367.yaml | Adds changelog entry for the enhancement |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataMappingService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataMappingService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataMappingService.java
Outdated
Show resolved
Hide resolved
There's a mappings version and a settings version which you can use to tell whether the mappings or settings have changed. If there's more to it, you could also reasonably keep a hold of the whole
The logging is pretty coarse, but the relevant stats on cluster state update performance are at |
|
Thanks, @DaveCTurner, that's useful. My main worry is that I'm not familiar enough with mappings to know what exactly could influence mappings. Are there other things in the cluster state that could influence mappings, such as cluster settings for instance? Obviously there are things like plugins that could change, but those won't change in between a transport action (on the master node) and the cluster state update task. I'll ask the Search Foundations team if they have an answer to those questions. |
|
Ugh I thought it'd be simple but ofc it isn't. Looking at the code, it needs a FWIW I'm considering alternative ideas around mapping updates, including imposing some kind of throttle on the tasks submitted to the master to avoid cases where thousands of updates take tens of seconds to process and hold everything else up. |
szybia
left a comment
There was a problem hiding this comment.
don't have enough knowledge in this area to approve
but i can't spot anything wrong on a code-level
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataMappingService.java
Show resolved
Hide resolved
dakrone
left a comment
There was a problem hiding this comment.
I think this generally looks good to me, I don't have any concerns, but I'd still feel better if someone from @elastic/es-search-management took a look since they own the mappings merging.
| noop = false; | ||
| break; | ||
| // If the mapping sources are already equal, then we already know this index would be a no-op and can skip further checks. | ||
| if (request.source().equals(mappingMetadata.source())) { |
There was a problem hiding this comment.
Is this continue part here for readability?
Otherwise this could be merrged with the next part?
if (request.source().equals(mappingMetadata.source()) == false) {
try (MapperService mapperService = indicesService.createIndexMapperServiceForValidation(indexMetadata)) {
}
}There was a problem hiding this comment.
Yep, I usually prefer using continue rather than creating deeper and deeper nesting - for readability indeed. If you have a strong preference for your suggestion, I can incorporate it.
There was a problem hiding this comment.
Fine with me either way, no strong preferences :-) My way of reading this was "if.. break, if...break, if...break, if...continue" that stopped my flow, but your point is valid as well!
| // We could consider caching the result on the request object to avoid doing the same work twice. This would require some | ||
| // checks to ensure the cached result is only used if the circumstances are the same (e.g., no changes to the index settings). | ||
| try (MapperService mapperService = indicesService.createIndexMapperServiceForValidation(indexMetadata)) { | ||
| mapperService.merge(indexMetadata, MergeReason.MAPPING_RECOVERY); |
There was a problem hiding this comment.
Is the merge reason here on purpose? I saw a code snippet in MapperService.newDocumentMapper() that uses this merge reason to enable/disable limit checks, thus the question.
There was a problem hiding this comment.
That matches what we do here:
We seem to do that for most (if not all) usages of createIndexMapperServiceForValidation. I'm afraid I know nothing of MergeReasons, so I simply copied that approach from existing sources. Let me know if there's a reason we shouldn't use it!
spinscale
left a comment
There was a problem hiding this comment.
I tried to come up with a test case in MetadataMappingServiceTests for the putMapping method, but I am not sure we can check for the isWholeRequestNoop() immediate acknowledgement to not trigger a cluster state update there. Idea would have been to have a keyword field ("field": { "type": "keyword" }) in an index and then for example use the default value of ignore_above as an update ("field": { "type": "keyword", "ignore_above": "2147483647" }). Or testing the isWholeRequestNoop method within the MetadataMappingServiceTests? Do you think that makes sense?
|
We could always create a fresh But testing the |
|
@spinscale what do you think about aa6a984? |
| /** | ||
| * Test that putting a mapping that is semantically identical but syntactically different results in a no-op. | ||
| */ | ||
| public void test() throws IOException { |
There was a problem hiding this comment.
nit: naming.
Does it also make sense to add a test the verifies an interaction? Or is that covered by other tests anyway?
There was a problem hiding this comment.
🤦 missed that one, good catch!
I indeed figured that the non-noop flow is sufficiently covered by other integration tests already (i.e. they verify that non-noop updates actually modify the cluster state).




The existing no-op check in the PUT _mapping API is often ineffective as it only compares the mapping sources directly (with
CompressedXContent.equals()). While that comparison is cheap, it doesn't account for the default values that the mapping service adds or removes when merging mappings. For example,"_data_stream_timestamp":{"enabled":true}is automatically added for data stream indices, and"ignore_malformed":falseis removed if it is supplied in a mapping update request.With this PR, we already perform the merge in the transport action to determine whether the request would be a no-op. By making this no-op check more accurate, we can avoid creating redundant cluster state update tasks. Since large mappings can make the merging of mappings expensive, avoiding cluster state update tasks is valuable to avoid instabilities in other operations of the cluster. We've seen cases of integrations retrying this API repeatedly if it doesn't seem to have the desired effect. While that is a bug in the integrations/client side, by having this improved no-op check, these mapping updates will have a significantly reduced impact on the stability of the cluster (i.e. less cluster state update tasks).
Parsing and merging mappings can be expensive, so we need to perform these checks on the
managementthread pool instead of the transport thread.The existing code merges the mappings twice, where the first time is purely for mapping validation. This doesn't seem to have any value - perhaps it used to have value before. By removing this "dry run" mapping, we can save quite a bit of processing on the precious cluster state update task thread.