
Improve no-op check in PUT _mapping API#138367

Merged
nielsbauman merged 9 commits into elastic:main from nielsbauman:optimize-put-mapping
Dec 16, 2025

Conversation

@nielsbauman
Contributor

@nielsbauman nielsbauman commented Nov 20, 2025

The existing no-op check in the PUT _mapping API is often ineffective as it only compares the mapping sources directly (with CompressedXContent.equals()). While that comparison is cheap, it doesn't account for the default values that the mapping service adds or removes when merging mappings. For example, "_data_stream_timestamp":{"enabled":true} is automatically added for data stream indices, and "ignore_malformed":false is removed if it is supplied in a mapping update request.
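To illustrate why raw source equality misses semantic no-ops, here is a minimal standalone sketch (plain Java, not Elasticsearch code; the JSON strings and the string-based "normalization" are illustrative only, since the real normalization happens inside the mapping merge):

```java
// Standalone sketch: why comparing raw mapping sources misses semantic no-ops.
public class SourceEqualityDemo {
    public static void main(String[] args) {
        // Mapping currently stored for a data stream index (defaults normalized).
        String stored = "{\"_data_stream_timestamp\":{\"enabled\":true},"
            + "\"properties\":{\"msg\":{\"type\":\"long\"}}}";
        // Incoming PUT _mapping request: semantically identical, but it spells
        // out a default (ignore_malformed:false) and omits the auto-added
        // _data_stream_timestamp section.
        String request = "{\"properties\":{\"msg\":{\"type\":\"long\",\"ignore_malformed\":false}}}";

        // A cheap source comparison (analogous to CompressedXContent.equals())
        // concludes the request is NOT a no-op...
        System.out.println("sources equal: " + stored.equals(request)); // false

        // ...whereas after normalizing (simulated here by stripping the default
        // and re-adding the auto-generated section), the mappings are identical.
        String normalized = request.replace(",\"ignore_malformed\":false", "");
        normalized = "{\"_data_stream_timestamp\":{\"enabled\":true}," + normalized.substring(1);
        System.out.println("normalized equal: " + stored.equals(normalized)); // true
    }
}
```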

With this PR, we perform the merge up front in the transport action to determine whether the request would be a no-op. By making the no-op check more accurate, we can avoid creating redundant cluster state update tasks. Since merging large mappings can be expensive, avoiding these tasks helps prevent instability in other cluster operations. We've seen cases of integrations retrying this API repeatedly when it doesn't appear to have the desired effect. While that is a bug on the integration/client side, the improved no-op check significantly reduces the impact of such mapping updates on the stability of the cluster (i.e. fewer cluster state update tasks).

Parsing and merging mappings can be expensive, so we need to perform these checks on the management thread pool instead of the transport thread.
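The hand-off pattern can be sketched generically with plain java.util.concurrent (this is not Elasticsearch's actual ThreadPool API; the names and the placeholder check are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Generic sketch of offloading expensive request processing to a dedicated
// pool, analogous to moving the mapping parse/merge off the transport thread
// onto the MANAGEMENT thread pool.
public class OffloadDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService management = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(1);

        // The "transport thread" only schedules the work and returns.
        management.submit(() -> {
            // Placeholder for the expensive parse + merge + no-op check.
            boolean isNoop = expensiveNoopCheck();
            System.out.println("no-op: " + isNoop);
            done.countDown();
        });

        done.await();
        management.shutdown();
    }

    static boolean expensiveNoopCheck() {
        return true; // placeholder for the real merge-based comparison
    }
}
```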

The existing code merges the mappings twice, where the first merge is purely for mapping validation. This doesn't seem to add any value (perhaps it did at some point in the past). By removing this "dry run" merge, we can save quite a bit of processing on the precious cluster state update thread.

@nielsbauman nielsbauman added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types :Data Management/Indices APIs DO NOT USE. Use ":Distributed/Indices APIs" or ":StorageEngine/Templates" instead. labels Nov 20, 2025
@elasticsearchmachine
Collaborator

Hi @nielsbauman, I've created a changelog YAML for you.

@nielsbauman
Contributor Author

I did some local benchmarking on a single-node ES cluster (with assertions disabled) where I created 1000 single-index data streams, all with the same (considerably sized) mappings (sourced here). I then ran two different experiments on both main and this branch.

1) No-op updates

I called PUT my-ds-*/_mapping repeatedly with the exact same mappings that I used in the data stream's index template.

1) No-op updates - main

This call takes roughly 12 seconds on average and makes my ES node log:

[2025-11-18T16:15:17,388][WARN ][o.e.c.s.MasterService ] [runTask-0] took [23.3s/23355ms] to compute cluster state update for [put-mapping

The flamegraph (collected over 60 seconds):
flamegraph-mappings-main-noop-pr.html
The highlighted flames are MapperService#parseMapping calls. Note that all this work is being done on the cluster state update thread. You might be curious why we parse mappings three times. The reason is that we're parsing the new mappings twice, as I explained in this PR description, and that we're parsing the existing mappings of every index when we create the temporary MapperService instances. I think I have a way to avoid that last part, but I'll open a separate PR for that.

1) No-op updates - this PR

This call takes roughly 8 seconds on average and does not result in slow cluster state update logs.
The flamegraph (collected over 60 seconds):
flamegraph-mappings-pr-noop-pr.html
We now only parse twice, and we do everything in the transport action, on the management thread pool.

2) Actual updates

I called PUT my-ds-*/_mapping repeatedly with slightly different mappings (I incremented ignore_above of one field on every call) to analyze how real (not no-op) updates would perform.

2) Actual updates - main

This call takes roughly 14 seconds on average and also results in slow cluster state update logs.
The flamegraph (collected over 60 seconds):
flamegraph-mappings-main-actual-changes-pr.html
The extra highlighted flame on the right is from the IndicesClusterStateService that is processing the actual mapping updates. We can ignore this path in this PR.

2) Actual updates - this PR

This call takes roughly 11 seconds on average and also results in slow cluster state update logs.
The flamegraph (collected over 60 seconds):
flamegraph-mappings-pr-actual-changes-pr.html
We can again ignore the rightmost highlighted flame from the IndicesClusterStateService. For some reason, the parsing and merging in the transport action barely show up in this flamegraph. I moved my cursor to that flame/path in the bottom left. I'm not really sure why only the cluster state update path shows up in the graph. Perhaps the fact that the cluster state update thread pool is single-threaded while the management pool is multi-threaded affects how likely a sample is to appear in the graph?

Anyway, that is pretty much the downside of this PR: we still need to parse and merge all the mappings twice (once in the transport action and once in the cluster state update task). I think it's worth investigating whether we can "cache" the result of the first merge somehow and reuse it in the cluster state update task to avoid redoing the merge. We'd have to come up with a way to verify that all "dependencies" (e.g. existing index mappings, index settings, etc.) have not changed, so that we can actually reuse the merge from the transport action. In my opinion, this PR is valuable on its own, so I decided to leave that as future work.

@nielsbauman nielsbauman marked this pull request as ready for review November 20, 2025 14:59
@elasticsearchmachine elasticsearchmachine added Team:Data Management (obsolete) DO NOT USE. This team no longer exists. Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch labels Nov 20, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@lukewhiting lukewhiting requested a review from Copilot November 20, 2025 15:09
Contributor

Copilot AI left a comment


Pull Request Overview

This PR enhances the no-op detection in the PUT _mapping API by performing a more accurate comparison that accounts for default values added or removed during mapping merges. The improved check prevents unnecessary cluster state update tasks when mapping updates would not result in actual changes.

Key changes:

  • Implement a pre-merge no-op check in MetadataMappingService.isWholeRequestNoop() that performs actual mapping merges to detect effective no-ops
  • Move mapping request processing to the MANAGEMENT thread pool to handle expensive parsing/merging operations
  • Remove redundant "dry run" mapping merge that was previously used solely for validation

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File descriptions:

  • MetadataMappingService.java: Adds the isWholeRequestNoop() method with a merge-based comparison; refactors applyRequest() to remove the duplicate merge and improve efficiency
  • TransportPutMappingAction.java: Changes the executor from DIRECT_EXECUTOR_SERVICE to the MANAGEMENT thread pool
  • TransportAutoPutMappingAction.java: Changes the executor from DIRECT_EXECUTOR_SERVICE to the MANAGEMENT thread pool
  • docs/changelog/138367.yaml: Adds a changelog entry for the enhancement


@DaveCTurner
Contributor

We'd have to come up with a way to ensure that all "dependencies" have not changed

There's a mappings version and a settings version which you can use to tell whether the mappings or settings have changed. If there's more to it, you could also reasonably keep hold of the whole IndexMetadata and check it for instance equality in the cluster state update.

does not result in slow cluster state update logs

The logging is pretty coarse, but the relevant stats on cluster state update performance are at GET _nodes/_master/stats/discovery?filter_path=nodes.*.discovery.cluster_state_update.success&pretty. Does that help?
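The version-based cache validity check suggested above could be sketched as follows (a hypothetical standalone sketch; the record and method names are illustrative, not actual Elasticsearch code, although IndexMetadata does track mapping and settings versions):

```java
// Hypothetical sketch of validating a cached merge result against the mapping
// and settings versions it was computed from.
public class CachedMergeDemo {
    // A cached merge result, stamped with the versions it was computed against.
    record CachedMerge(long mappingVersion, long settingsVersion, String mergedSource) {}

    // The cached result is reusable only if neither version has moved on.
    static boolean isReusable(CachedMerge cached, long mappingVersion, long settingsVersion) {
        return cached.mappingVersion() == mappingVersion
            && cached.settingsVersion() == settingsVersion;
    }

    public static void main(String[] args) {
        CachedMerge cached = new CachedMerge(7, 3, "{...merged mapping...}");
        System.out.println(isReusable(cached, 7, 3)); // same versions: reuse
        System.out.println(isReusable(cached, 8, 3)); // mapping changed: recompute
    }
}
```

Instance-equality on the whole IndexMetadata, as suggested above, would be an even more conservative variant of the same check.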

@nielsbauman
Contributor Author

Thanks, @DaveCTurner, that's useful. My main worry is that I'm not familiar enough with mappings to know exactly what could influence them. Are there other things in the cluster state that could affect mappings, such as cluster settings? Obviously there are things like plugins that could change, but those won't change between the transport action (on the master node) and the cluster state update task. I'll ask the Search Foundations team whether they have an answer to those questions.

@DaveCTurner
Contributor

Ugh, I thought it'd be simple, but of course it isn't. Looking at the code, it needs a MapperService, which depends on most of IndicesService, including ClusterService, so it'll be pretty hard to answer without going through the code in great detail. Future-proofing will also be a concern (for that we could at least add some assertions to verify that the cached mapping matches a freshly-generated mapping when assertions are enabled). Hopefully the Search Foundations folks can give a more confident answer.

FWIW I'm considering alternative ideas around mapping updates, including imposing some kind of throttle on the tasks submitted to the master to avoid cases where thousands of updates take tens of seconds to process and hold everything else up.

Contributor

@szybia szybia left a comment


I don't have enough knowledge in this area to approve, but I can't spot anything wrong at the code level.

Member

@dakrone dakrone left a comment


This generally looks good to me and I don't have any concerns, but I'd still feel better if someone from @elastic/es-search-management took a look, since they own the mappings merging.

noop = false;
break;
// If the mapping sources are already equal, then we already know this index would be a no-op and can skip further checks.
if (request.source().equals(mappingMetadata.source())) {
Contributor


Is this continue here for readability?

Otherwise this could be merged with the next part:

if (request.source().equals(mappingMetadata.source()) == false) {
    try (MapperService mapperService = indicesService.createIndexMapperServiceForValidation(indexMetadata)) {
    }
}
Contributor Author


Yep, I usually prefer using continue rather than creating deeper and deeper nesting - for readability indeed. If you have a strong preference for your suggestion, I can incorporate it.

Contributor


Fine with me either way, no strong preferences :-) My way of reading this was "if... break, if... break, if... break, if... continue", which stopped my flow, but your point is valid as well!

// We could consider caching the result on the request object to avoid doing the same work twice. This would require some
// checks to ensure the cached result is only used if the circumstances are the same (e.g., no changes to the index settings).
try (MapperService mapperService = indicesService.createIndexMapperServiceForValidation(indexMetadata)) {
mapperService.merge(indexMetadata, MergeReason.MAPPING_RECOVERY);
Contributor


Is the merge reason here on purpose? I saw a code snippet in MapperService.newDocumentMapper() that uses this merge reason to enable/disable limit checks, thus the question.

Contributor Author


That matches what we do here:

MapperService mapperService = indicesService.createIndexMapperServiceForValidation(indexMetadata);
indexMapperServices.put(index, mapperService);
// add mappings for all types, we need them for cross-type validation
mapperService.merge(indexMetadata, MergeReason.MAPPING_RECOVERY);

We seem to do that for most (if not all) usages of createIndexMapperServiceForValidation. I'm afraid I know nothing of MergeReasons, so I simply copied that approach from existing sources. Let me know if there's a reason we shouldn't use it!

Contributor

@spinscale spinscale left a comment


I tried to come up with a test case in MetadataMappingServiceTests for the putMapping method, but I am not sure we can check there that the isWholeRequestNoop() immediate acknowledgement does not trigger a cluster state update. The idea would have been to have a keyword field ("field": { "type": "keyword" }) in an index and then, for example, use the default value of ignore_above as an update ("field": { "type": "keyword", "ignore_above": "2147483647" }). Or should we test the isWholeRequestNoop method within MetadataMappingServiceTests? Do you think that makes sense?

@nielsbauman
Contributor Author

We could always create a fresh MetadataMappingService instance with a mocked ClusterService and a mocked/custom task queue. MetadataMappingService doesn't depend on many things, so it's straightforward to create one. Then we verify that nothing is added to the queue.

But testing the isWholeRequestNoop method directly is definitely also an option. I'll play around with both and see which one seems best.

@nielsbauman
Contributor Author

@spinscale what do you think about aa6a984?

/**
* Test that putting a mapping that is semantically identical but syntactically different results in a no-op.
*/
public void test() throws IOException {
Contributor


nit: naming.

Does it also make sense to add a test that verifies an interaction? Or is that covered by other tests anyway?

Contributor Author


🤦 missed that one, good catch!

I indeed figured that the non-noop flow is sufficiently covered by other integration tests already (i.e. they verify that non-noop updates actually modify the cluster state).

@nielsbauman nielsbauman enabled auto-merge (squash) December 11, 2025 13:47
@nielsbauman nielsbauman disabled auto-merge December 11, 2025 14:53
@nielsbauman nielsbauman enabled auto-merge (squash) December 16, 2025 10:10
@nielsbauman nielsbauman merged commit 59ef0c5 into elastic:main Dec 16, 2025
35 checks passed
@nielsbauman nielsbauman deleted the optimize-put-mapping branch December 16, 2025 11:19

Labels

  • :Data Management/Indices APIs (DO NOT USE. Use ":Distributed/Indices APIs" or ":StorageEngine/Templates" instead.)
  • >enhancement
  • :Search Foundations/Mapping (Index mappings, including merging and defining field types)
  • Team:Data Management (obsolete; DO NOT USE. This team no longer exists.)
  • Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)
  • v9.3.0

8 participants