Attempt to clean up index before remote transfer#115142

Merged
bcully merged 13 commits into elastic:main from bcully:recovery-clean-before-transfer
Nov 14, 2024
Conversation

@bcully
Contributor

@bcully bcully commented Oct 18, 2024

If a node crashes during recovery, it may leave temporary files behind that can consume disk space, which may be needed to complete recovery. So we attempt to clean up the index before transferring files from a recovery source. We first attempt to call the store's `cleanupAndVerify` method, which removes anything not referenced by the latest local commit. If that fails because the local index isn't in a state to produce a local commit, we fall back to removing temporary files by name.

Closes #104473
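The two-step strategy described above (full cleanup if possible, name-based temp-file removal as a fallback) can be sketched as a minimal, self-contained illustration. Note the `recovery.` prefix, the simulated failure, and all method names here are stand-ins for illustration only, not the real `Store#cleanupAndVerify` code path or Elasticsearch's actual temp-file naming:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of "clean fully against the latest commit if possible, else fall
// back to deleting files that are recognizably temporary by name".
public class RecoveryCleanupSketch {
    static final String TEMP_PREFIX = "recovery."; // hypothetical temp-file prefix

    // Preferred path: delete everything not referenced by the latest local commit.
    static void cleanupAndVerify(Path indexDir, Set<String> referencedByCommit) throws IOException {
        if (referencedByCommit.isEmpty()) {
            // Simulates "the local index isn't in a state to produce a local commit".
            throw new IOException("no local commit to verify against");
        }
        for (Path f : listFiles(indexDir)) {
            if (!referencedByCommit.contains(f.getFileName().toString())) {
                Files.delete(f);
            }
        }
    }

    // Fallback: remove only files whose names mark them as temporary.
    static void cleanTempFilesByName(Path indexDir) throws IOException {
        for (Path f : listFiles(indexDir)) {
            if (f.getFileName().toString().startsWith(TEMP_PREFIX)) {
                Files.delete(f);
            }
        }
    }

    static void cleanBeforeTransfer(Path indexDir, Set<String> referencedByCommit) throws IOException {
        try {
            cleanupAndVerify(indexDir, referencedByCommit);
        } catch (IOException e) {
            cleanTempFilesByName(indexDir); // fall back to name-based cleanup
        }
    }

    private static List<Path> listFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            stream.forEach(files::add);
        }
        return files;
    }
}
```

With a usable commit, everything unreferenced is removed; without one, only name-identified temp files are removed and other files are left for a later, deeper cleanup.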

@bcully bcully added the >enhancement, :Distributed/Recovery (anything around constructing a new shard, either from a local or a remote source), auto-backport (automatically create backport pull requests when merged), v9.0.0, and v8.17.0 labels Oct 18, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (meta label for distributed team) label Oct 18, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @bcully, I've created a changelog YAML for you.

@bcully
Contributor Author

bcully commented Oct 18, 2024

I'm doing this as part of onboarding, and don't really have any idea what I'm doing yet. Any kind of feedback is welcome.

bcully and others added 2 commits October 21, 2024 08:45
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We first attempt to call the store's
`cleanupAndVerify` method, which removes anything not referenced by
the latest local commit. If that fails, because the local index isn't
in a state to produce a local commit, then we fall back to removing
temporary files by name.

Closes elastic#104473
@bcully bcully force-pushed the recovery-clean-before-transfer branch from cef215f to 8c6e459 on October 21, 2024 15:45
@bcully bcully self-assigned this Oct 21, 2024
@bcully bcully requested a review from idegtiarenko October 22, 2024 15:06
Contributor

@Tim-Brooks Tim-Brooks left a comment

A couple comments. I think we'll need @DaveCTurner to verify that this high level design is what he had in mind.

```java
Store store = indexShard.store();
store.incRef();
try {
    store.cleanupAndVerify("cleanup before peer recovery", indexShard.snapshotStoreMetadata());
```
Contributor

This wraps some IOExceptions in `throw new IllegalStateException("Can't delete " + existingFile + " - cleanup failed", ex);`. Is that something we should be catching and handling?

Contributor

```java
} catch (IOException ex) {
    if (existingFile.startsWith(IndexFileNames.SEGMENTS) || existingFile.startsWith(CORRUPTED_MARKER_NAME_PREFIX)) {
        // TODO do we need to also fail this if we can't delete the pending commit file?
        // if one of those files can't be deleted we better fail the cleanup otherwise we might leave an old commit
        // point around?
        throw new IllegalStateException("Can't delete " + existingFile + " - cleanup failed", ex);
    }
    logger.debug(() -> "failed to delete file [" + existingFile + "]", ex);
    // ignore, we don't really care, will get deleted later on
}
```
Contributor Author

My thinking was that this same method gets called a little later, where it does do some more complicated decision making around exceptions. We could replicate some of that logic here, which could save some transfer work if a recovery is doomed for file system reasons, but my first instinct was to avoid adding new failure modes to recovery, and that replicating or refactoring the handling logic wasn't worth it for this case, which I wouldn't expect to be common enough to need optimization.

Contributor Author

I had actually misread this comment - I now see your point was that IllegalStateExceptions were escaping here and could fail recovery before we had a chance to pull, which might fix the problem.

I think the errors here (delete failure) are not likely to be repaired by the peer, so we might as well bail out here and save some potentially expensive transfer for a likely doomed recovery attempt. The verification step also throws some IllegalStateExceptions, which I also think are an abnormal condition (cleanup is supposed to reconcile the directory with the provided snapshot) that needs investigation.

I've taken David's suggestion and treated the directory as having an empty snapshot (which will cause cleanup to remove everything), which I think makes cleanup more robust to partially transferred indices (at the cost of potentially retransmitting some files).

I could go either way on that, I don't really have a sense for how much extra work that could create in practice.
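The fallback being discussed (treat a failure to load the store metadata as an empty snapshot, so cleanup purges everything and recovery pulls from scratch) can be sketched with a toy model. The `Set`-of-file-names model and `snapshotOrEmpty` are stand-ins for illustration, not the actual `Store#snapshotStoreMetadata` / `Store.MetadataSnapshot` API:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;

// Sketch: if the latest commit's metadata cannot be loaded, pretend the
// snapshot is empty. An empty snapshot references no files, so the cleanup
// that follows deletes everything and the peer retransmits from scratch.
public class EmptySnapshotFallback {
    static final Set<String> EMPTY = Set.of(); // stand-in for MetadataSnapshot#EMPTY

    static Set<String> snapshotOrEmpty(Callable<Set<String>> loadSnapshot) {
        try {
            return loadSnapshot.call();
        } catch (Exception e) {
            return EMPTY; // corrupt or partially transferred index: keep nothing
        }
    }

    // Keep only files referenced by the snapshot; with EMPTY, delete everything.
    static void cleanup(Path dir, Set<String> referenced) throws IOException {
        List<Path> toDelete = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (!referenced.contains(f.getFileName().toString())) {
                    toDelete.add(f);
                }
            }
        }
        for (Path f : toDelete) {
            Files.delete(f);
        }
    }
}
```

The trade-off named in the thread is visible here: robustness to half-transferred directories, at the cost of retransmitting files that might have been reusable.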

```java
final CountDownLatch latch = new CountDownLatch(1);
final AtomicBoolean truncate = new AtomicBoolean(true);
IndicesService unluckyIndices = internalCluster().getInstance(IndicesService.class, unluckyNode);
for (NodeStats dataNode : dataNodeStats) {
```
Contributor

I'm pretty sure we can get rid of this and just add the send behavior from primary -> unlucky

Contributor Author

I'm not following this. I was using this to get the shard path on disk where I could scribble junk files and then check they were removed. It did seem a little indirect - is there a nicer way?

Contributor

@DaveCTurner DaveCTurner left a comment

Trickier than I expected :) I left a few comments.

```java
store.incRef();
try {
    store.cleanupAndVerify("cleanup before peer recovery", indexShard.snapshotStoreMetadata());
} catch (IOException e) {
```
Contributor

I think I might prefer to restrict this catch just to exceptions from indexShard.snapshotStoreMetadata() rather than from cleanupAndVerify() too. If cleanupAndVerify() fails then that seems like it maybe should be fatal?

Also could we use org.elasticsearch.index.store.Store.MetadataSnapshot#EMPTY to represent the snapshot if we cannot get a proper MetadataSnapshot? Any file re-use during recovery is going to require a valid MetadataSnapshot, so failing here means that the directory contents are of no further use.

Contributor Author

Tim made a similar comment, to which I replied above. There is some logic in cleanFiles that inspects the exception generated by the block containing cleanupAndVerify and does a deeper clean in some cases. Looking at it again, I think it's not just an optimization. We'd probably need to replicate that to avoid having to worry about corrupt directories that we can never make forward progress on because we never get a chance to deep clean them.

Is it always true that the files are useless if we can't get a metadata snapshot? I had been thinking that for instance we might have transferred a bunch of data files previously but not the root segment file. In that case, after transferring the segment file we could reuse the data files already transferred?

Contributor

Is it always true that the files are useless if we can't get a metadata snapshot?

Yes I think so, we treat a failure as if the store is empty, see how we compute metadataSnapshot in org.elasticsearch.indices.recovery.PeerRecoveryTargetService#getStartRecoveryRequest and then use this data (alone) to compute the collection of files to recover in the implementations of org.elasticsearch.indices.recovery.plan.RecoveryPlannerService#computeRecoveryPlan.

I mean you're right, in principle we could work harder to determine which exact files we do/don't have and avoid a little extra data transfer sometimes. But we don't: this is already on a failure path, and not a particularly well-trodden one, so there's no need for heroic optimizations.
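The point about how the recovery plan is computed can be illustrated with a toy version: the target's metadata snapshot (modelled here as a map from file name to checksum) is the only input deciding reuse, so an empty target snapshot means every source file is transferred. This is a deliberate simplification for illustration, not the code of `RecoveryPlannerService#computeRecoveryPlan`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy recovery plan: a file is reusable only if the target already has it with
// an identical checksum; everything else must be transferred from the source.
public class RecoveryPlanSketch {
    record Plan(List<String> reuse, List<String> transfer) {}

    static Plan computePlan(Map<String, String> source, Map<String, String> target) {
        List<String> reuse = new ArrayList<>();
        List<String> transfer = new ArrayList<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getValue().equals(target.get(e.getKey()))) {
                reuse.add(e.getKey()); // same name and checksum: skip transfer
            } else {
                transfer.add(e.getKey()); // missing or mismatched: transfer
            }
        }
        return new Plan(reuse, transfer);
    }
}
```

With `target` empty, `reuse` is empty and every file lands in `transfer`, which is exactly the "treat a metadata failure as an empty store" behavior described above.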

Contributor Author

I thought a bit more about this and am inclined to use Store.MetadataSnapshot#EMPTY when snapshotStoreMetadata() throws. Interestingly, when I tried it, I found that store.cleanupAndVerify appears to throw on an empty directory, here:

```
...
Caused by: java.io.FileNotFoundException: no segments* file found in store(ByteSizeCachingDirectory(ElasticsearchMockDirectoryWrapper(NIOFSDirectory@/Users/brendan/dev/elasticsearch/server/build/testrun/internalClusterTest/temp/org.elasticsearch.recovery.TruncatedRecoveryIT_31340047D7FA1434-001/tempDir-002/node_t4/indices/QuuB7D0wSki7rj5uryRXWQ/4/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@b7a4d9))): files: [write.lock]
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:801)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:757)
	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:535)
	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:519)
	at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:116)
	at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:224)
	at org.elasticsearch.index.store.Store$MetadataSnapshot.loadFromIndexCommit(Store.java:822)
	at org.elasticsearch.index.store.Store.getMetadata(Store.java:298)
	at org.elasticsearch.index.store.Store.getMetadata(Store.java:263)
	at org.elasticsearch.index.store.Store.cleanupAndVerify(Store.java:699)
...
```

Given that the name of the variable being assigned here is `metadataOrEmpty`, it seems like it might be a bug, and we just haven't tested against an empty snapshot. If you agree, I'll attempt to fix that and go with this approach of falling back to `EMPTY`.

Contributor Author

#116059 adds support for empty stores to Store#cleanupAndVerify

Contributor Author

After making `cleanupAndVerify` handle an empty snapshot, I've taken your suggestion and treated the directory as having an empty snapshot if we fail to load one.

Remove extra guard that we are in phase 2 before checking for garbage.
Take a reference on recoveryTarget around invoking cleanTempFiles.
@bcully
Contributor Author

bcully commented Nov 13, 2024

@elasticmachine update branch

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing (obsolete meta label for the Distributed Indexing team) label Nov 13, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

Contributor

@Tim-Brooks Tim-Brooks left a comment

LGTM

@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.x
bcully added a commit to bcully/elasticsearch that referenced this pull request Nov 14, 2024
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.

Closes elastic#104473
@bcully bcully deleted the recovery-clean-before-transfer branch November 14, 2024 22:17
elasticsearchmachine pushed a commit that referenced this pull request Nov 14, 2024
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.

Closes #104473
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Nov 18, 2024
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.

Closes elastic#104473
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.

Closes elastic#104473
lkts added a commit to lkts/elasticsearch that referenced this pull request Dec 15, 2025
lkts added a commit that referenced this pull request Jan 5, 2026
…139569)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
lkts added a commit to lkts/elasticsearch that referenced this pull request Jan 5, 2026
)" (elastic#139569)

* Revert "Attempt to clean up index before remote transfer (elastic#115142)"

This reverts commit fc67f7c.
lkts added a commit to lkts/elasticsearch that referenced this pull request Jan 5, 2026
)" (elastic#139569)

* Revert "Attempt to clean up index before remote transfer (elastic#115142)"

This reverts commit fc67f7c.
lkts added a commit to lkts/elasticsearch that referenced this pull request Jan 5, 2026
)" (elastic#139569)

* Revert "Attempt to clean up index before remote transfer (elastic#115142)"

This reverts commit fc67f7c.
lkts added a commit to lkts/elasticsearch that referenced this pull request Jan 5, 2026
)" (elastic#139569)

* Revert "Attempt to clean up index before remote transfer (elastic#115142)"

This reverts commit fc67f7c.
elasticsearchmachine pushed a commit that referenced this pull request Jan 5, 2026
…139569) (#140180)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
elasticsearchmachine pushed a commit that referenced this pull request Jan 5, 2026
…139569) (#140179)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
elasticsearchmachine pushed a commit that referenced this pull request Jan 5, 2026
…139569) (#140181)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
lkts added a commit that referenced this pull request Jan 5, 2026
…139569) (#140178)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
@DaveCTurner DaveCTurner removed the Team:Distributed Indexing (obsolete meta label for the Distributed Indexing team) label Feb 18, 2026