Attempt to clean up index before remote transfer#115142

Merged
bcully merged 13 commits into elastic:main from bcully:recovery-clean-before-transfer
Nov 14, 2024
Conversation

@bcully
Contributor

@bcully bcully commented Oct 18, 2024

If a node crashes during recovery, it may leave temporary files behind that can consume disk space, which may be needed to complete recovery. So we attempt to clean up the index before transferring files from a recovery source. We first attempt to call the store's `cleanupAndVerify` method, which removes anything not referenced by the latest local commit. If that fails because the local index isn't in a state to produce a local commit, we fall back to removing temporary files by name.

Closes #104473
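The two-step strategy described above (full cleanup if possible, name-based temp-file removal as a fallback) can be sketched as a minimal, self-contained illustration. Note the `recovery.` prefix, the simulated failure, and all method names here are stand-ins for illustration only, not the real `Store#cleanupAndVerify` code path or Elasticsearch's actual temp-file naming:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of "clean fully against the latest commit if possible, else fall
// back to deleting files that are recognizably temporary by name".
public class RecoveryCleanupSketch {
    static final String TEMP_PREFIX = "recovery."; // hypothetical temp-file prefix

    // Preferred path: delete everything not referenced by the latest local commit.
    static void cleanupAndVerify(Path indexDir, Set<String> referencedByCommit) throws IOException {
        if (referencedByCommit.isEmpty()) {
            // Simulates "the local index isn't in a state to produce a local commit".
            throw new IOException("no local commit to verify against");
        }
        for (Path f : listFiles(indexDir)) {
            if (!referencedByCommit.contains(f.getFileName().toString())) {
                Files.delete(f);
            }
        }
    }

    // Fallback: remove only files whose names mark them as temporary.
    static void cleanTempFilesByName(Path indexDir) throws IOException {
        for (Path f : listFiles(indexDir)) {
            if (f.getFileName().toString().startsWith(TEMP_PREFIX)) {
                Files.delete(f);
            }
        }
    }

    static void cleanBeforeTransfer(Path indexDir, Set<String> referencedByCommit) throws IOException {
        try {
            cleanupAndVerify(indexDir, referencedByCommit);
        } catch (IOException e) {
            cleanTempFilesByName(indexDir); // fall back to name-based cleanup
        }
    }

    private static List<Path> listFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            stream.forEach(files::add);
        }
        return files;
    }
}
```

With a usable commit, everything unreferenced is removed; without one, only name-identified temp files are removed and other files are left for a later, deeper cleanup.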

@bcully bcully added the >enhancement, :Distributed/Recovery (anything around constructing a new shard, either from a local or a remote source), auto-backport (automatically create backport pull requests when merged), v9.0.0, and v8.17.0 labels Oct 18, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (meta label for distributed team) label Oct 18, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @bcully, I've created a changelog YAML for you.

@bcully
Contributor Author

bcully commented Oct 18, 2024

I'm doing this as part of onboarding, and don't really have any idea what I'm doing yet. Any kind of feedback is welcome.

bcully and others added 2 commits October 21, 2024 08:45
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We first attempt to call the store's
`cleanupAndVerify` method, which removes anything not referenced by
the latest local commit. If that fails, because the local index isn't
in a state to produce a local commit, then we fall back to removing
temporary files by name.

Closes elastic#104473
@bcully bcully force-pushed the recovery-clean-before-transfer branch from cef215f to 8c6e459 on October 21, 2024 15:45
@bcully bcully self-assigned this Oct 21, 2024
@bcully bcully requested a review from idegtiarenko October 22, 2024 15:06
Contributor

@Tim-Brooks Tim-Brooks left a comment

A couple comments. I think we'll need @DaveCTurner to verify that this high level design is what he had in mind.

```java
Store store = indexShard.store();
store.incRef();
try {
    store.cleanupAndVerify("cleanup before peer recovery", indexShard.snapshotStoreMetadata());
```
Contributor

This wraps some IOExceptions in `throw new IllegalStateException("Can't delete " + existingFile + " - cleanup failed", ex);`. Is that something we should be catching and handling?

Contributor

```java
} catch (IOException ex) {
    if (existingFile.startsWith(IndexFileNames.SEGMENTS) || existingFile.startsWith(CORRUPTED_MARKER_NAME_PREFIX)) {
        // TODO do we need to also fail this if we can't delete the pending commit file?
        // if one of those files can't be deleted we better fail the cleanup otherwise we might leave an old commit
        // point around?
        throw new IllegalStateException("Can't delete " + existingFile + " - cleanup failed", ex);
    }
    logger.debug(() -> "failed to delete file [" + existingFile + "]", ex);
    // ignore, we don't really care, will get deleted later on
}
```
Contributor Author

My thinking was that this same method gets called a little later, where it does do some more complicated decision making around exceptions. We could replicate some of that logic here, which could save some transfer work if a recovery is doomed for file system reasons, but my first instinct was to avoid adding new failure modes to recovery, and that replicating or refactoring the handling logic wasn't worth it for this case, which I wouldn't expect to be common enough to need optimization.

Contributor Author

I had actually misread this comment - I now see your point was that IllegalStateExceptions were escaping here and could fail recovery before we had a chance to pull, which might fix the problem.

I think the errors here (delete failure) are not likely to be repaired by the peer, so we might as well bail out here and save some potentially expensive transfer for a likely doomed recovery attempt. The verification step also throws some IllegalStateExceptions, which I also think are an abnormal condition (cleanup is supposed to reconcile the directory with the provided snapshot) that needs investigation.

I've taken David's suggestion and treated the directory as having an empty snapshot (which will cause cleanup to remove everything), which I think makes cleanup more robust to partially transferred indices (at the cost of potentially retransmitting some files).

I could go either way on that, I don't really have a sense for how much extra work that could create in practice.
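The fallback being discussed (treat a failure to load the store metadata as an empty snapshot, so cleanup purges everything and recovery pulls from scratch) can be sketched with a toy model. The `Set`-of-file-names model and `snapshotOrEmpty` are stand-ins for illustration, not the actual `Store#snapshotStoreMetadata` / `Store.MetadataSnapshot` API:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;

// Sketch: if the latest commit's metadata cannot be loaded, pretend the
// snapshot is empty. An empty snapshot references no files, so the cleanup
// that follows deletes everything and the peer retransmits from scratch.
public class EmptySnapshotFallback {
    static final Set<String> EMPTY = Set.of(); // stand-in for MetadataSnapshot#EMPTY

    static Set<String> snapshotOrEmpty(Callable<Set<String>> loadSnapshot) {
        try {
            return loadSnapshot.call();
        } catch (Exception e) {
            return EMPTY; // corrupt or partially transferred index: keep nothing
        }
    }

    // Keep only files referenced by the snapshot; with EMPTY, delete everything.
    static void cleanup(Path dir, Set<String> referenced) throws IOException {
        List<Path> toDelete = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (!referenced.contains(f.getFileName().toString())) {
                    toDelete.add(f);
                }
            }
        }
        for (Path f : toDelete) {
            Files.delete(f);
        }
    }
}
```

The trade-off named in the thread is visible here: robustness to half-transferred directories, at the cost of retransmitting files that might have been reusable.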

```java
final CountDownLatch latch = new CountDownLatch(1);
final AtomicBoolean truncate = new AtomicBoolean(true);
IndicesService unluckyIndices = internalCluster().getInstance(IndicesService.class, unluckyNode);
for (NodeStats dataNode : dataNodeStats) {
```
Contributor

I'm pretty sure we can get rid of this and just add the send behavior from primary -> unlucky

Contributor Author

I'm not following this. I was using this to get the shard path on disk where I could scribble junk files and then check they were removed. It did seem a little indirect - is there a nicer way?

Contributor

@DaveCTurner DaveCTurner left a comment

Trickier than I expected :) I left a few comments.

```java
store.incRef();
try {
    store.cleanupAndVerify("cleanup before peer recovery", indexShard.snapshotStoreMetadata());
} catch (IOException e) {
```
Contributor

I think I might prefer to restrict this catch just to exceptions from indexShard.snapshotStoreMetadata() rather than from cleanupAndVerify() too. If cleanupAndVerify() fails then that seems like it maybe should be fatal?

Also could we use org.elasticsearch.index.store.Store.MetadataSnapshot#EMPTY to represent the snapshot if we cannot get a proper MetadataSnapshot? Any file re-use during recovery is going to require a valid MetadataSnapshot, so failing here means that the directory contents are of no further use.

Contributor Author

Tim made a similar comment, to which I replied above. There is some logic in cleanFiles that inspects the exception generated by the block containing cleanupAndVerify and does a deeper clean in some cases. Looking at it again, I think it's not just an optimization. We'd probably need to replicate that to avoid having to worry about corrupt directories that we can never make forward progress on because we never get a chance to deep clean them.

Is it always true that the files are useless if we can't get a metadata snapshot? I had been thinking that for instance we might have transferred a bunch of data files previously but not the root segment file. In that case, after transferring the segment file we could reuse the data files already transferred?

Contributor

Is it always true that the files are useless if we can't get a metadata snapshot?

Yes I think so, we treat a failure as if the store is empty, see how we compute metadataSnapshot in org.elasticsearch.indices.recovery.PeerRecoveryTargetService#getStartRecoveryRequest and then use this data (alone) to compute the collection of files to recover in the implementations of org.elasticsearch.indices.recovery.plan.RecoveryPlannerService#computeRecoveryPlan.

I mean you're right, in principle we could work harder to determine which exact files we do/don't have and avoid a little extra data transfer sometimes. But we don't: this is already on a failure path, and not a particularly well-trodden one, so there's no need for heroic optimizations.
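The point about how the recovery plan is computed can be illustrated with a toy version: the target's metadata snapshot (modelled here as a map from file name to checksum) is the only input deciding reuse, so an empty target snapshot means every source file is transferred. This is a deliberate simplification for illustration, not the code of `RecoveryPlannerService#computeRecoveryPlan`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy recovery plan: a file is reusable only if the target already has it with
// an identical checksum; everything else must be transferred from the source.
public class RecoveryPlanSketch {
    record Plan(List<String> reuse, List<String> transfer) {}

    static Plan computePlan(Map<String, String> source, Map<String, String> target) {
        List<String> reuse = new ArrayList<>();
        List<String> transfer = new ArrayList<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getValue().equals(target.get(e.getKey()))) {
                reuse.add(e.getKey()); // same name and checksum: skip transfer
            } else {
                transfer.add(e.getKey()); // missing or mismatched: transfer
            }
        }
        return new Plan(reuse, transfer);
    }
}
```

With `target` empty, `reuse` is empty and every file lands in `transfer`, which is exactly the "treat a metadata failure as an empty store" behavior described above.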

Contributor Author

I thought a bit more about this and am inclined to use Store.MetadataSnapshot#EMPTY when snapshotStoreMetadata() throws. Interestingly, when I tried it, I found that store.cleanupAndVerify appears to throw on an empty directory, here:

```
...
Caused by: java.io.FileNotFoundException: no segments* file found in store(ByteSizeCachingDirectory(ElasticsearchMockDirectoryWrapper(NIOFSDirectory@/Users/brendan/dev/elasticsearch/server/build/testrun/internalClusterTest/temp/org.elasticsearch.recovery.TruncatedRecoveryIT_31340047D7FA1434-001/tempDir-002/node_t4/indices/QuuB7D0wSki7rj5uryRXWQ/4/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@b7a4d9))): files: [write.lock]
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:801)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:757)
	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:535)
	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:519)
	at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:116)
	at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:224)
	at org.elasticsearch.index.store.Store$MetadataSnapshot.loadFromIndexCommit(Store.java:822)
	at org.elasticsearch.index.store.Store.getMetadata(Store.java:298)
	at org.elasticsearch.index.store.Store.getMetadata(Store.java:263)
	at org.elasticsearch.index.store.Store.cleanupAndVerify(Store.java:699)
...
```

Given that the name of the variable being assigned here is `metadataOrEmpty`, it seems like it might be a bug, and we just haven't tested against an empty snapshot. If you agree, I'll attempt to fix that and go with this approach of falling back to `EMPTY`.

Contributor Author

#116059 adds support for empty stores to Store#cleanupAndVerify

Contributor Author

After making `cleanupAndVerify` handle an empty snapshot, I've taken your suggestion and treated the directory as having an empty snapshot if we fail to load one.

Remove extra guard that we are in phase 2 before checking for garbage.
Take a reference on recoveryTarget around invoking cleanTempFiles.
@bcully
Contributor Author

bcully commented Nov 13, 2024

@elasticmachine update branch

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing (obsolete meta label for the Distributed Indexing team) label Nov 13, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

Contributor

@Tim-Brooks Tim-Brooks left a comment

LGTM

@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.x
bcully added a commit to bcully/elasticsearch that referenced this pull request Nov 14, 2024
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.

Closes elastic#104473
@bcully bcully deleted the recovery-clean-before-transfer branch November 14, 2024 22:17
elasticsearchmachine pushed a commit that referenced this pull request Nov 14, 2024
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.

Closes #104473
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Nov 18, 2024
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.

Closes elastic#104473
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
If a node crashes during recovery, it may leave temporary files behind
that can consume disk space, which may be needed to complete recovery.
So we attempt to clean up the index before transferring files from
a recovery source. We attempt to load the latest snapshot of the target
directory, which we supply to store's `cleanupAndVerify` method to remove
any files not referenced by it. We treat a failure to load the latest snapshot
as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to
purge the entire target directory and pull from scratch.

Closes elastic#104473
lkts added a commit to lkts/elasticsearch that referenced this pull request Dec 15, 2025
lkts added a commit that referenced this pull request Jan 5, 2026
…139569)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
lkts added a commit to lkts/elasticsearch that referenced this pull request Jan 5, 2026
)" (elastic#139569)

* Revert "Attempt to clean up index before remote transfer (elastic#115142)"

This reverts commit fc67f7c.
lkts added a commit to lkts/elasticsearch that referenced this pull request Jan 5, 2026
)" (elastic#139569)

* Revert "Attempt to clean up index before remote transfer (elastic#115142)"

This reverts commit fc67f7c.
lkts added a commit to lkts/elasticsearch that referenced this pull request Jan 5, 2026
)" (elastic#139569)

* Revert "Attempt to clean up index before remote transfer (elastic#115142)"

This reverts commit fc67f7c.
lkts added a commit to lkts/elasticsearch that referenced this pull request Jan 5, 2026
)" (elastic#139569)

* Revert "Attempt to clean up index before remote transfer (elastic#115142)"

This reverts commit fc67f7c.
elasticsearchmachine pushed a commit that referenced this pull request Jan 5, 2026
…139569) (#140180)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
elasticsearchmachine pushed a commit that referenced this pull request Jan 5, 2026
…139569) (#140179)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
elasticsearchmachine pushed a commit that referenced this pull request Jan 5, 2026
…139569) (#140181)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
lkts added a commit that referenced this pull request Jan 5, 2026
…139569) (#140178)

* Revert "Attempt to clean up index before remote transfer (#115142)"

This reverts commit fc67f7c.
@DaveCTurner DaveCTurner removed the Team:Distributed Indexing (obsolete meta label for the Distributed Indexing team) label Feb 18, 2026