Attempt to clean up index before remote transfer #115142
bcully merged 13 commits into elastic:main from the …-clean-before-transfer branch
Conversation
Pinging @elastic/es-distributed (Team:Distributed)

Hi @bcully, I've created a changelog YAML for you.
I'm doing this as part of onboarding, and don't really have any idea what I'm doing yet. Any kind of feedback is welcome. |
server/src/internalClusterTest/java/org/elasticsearch/recovery/TruncatedRecoveryIT.java (outdated; review thread resolved)
If a node crashes during recovery, it may leave temporary files behind that can consume disk space, which may be needed to complete recovery. So we attempt to clean up the index before transferring files from a recovery source. We first attempt to call the store's `cleanupAndVerify` method, which removes anything not referenced by the latest local commit. If that fails, because the local index isn't in a state to produce a local commit, then we fall back to removing temporary files by name. Closes elastic#104473
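The two-stage decision described above can be sketched as a pair of pure helpers. This is a minimal illustration, not the actual Elasticsearch API: the helper names and the `recovery.` temp-file prefix are made up here, and the real logic lives in `PeerRecoveryTargetService` and `Store#cleanupAndVerify`.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the two-stage cleanup: prefer deleting everything not referenced
// by the latest local commit; if no commit can be loaded, fall back to
// deleting temporary recovery files by name. Inputs are just file names.
public class RecoveryCleanupSketch {

    // Stage 1: anything not referenced by the latest local commit is garbage.
    static Set<String> unreferencedFiles(Set<String> present, Set<String> referencedByCommit) {
        return present.stream()
            .filter(f -> referencedByCommit.contains(f) == false)
            .collect(Collectors.toSet());
    }

    // Stage 2 (fallback): identify temp files by prefix. "recovery." is a
    // hypothetical prefix chosen for this sketch.
    static Set<String> tempFiles(Set<String> present, String tempPrefix) {
        return present.stream()
            .filter(f -> f.startsWith(tempPrefix))
            .collect(Collectors.toSet());
    }
}
```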
Force-pushed the …-clean-before-transfer branch from cef215f to 8c6e459
Tim-Brooks left a comment
A couple comments. I think we'll need @DaveCTurner to verify that this high level design is what he had in mind.
server/src/main/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetService.java (outdated; review thread resolved)
server/src/main/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetService.java (outdated; review thread resolved)
```java
Store store = indexShard.store();
store.incRef();
try {
    store.cleanupAndVerify("cleanup before peer recovery", indexShard.snapshotStoreMetadata());
```
This wraps some IOExceptions in `throw new IllegalStateException("Can't delete " + existingFile + " - cleanup failed", ex);`. Is that something we should be catching and handling?
```java
catch (IOException ex) {
    if (existingFile.startsWith(IndexFileNames.SEGMENTS) || existingFile.startsWith(CORRUPTED_MARKER_NAME_PREFIX)) {
        // TODO do we need to also fail this if we can't delete the pending commit file?
        // if one of those files can't be deleted we better fail the cleanup otherwise we might leave an old commit
        // point around?
        throw new IllegalStateException("Can't delete " + existingFile + " - cleanup failed", ex);
    }
    logger.debug(() -> "failed to delete file [" + existingFile + "]", ex);
    // ignore, we don't really care, will get deleted later on
}
```
My thinking was that this same method gets called a little later, where it does do some more complicated decision making around exceptions. We could replicate some of that logic here, which could save some transfer work if a recovery is doomed for file-system reasons. But my first instinct was to avoid adding new failure modes to recovery, and replicating or refactoring the handling logic didn't seem worth it for this case, which I wouldn't expect to be common enough to need optimization.
I had actually misread this comment - I now see your point was that IllegalStateExceptions were escaping here and could fail recovery before we had a chance to pull, which might fix the problem.
I think the errors here (delete failure) are not likely to be repaired by the peer, so we might as well bail out here and save some potentially expensive transfer for a likely doomed recovery attempt. The verification step also throws some IllegalStateExceptions, which I also think are an abnormal condition (cleanup is supposed to reconcile the directory with the provided snapshot) that needs investigation.
I've taken David's suggestion and treated the directory as having an empty snapshot (which will cause cleanup to remove everything), which I think makes cleanup more robust to partially transferred indices (at the cost of potentially retransmitting some files).
I could go either way on that, I don't really have a sense for how much extra work that could create in practice.
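The fallback being discussed here, treating a failure to load the local snapshot as an empty snapshot, can be sketched as follows. The names are illustrative only: in the real code the loader is `indexShard.snapshotStoreMetadata()` returning a `Store.MetadataSnapshot`, while this sketch models a snapshot as a plain set of file names.

```java
import java.util.Set;
import java.util.concurrent.Callable;

// Sketch: if the local metadata snapshot cannot be loaded, treat the store
// as empty. Cleanup against an empty snapshot then deletes every file, and
// recovery pulls everything from the source from scratch.
public class EmptySnapshotFallback {
    static Set<String> snapshotOrEmpty(Callable<Set<String>> loadSnapshot) {
        try {
            return loadSnapshot.call();
        } catch (Exception e) {
            // Unreadable or partially transferred index: nothing is considered
            // referenced, at the cost of potentially retransmitting some files.
            return Set.of();
        }
    }
}
```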
```java
final CountDownLatch latch = new CountDownLatch(1);
final AtomicBoolean truncate = new AtomicBoolean(true);
IndicesService unluckyIndices = internalCluster().getInstance(IndicesService.class, unluckyNode);
for (NodeStats dataNode : dataNodeStats) {
```
I'm pretty sure we can get rid of this and just add the send behavior from primary -> unlucky
I'm not following this. I was using this to get the shard path on disk where I could scribble junk files and then check they were removed. It did seem a little indirect - is there a nicer way?
DaveCTurner left a comment
Trickier than I expected :) I left a few comments.
server/src/internalClusterTest/java/org/elasticsearch/recovery/TruncatedRecoveryIT.java (outdated; review thread resolved)
server/src/main/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetService.java (outdated; review thread resolved)
```java
store.incRef();
try {
    store.cleanupAndVerify("cleanup before peer recovery", indexShard.snapshotStoreMetadata());
} catch (IOException e) {
```
I think I might prefer to restrict this catch just to exceptions from `indexShard.snapshotStoreMetadata()` rather than from `cleanupAndVerify()` too. If `cleanupAndVerify()` fails then that seems like it maybe should be fatal?
Also could we use `org.elasticsearch.index.store.Store.MetadataSnapshot#EMPTY` to represent the snapshot if we cannot get a proper `MetadataSnapshot`? Any file re-use during recovery is going to require a valid `MetadataSnapshot`, so failing here means that the directory contents are of no further use.
Tim made a similar comment, to which I replied above. There is some logic in `cleanFiles` that inspects the exception generated by the block containing `cleanupAndVerify` and does a deeper clean in some cases. Looking at it again, I think it's not just an optimization: we'd probably need to replicate that to avoid corrupt directories that we can never make forward progress on, because we never get a chance to deep-clean them.
Is it always true that the files are useless if we can't get a metadata snapshot? I had been thinking that, for instance, we might have transferred a bunch of data files previously but not the root segment file. In that case, after transferring the segment file we could reuse the data files already transferred?
Is it always true that the files are useless if we can't get a metadata snapshot?
Yes I think so; we treat a failure as if the store is empty. See how we compute `metadataSnapshot` in `org.elasticsearch.indices.recovery.PeerRecoveryTargetService#getStartRecoveryRequest` and then use this data (alone) to compute the collection of files to recover in the implementations of `org.elasticsearch.indices.recovery.plan.RecoveryPlannerService#computeRecoveryPlan`.
I mean you're right, in principle we could work harder to determine which exact files we do/don't have and avoid a little extra data transfer sometimes. But we don't: this is already on a failure path, and not a particularly well-trodden one, so there's no need for heroic optimizations.
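The effect described here, that an empty target snapshot forces every source file to be retransferred, can be sketched roughly as below. This is a simplification of what `computeRecoveryPlan` does: the real planner compares `Store.MetadataSnapshot` entries (checksums, lengths, and more), while this sketch reduces each snapshot to a name-to-checksum map.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Rough sketch: transfer every source file whose checksum does not match the
// target's metadata snapshot. With an empty target snapshot, nothing matches,
// so everything is transferred.
public class RecoveryPlanSketch {
    static Set<String> filesToRecover(Map<String, String> sourceChecksums, Map<String, String> targetChecksums) {
        return sourceChecksums.entrySet().stream()
            .filter(e -> e.getValue().equals(targetChecksums.get(e.getKey())) == false)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }
}
```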
I thought a bit more about this and am inclined to use `Store.MetadataSnapshot#EMPTY` when `snapshotStoreMetadata()` throws. Interestingly, when I tried it, I found that `store.cleanupAndVerify` appears to throw on an empty directory, here:
```
...
Caused by: java.io.FileNotFoundException: no segments* file found in store(ByteSizeCachingDirectory(ElasticsearchMockDirectoryWrapper(NIOFSDirectory@/Users/brendan/dev/elasticsearch/server/build/testrun/internalClusterTest/temp/org.elasticsearch.recovery.TruncatedRecoveryIT_31340047D7FA1434-001/tempDir-002/node_t4/indices/QuuB7D0wSki7rj5uryRXWQ/4/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@b7a4d9))): files: [write.lock]
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:801)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:757)
    at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:535)
    at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:519)
    at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:116)
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:224)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.loadFromIndexCommit(Store.java:822)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:298)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:263)
    at org.elasticsearch.index.store.Store.cleanupAndVerify(Store.java:699)
...
```
Given that the name of the variable being assigned here is `metadataOrEmpty`, it seems like it might be a bug and we just haven't tested against an empty snapshot. If you agree, I'll attempt to fix that and go with this approach of falling back to `EMPTY`.
#116059 adds support for empty stores to `Store#cleanupAndVerify`
After making `cleanupAndVerify` handle an empty snapshot, I've taken your suggestion and treated the directory as having an empty snapshot if we fail to load one.
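The shape of that fix can be sketched as a guard before the commit lookup. This is only an illustration, assuming (hypothetically) that the absence of a `segments_N` file is how an empty store is detected; the actual change is in #116059.

```java
import java.util.Set;

// Sketch: tolerate an empty store during cleanup. If no segments_N file
// exists there is no commit to read, so treat the referenced set as empty
// instead of letting Lucene's readLatestCommit throw FileNotFoundException.
public class EmptyStoreCleanupSketch {
    static Set<String> referencedOrNothing(Set<String> filesInDirectory, Set<String> referencedByCommit) {
        boolean hasCommit = filesInDirectory.stream().anyMatch(f -> f.startsWith("segments"));
        return hasCommit ? referencedByCommit : Set.of();
    }
}
```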
server/src/internalClusterTest/java/org/elasticsearch/recovery/TruncatedRecoveryIT.java (outdated; review thread resolved)
Remove extra guard that we are in phase 2 before checking for garbage. Take a reference on recoveryTarget around invoking cleanTempFiles.
@elasticmachine update branch

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
💚 Backport successful
If a node crashes during recovery, it may leave temporary files behind that can consume disk space, which may be needed to complete recovery. So we attempt to clean up the index before transferring files from a recovery source. We attempt to load the latest snapshot of the target directory, which we supply to store's `cleanupAndVerify` method to remove any files not referenced by it. We treat a failure to load the latest snapshot as equivalent to an empty snapshot, which will cause `cleanupAndVerify` to purge the entire target directory and pull from scratch. Closes elastic#104473
Revert "Attempt to clean up index before remote transfer (elastic#115142)" (elastic#139569). This reverts commit fc67f7c.