Track & log when there is insufficient disk space available to execute merges#131711
Conversation
enqueued merge tasks
|
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
|
Hi @albertzaharovits, I've created a changelog YAML for you. |
| assert super.lock.isHeldByCurrentThread(); | ||
| Tuple<MergeTask, Long> head = enqueuedByBudget.peek(); | ||
| if (head != null && head.v2() > availableBudget) { | ||
| LOGGER.warn("There are merge tasks enqueued but there's insufficient disk space available to execute them"); |
There was a problem hiding this comment.
These log messages will be printed every indices.merge.disk.check_interval (5 sec). That sounds Ok to you, right?
There was a problem hiding this comment.
Would it be useful to include the availableBudget and head.v2() values in the log message too?
There was a problem hiding this comment.
Agree, thanks for suggesting! Pushed: d334433
DaveCTurner
left a comment
There was a problem hiding this comment.
Seems reasonable to me but I'm probably not best-placed to review this properly. Could you try someone from the distrib/indexing team instead?
|
Thanks for looking, David! |
| assert super.lock.isHeldByCurrentThread(); | ||
| Tuple<MergeTask, Long> head = enqueuedByBudget.peek(); | ||
| if (head != null && head.v2() > availableBudget) { | ||
| LOGGER.warn("There are merge tasks enqueued but there's insufficient disk space available to execute them"); |
There was a problem hiding this comment.
Would it be useful to include the availableBudget and head.v2() values in the log message too?
| Locale.ROOT, | ||
| "There are no merge tasks currently running, " | ||
| + "but there are [%d] enqueued ones that are blocked because of insufficient disk space", | ||
| enqueuedByBudget.size() |
There was a problem hiding this comment.
Similarly, would it be useful to have the availableBudget and head.v2() values in the log message here?
| messageBuilder.append( | ||
| "], and there are [" | ||
| + enqueuedByBudget.size() | ||
| + "] additional enqueued ones that are blocked because of insufficient disk space" |
| } | ||
| } | ||
|
|
||
| void postBudgetUpdate() {}; |
There was a problem hiding this comment.
I would assert super.lock.isHeldByCurrentThread(); here too
…e merges (elastic#131711) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates elastic#88606 (comment)
…e merges (elastic#131711) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates elastic#88606 (comment)
…e merges (#131711) (#131772) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates #88606 (comment)
…e merges (#131711) (#131773) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates #88606 (comment)
…e merges (#131711) (#131774) This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps). This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks `indices.merge.disk.watermark.high`, and possibly consider implementing aborting already executing merges too. Relates #88606 (comment)

This goal of this PR is to record information about the available disk space at the time that merge tasks are scheduled (the info will show up in heap dumps).
This should aid troubleshooting in cases where there are running merge tasks all while the available disk space on the node is below the watermark. In this case we should increase the threshold for scheduling merge tasks
indices.merge.disk.watermark.high, and possibly consider implementing aborting already executing merges too.Relates #88606 (comment)