Description
Seen in production on a node with a 1GiB heap: deleting a collection of snapshots involving one index with several megabytes of index metadata caused the node to go OOM. The issue was ultimately that we invoke BlobStoreRepository.SnapshotsDeletion.IndexSnapshotsDeletion#determineShardCount on all 10 snapshot threads at once. Each thread ended up needing ~50MiB of heap to parse the metadata for this index, and the node couldn't cope. On smaller nodes we probably shouldn't be doing this work with such high concurrency.
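One way to bound this (a minimal sketch, not the actual fix; all class and method names here are hypothetical) is to gate the expensive metadata parse behind a semaphore so that, whatever the snapshot thread-pool size, only a few parses can be in flight at once:

```java
import java.util.concurrent.Semaphore;

// Sketch: cap how many threads may parse large index metadata concurrently,
// independently of the snapshot thread-pool size. Hypothetical names throughout.
class BoundedMetadataParsing {
    private final Semaphore permits;

    BoundedMetadataParsing(int maxConcurrentParses) {
        this.permits = new Semaphore(maxConcurrentParses);
    }

    int determineShardCount(String indexUuid) throws InterruptedException {
        permits.acquire(); // block rather than let 10 threads each hold ~50MiB of parse state
        try {
            return parseMetadataBlob(indexUuid);
        } finally {
            permits.release();
        }
    }

    // Stand-in for the expensive step: the real code reads and parses the
    // IndexMetadata blob from the repository.
    private int parseMetadataBlob(String indexUuid) {
        return 1;
    }
}
```

The permit count could be sized from the available heap rather than hard-coded, so small nodes get less parallelism.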
There's also a code comment indicating that we could make this metadata-loading process way more efficient:
elasticsearch/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Lines 1283 to 1286 in f15ef7c
```java
// NB since 7.9.0 we deduplicate index metadata blobs, and one of the components of the deduplication key is the
// index UUID; the shard count is going to be the same for all metadata with the same index UUID, so it is
// unnecessary to read multiple metadata blobs corresponding to the same index UUID.
// TODO Skip this unnecessary work? Maybe track the shard count in RepositoryData?
```
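Pending the TODO of tracking the shard count in RepositoryData, the deduplication observation suggests a simpler interim shape: parse each index UUID's metadata at most once and reuse the result. A minimal sketch with hypothetical names:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToIntFunction;

// Sketch: since metadata blobs are deduplicated by index UUID and the shard
// count is the same for all metadata sharing a UUID, cache the count per UUID
// so the expensive parse runs at most once. Hypothetical names throughout.
class ShardCountCache {
    private final ConcurrentHashMap<String, Integer> shardCountByIndexUuid = new ConcurrentHashMap<>();
    private final ToIntFunction<String> expensiveParse;

    ShardCountCache(ToIntFunction<String> expensiveParse) {
        this.expensiveParse = expensiveParse;
    }

    int shardCount(String indexUuid) {
        // computeIfAbsent runs the parse only on the first request for a UUID
        return shardCountByIndexUuid.computeIfAbsent(indexUuid, expensiveParse::applyAsInt);
    }
}
```

Tracking the count directly in RepositoryData, as the TODO suggests, would avoid even the first parse.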
Moreover, as noted in #116379, there's a 2GiB limit on the list of blobs to clean up, but a small node would hit an OutOfMemoryError long before reaching that limit. We should impose a stricter limit on the memory used by this data structure, ideally spilling the list to storage when the limit is reached, but even just forgetting about some blobs would be better than having the node fail.
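The "forget some blobs rather than fail" fallback could look roughly like the following sketch (hypothetical names and a rough per-entry cost estimate; a fuller version would spill to disk instead of dropping):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: accumulate blob names to delete only up to a heap budget; past the
// budget, stop tracking and record that the list is incomplete. Dropped blobs
// would be left for a later cleanup pass. Hypothetical names throughout.
class BoundedBlobList {
    private final long maxBytes;
    private long usedBytes;
    private boolean truncated;
    private final List<String> blobs = new ArrayList<>();

    BoundedBlobList(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    void add(String blobName) {
        // Rough heap cost of one entry: object headers plus 2 bytes per char.
        long cost = 40 + 2L * blobName.length();
        if (usedBytes + cost > maxBytes) {
            truncated = true; // forget this blob; better than failing the node
            return;
        }
        usedBytes += cost;
        blobs.add(blobName);
    }

    List<String> blobs() {
        return blobs;
    }

    boolean truncated() {
        return truncated;
    }
}
```

The budget could be derived from a fraction of the configured heap, so the same deletion succeeds (perhaps with leftovers) on both large and small nodes.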
Relates #108278