Description
When issuing blob deletions to S3, there is no limit on the number of errors we accumulate while the deletions are ongoing. A large snapshot cleanup issues many such calls, and if all deletions and their retries fail, e.g. due to some setup issue, the accumulated errors could use a huge amount of memory and potentially OOM the node. It is not clear how severe the same issue is in other object stores, since that depends on the error responses the SDK returns and how large they are, but in theory it could happen there too. We should limit the number of suppressed errors we keep while issuing deletes, as I do not think there is any value in recording all of them in the exception (I'm not even sure they are preserved).
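One possible shape for such a limit, as a minimal sketch rather than the actual repository code: keep the first few failures as suppressed exceptions and only count the rest. The class name `BoundedDeleteErrors`, its methods, and the idea of a fixed `maxSuppressed` cap are all hypothetical and chosen for illustration; only `Throwable#addSuppressed` and `Throwable#getSuppressed` are real JDK APIs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of any existing codebase): retains at most
 * maxSuppressed delete failures and only counts the ones beyond that.
 */
final class BoundedDeleteErrors {
    private final int maxSuppressed;
    private final List<Exception> kept = new ArrayList<>();
    private int total = 0;

    BoundedDeleteErrors(int maxSuppressed) {
        if (maxSuppressed < 0) {
            throw new IllegalArgumentException("maxSuppressed must be >= 0");
        }
        this.maxSuppressed = maxSuppressed;
    }

    /** Records a failure; the exception is retained only while under the cap. */
    synchronized void record(Exception e) {
        total++;
        if (kept.size() < maxSuppressed) {
            kept.add(e);
        }
    }

    /** Total number of failures seen, including those that were not retained. */
    synchronized int total() {
        return total;
    }

    /** Builds one exception that carries the retained failures as suppressed. */
    synchronized IOException buildException(String message) {
        IOException ex = new IOException(
            message + " (" + total + " failures, " + kept.size() + " recorded)");
        for (Exception e : kept) {
            ex.addSuppressed(e);
        }
        return ex;
    }
}
```

With a cap of, say, 10, memory held by error objects stays bounded no matter how many delete calls fail, while the message still reports the total failure count so the scale of the problem remains visible.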