
I have some Python code running in k8s that in some cases gets OOMKilled without leaving me time to clean up, which causes bad behavior.

I've tried multiple approaches but nothing seems quite right... I feel like I'm missing something.

I've tried creating a soft limit in the code with resource.setrlimit(resource.RLIMIT_RSS, (cgroup_mem_limit // 100 * 95, resource.RLIM_INFINITY)), but sometimes my code still gets killed by the OOM killer before I get a MemoryError. (When this happens it's completely reproducible.)

What I've found that works is limiting by RLIMIT_AS instead of RLIMIT_RSS, but this gets me killed much earlier, since address space is much higher than RSS (sometimes >100 MB higher), and I'd like to avoid wasting so much memory (100 MB x hundreds of replicas adds up).
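
Roughly, the limit setup looks like this (a minimal sketch; the cgroup file paths are assumptions covering cgroup v2 and v1):

    from pathlib import Path
    import resource

    def read_cgroup_mem_limit():
        # cgroup v2 exposes the limit in memory.max; cgroup v1 in
        # memory/memory.limit_in_bytes (assumption: one of these is mounted).
        for path in ("/sys/fs/cgroup/memory.max",
                     "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
            try:
                raw = Path(path).read_text().strip()
            except OSError:
                continue
            if raw != "max":
                return int(raw)
        return None

    cgroup_mem_limit = read_cgroup_mem_limit()
    if cgroup_mem_limit:
        soft = cgroup_mem_limit // 100 * 95
        # What I tried first; note that RLIMIT_RSS is not enforced on modern
        # Linux kernels, which may be why it never raises a MemoryError here.
        resource.setrlimit(resource.RLIMIT_RSS, (soft, resource.RLIM_INFINITY))
        # What works, at the cost of limiting address space rather than RSS:
        resource.setrlimit(resource.RLIMIT_AS, (soft, resource.RLIM_INFINITY))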

I've tried using a sidecar for the cleanup, but (at least the way I managed to implement it) this means both containers need an API, and together they cost more than 100 MB as well, so it didn't really help.

Why am I exceeding my memory limit in the first place? My system often handles very large loads with lots of tasks that can be either small or large, and there's no way to know ahead of time (think decompression). So, to make the best use of our resources, we first try each task in a pod with little memory (which allows a high replica count), and if the task fails we bump it up to a new pod with more memory.
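
Sketched out, the per-task flow is roughly this (a minimal sketch using pika; the broker host and the tasks.small / tasks.large queue names are made up, and run_task is a placeholder):

    import pika

    def run_task(body):
        ...  # the actual work; may raise MemoryError once the soft rlimit is hit

    def on_message(ch, method, properties, body):
        try:
            run_task(body)
        except MemoryError:
            # Too big for this pod size: bump it to the queue consumed by
            # pods with a larger memory limit.
            ch.basic_publish(exchange="", routing_key="tasks.large", body=body)
        # Ack only after handling, so an OOMKilled worker's unacked message
        # is redelivered by RabbitMQ when the connection drops.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq"))
    channel = connection.channel()
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(queue="tasks.small", on_message_callback=on_message)
    channel.start_consuming()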

Is there a way to be terminated softly before being OOMKilled, while still limiting on something that corresponds more closely to my real usage? Or is there something wrong with my design? Is there a better way to do this?

  • What kind of cleanup are you hoping to do? The process exiting (forcibly) will destroy all Python-level objects and close all network connections. The OOMKill mechanism is pretty low-level and fairly abrupt, and there's not a good way to react to it. Commented Oct 16 at 9:53
  • @DavidMaze The cleanup is to bump the task up to a pod with more memory; I do this by moving it to a different queue in RabbitMQ. I was wondering if there is a way to avoid being OOMKilled (e.g. by killing myself first), or maybe a different design could solve this? Commented Oct 16 at 16:33
  • Maybe a dead letter exchange could help? At the RabbitMQ level, this would let you handle a message differently if it fails. Commented Oct 16 at 18:18
  • @DavidMaze yeah, that could work. The issue is that in order to get to the dead letter exchange I would need to be OOMKilled multiple times (it's sketchy to send a task there after only one requeue), and that's a bunch of wasted time doing the same thing over and over just to be OOMKilled each time. Commented Oct 19 at 13:12
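
For reference, the dead-letter routing suggested in the comments is configured on the queue declaration itself; a minimal sketch (the exchange and queue names are made up):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq"))
    channel = connection.channel()

    # Messages that are rejected/nacked with requeue=False (or that expire)
    # on tasks.small get re-routed through the dead-letter exchange.
    channel.exchange_declare(exchange="tasks.dlx", exchange_type="fanout")
    channel.queue_declare(queue="tasks.large", durable=True)
    channel.queue_bind(queue="tasks.large", exchange="tasks.dlx")
    channel.queue_declare(
        queue="tasks.small",
        durable=True,
        arguments={"x-dead-letter-exchange": "tasks.dlx"},
    )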

1 Answer

When a container is OOMKilled, it’s terminated immediately with a hard SIGKILL. That means the process doesn’t get a chance to run any cleanup logic, not even lifecycle hooks like preStop. If cleanup is important, it has to happen outside of the process that might get killed.

1. Understand the limitation
An OOM kill is abrupt. The container doesn’t get to react or run any shutdown routines. Only sibling containers in the same Pod or external controllers can observe and respond to the event.

2. Handle cleanup externally

Sidecar approach: Run a lightweight helper container in the same pod with a separate memory limit. It can monitor the main container’s status and perform cleanup if it dies with an OOMKilled reason (a detection sketch follows at the end of this section).

Controller or operator: Use a higher-level controller that watches Pod status and triggers cleanup Jobs or routines when it sees OOMKilled events.

Startup recovery: If the container restarts, make it detect and repair partial state on startup rather than relying on in-process cleanup at shutdown.
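
A minimal sketch of the OOMKilled-detection piece mentioned above, usable from either a sidecar or an external controller, using the official kubernetes Python client. Assumptions: POD_NAME and POD_NAMESPACE are injected via the downward API, the main container is named worker, the service account may read Pods, and cleanup() is a placeholder:

    import os
    import time

    from kubernetes import client, config

    config.load_incluster_config()
    v1 = client.CoreV1Api()

    pod_name = os.environ["POD_NAME"]          # injected via the downward API
    namespace = os.environ["POD_NAMESPACE"]    # injected via the downward API

    def cleanup():
        ...  # placeholder: e.g. requeue the task, remove partial output

    handled = set()
    while True:
        pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
        for status in pod.status.container_statuses or []:
            if status.name != "worker":        # assumed name of the main container
                continue
            terminated = status.last_state.terminated
            if terminated and terminated.reason == "OOMKilled":
                key = (status.name, status.restart_count)
                if key not in handled:         # react once per kill
                    handled.add(key)
                    cleanup()
        time.sleep(5)

An external controller would do the same check with a watch on Pods across the namespace instead of polling its own Pod.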

3. Make your workloads resilient

Use ephemeral storage like emptyDir so Kubernetes automatically deletes leftovers when the Pod goes away.

Design your work to be idempotent, so rerunning it is safe even after a crash.
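
A minimal sketch of what idempotent task handling can look like; the paths and the do-the-work step are placeholders:

    from pathlib import Path

    def process_task(task_id: str, output_dir: Path) -> None:
        done = output_dir / f"{task_id}.done"
        if done.exists():
            return                                # already processed on an earlier attempt
        tmp = output_dir / f"{task_id}.tmp"
        tmp.write_text(f"result of {task_id}")    # placeholder for the real work
        tmp.rename(done)                          # atomic rename: fully done or not at all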

Use leases, heartbeats, or TTLs to detect abandoned work and reclaim it.
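
One way this can look, as a minimal sketch: workers heartbeat into a shared location while they hold a task, and a janitor reclaims tasks whose heartbeat is older than a TTL. The directory, TTL, and file format here are all made up:

    import json
    import time
    from pathlib import Path

    LEASE_DIR = Path("/leases")    # made-up path, e.g. a shared volume
    LEASE_TTL = 300                # seconds without a heartbeat before reclaiming

    def heartbeat(task_id: str) -> None:
        # Called periodically by the worker while it processes task_id.
        LEASE_DIR.mkdir(parents=True, exist_ok=True)
        (LEASE_DIR / f"{task_id}.json").write_text(
            json.dumps({"task": task_id, "ts": time.time()}))

    def abandoned_tasks() -> list[str]:
        # Called by a janitor (CronJob or sidecar) to find tasks whose worker
        # stopped heartbeating, e.g. because it was OOMKilled.
        now = time.time()
        stale = []
        for lease in LEASE_DIR.glob("*.json"):
            data = json.loads(lease.read_text())
            if now - data["ts"] > LEASE_TTL:
                stale.append(data["task"])
        return stale

In a RabbitMQ setup like the one in the question, unacknowledged-message redelivery already gives you a basic version of this: a message that was never acked goes back to the queue when the OOMKilled consumer's connection drops.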

For external systems, give each task a unique namespace or prefix so cleanup jobs can find and remove orphaned resources later.

4. Reduce the likelihood of OOM kills
Set realistic resource requests and limits, monitor actual memory use, and tune your workloads to avoid runaway memory growth. Smaller work chunks and backpressure mechanisms can also help.
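
For the monitoring and backpressure part, a worker can read its own usage straight from the cgroup filesystem and stop pulling new tasks when it gets close to the limit. A minimal sketch (cgroup v2 paths with a v1 fallback; the 90% threshold is arbitrary):

    from pathlib import Path

    def cgroup_memory():
        # Returns (usage_bytes, limit_bytes_or_None) for this container.
        v2_usage = Path("/sys/fs/cgroup/memory.current")
        if v2_usage.exists():
            usage = int(v2_usage.read_text())
            raw = Path("/sys/fs/cgroup/memory.max").read_text().strip()
            limit = None if raw == "max" else int(raw)
        else:  # cgroup v1
            usage = int(Path("/sys/fs/cgroup/memory/memory.usage_in_bytes").read_text())
            limit = int(Path("/sys/fs/cgroup/memory/memory.limit_in_bytes").read_text())
        return usage, limit

    def should_accept_more_work() -> bool:
        usage, limit = cgroup_memory()
        return limit is None or usage < limit * 0.9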
