Bug: Ingesters Failing to Evenly Spread Following Zone Restart #13293

@KrishnaJyothika

What is the bug?

After zone-by-zone restarts, ingesters fail to rebalance, both across zones and within individual zones. This leads to uneven load distribution: some pods are significantly overloaded while others remain underutilized. This happens even though the spread-minimizing token generation strategy is enabled:

spread_minimizing_zones: zone-a,zone-b,zone-c
token_generation_strategy: spread-minimizing

[Image: ingester memory utilization without proper spreading]

How to reproduce it?

  1. Deploy Mimir 2.14.2
  2. Push 60-65M load
  3. Wait until the ingesters hold 13 hours of data
  4. Restart the ingesters one zone at a time

What did you think would happen?

During zone-wise restarts, some ingesters flush their data and come back online faster than others. These early-starting ingesters begin receiving the bulk of incoming data, leading to overutilization, while the ingesters that restart later remain underutilized.

  1. With spread-minimizing enabled, all ingesters should ideally restart simultaneously to prevent skewed load distribution.
  2. Even if restart delays occur, the system should rebalance and evenly spread the load across all ingesters once they are back online.
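One way to make "evenly spread" measurable is to compute a relative spread, (max - min) / max, over a per-ingester value such as the in-memory series count, both within each zone and across zone totals. The sketch below is only an illustration: the pod names and numbers are hypothetical placeholders, not measurements from this cluster, and the helper names `spread` and `zone_report` are made up for this example.

```python
def spread(values):
    """Relative spread (max - min) / max; 0.0 means perfectly even."""
    values = list(values)
    hi = max(values)
    return 0.0 if hi == 0 else (hi - min(values)) / hi


def zone_report(per_ingester):
    """Return (spread within each zone, spread across zone totals).

    per_ingester maps zone name -> {pod name: value}.
    """
    within = {zone: spread(pods.values()) for zone, pods in per_ingester.items()}
    across = spread(sum(pods.values()) for pods in per_ingester.values())
    return within, across


# Hypothetical sample values (for example, in-memory series per ingester).
sample = {
    "zone-a": {"ingester-zone-a-0": 100, "ingester-zone-a-1": 50},
    "zone-b": {"ingester-zone-b-0": 80, "ingester-zone-b-1": 80},
}

within, across = zone_report(sample)
for zone, value in within.items():
    print(f"{zone}: spread within zone = {value:.2%}")
print(f"spread across zones = {across:.2%}")
```

Comparing these numbers before the restart and again once all ingesters are back online would show whether the load actually rebalanced.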

What was your environment?

Kubernetes
Helm
Mimir 2.14.2

Any additional context to share?

No response
