Idea: Compute HA failover timeout from tenant's scrape rate

@pracucci

What is the problem you are trying to solve?

The default HA tracker failover timeout of 30 s can be too short for some tenants, leading to continuous failovers.

On the other hand, increasing the timeout globally would lengthen the period in which we have no metrics for a tenant when its upstream instances are actually unavailable, which works against the whole purpose of the HA tracker.

Which solution do you envision (roughly)?

@pracucci suggests computing the timeout for each tenant from its scrape interval:

Would be very nice if we wouldn't have to change it manually in the config, but we could auto-detect the tenant scrape interval (or highest value if customer has multiple scraping jobs with different intervals), so that we can apply the failover timeout equal to max(configured timeout, highest scrape interval * delta).
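A minimal sketch of the formula above, in Python for brevity. It assumes the tenant's scrape intervals have already been detected somehow (this issue does not say how), and the function name, the `delta` default of 2.0, and the sample values are all hypothetical, not part of any existing implementation.

```python
from datetime import timedelta
from typing import Iterable


def effective_failover_timeout(
    configured: timedelta,
    scrape_intervals: Iterable[timedelta],
    delta: float = 2.0,  # hypothetical multiplier; the issue does not fix a value
) -> timedelta:
    """Return max(configured timeout, highest scrape interval * delta)."""
    # With no detected intervals, fall back to the configured timeout.
    highest = max(scrape_intervals, default=timedelta(0))
    return max(configured, highest * delta)


if __name__ == "__main__":
    default_timeout = timedelta(seconds=30)
    # Tenant with 15 s and 60 s scrape jobs: the 60 s job drives the timeout.
    print(effective_failover_timeout(
        default_timeout, [timedelta(seconds=15), timedelta(seconds=60)]
    ))
    # Tenant scraping every 10 s: the configured timeout is kept.
    print(effective_failover_timeout(default_timeout, [timedelta(seconds=10)]))
```

Taking the maximum with the configured timeout means that tenants with short scrape intervals keep today's behaviour, and only tenants with long intervals get a longer timeout.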

Have you considered any alternatives?

Make ha_tracker_failover_timeout configurable per-tenant #4066

This has been implemented, but it isn't a great approach: a custom timeout has to be set manually in the config for each affected tenant.

Any additional context to share?

No response

How long do you think this would take to be developed?

Small (<= 1 month dev)

What are the documentation dependencies?

No response

Proposer?

@pracucci

Idea: Compute HA failover timeout from tenant's scrape rate #12108
