Skip to content

allocation: add duration and count metrics for write load hotspot#138465

Merged
schase-es merged 14 commits intoelastic:mainfrom
schase-es:ES-13381_write-load-monitor-hotspot-metrics
Dec 5, 2025
Merged

allocation: add duration and count metrics for write load hotspot#138465
schase-es merged 14 commits intoelastic:mainfrom
schase-es:ES-13381_write-load-monitor-hotspot-metrics

Conversation

@schase-es
Copy link
Contributor

The WriteLoadConstraintMonitor now reports as APM metrics its current count of hotspot nodes, and the duration of each hotspot. When a node is reported as hotspotting, this start time is stored in a table along with its node id. When it stops, the time difference between then and now is recorded as a duration metric. The size of this table is the current hotspot count.

Closes: ES-13381

The WriteLoadConstraintMonitor now reports as APM metrics its current count of
hotspot nodes, and the duration of each hotspot. When a node is reported as
hotspotting, this start time is stored along with its node id. When it stops,
the time difference between then and now is recorded as a duration metric.

Closes: ES-13381
@schase-es schase-es requested a review from a team as a code owner November 24, 2025 01:29
@schase-es schase-es added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v9.3.0 labels Nov 24, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. label Nov 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Copy link
Collaborator

Hi @schase-es, I've created a changelog YAML for you.


private List<LongWithAttributes> getHotspotNodesCount() {
return List.of(new LongWithAttributes(hotspotNodesCount));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to return an empty list here if we're not the master, otherwise we'll get a bunch of bogus values each metric interval from the non-master nodes.

Perhaps getHotSpotNodesCount could switch it to -1 and then we could only report if it's been switched back to an actual value (the InternalClusterInfoService stops refreshing the cluster info when the node is no longer master so this approach would naturally only report metrics from the master).

Or any other approach you think is best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to make this a cluster state listener, we will stop receiving new ClusterInfo when we are no longer master.

If we make this listen to the cluster state, it adds more dependencies and will add a tiny additional cost to the cluster state applier thread (negligible but we should try and do that as little as possible).

I think I prefer my suggestion to only return a hot-spot node count if we've calculated one since last time we were polled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also be presumably simpler to test (just confirm we only return the count if onNewInfo has been called since last time we asked)

);

final Set<String> firstWaveHotspotNodes = testState.hotspotNodeIds();
final String removeHotspotId = randomSubsetOf(1, testState.hotspotNodeIds()).get(0);
Copy link
Contributor

@nicktindall nicktindall Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think you can use randomFrom(Collection<T>)


// check the test state hotspots are recorded in the counter
recordingMeterRegistry.getRecorder().collect();
assertMetricsCollected(recordingMeterRegistry, List.of((long) testState.hotspotNodeIds().size()), List.of());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda think we shouldn't add the assertions around metrics to every test, just to keep those tests focused on the specific aspects that they're testing? I'd be happy to keep it all for the dedicated test that you have at the bottom.

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good with a few comments

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits

hotspotDurationHistogram.record(hotspotDuration / 1000.0);
}
hotspotNodesCount = hotspotNodeStartTimes.size();
hotspotNodesCountUpdatedSinceLastRead = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it'd be feasible to break this stuff out into a method, just so it doesn't distract too much from the monitoring logic?

Likewise with the recording of the start times?

if (state.term() != hotspotNodeStartTimesLastTerm) {
hotspotNodeStartTimesLastTerm = state.term();
hotspotNodeStartTimes.clear();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can also clear the state if e.g. state. isLocalNodeElectedMaster() == false, this can happen before the term is bumped.


public WriteLoadConstraintMonitor(
public static final String HOTSPOT_NODES_COUNT_METRIC_NAME = "es.allocator.allocations.node.write_load_hotspot.current";
public static final String HOTSPOT_DURATION_METRIC_NAME = "es.allocator.allocations.node.write_load_hotspot.duration.histogram";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we usually put constants above the fields

public static final String HOTSPOT_DURATION_METRIC_NAME = "es.allocator.allocations.node.write_load_hotspot.duration.histogram";

private volatile long hotspotNodesCount = 0; // metrics source of hotspotting node count
private volatile boolean hotspotNodesCountUpdatedSinceLastRead = false; // turns off metrics when not master/onNewInfo isn't called
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe achieve the same with a single atomic field instead e.g.

writer:

hotSpotNodesCount.set(count);

reader:

int count = hotSpotNodesCount.getAndSet(-1)
if (count >= 0) {
    return List.of(LongWithAttributes(count, ...);
} else {
    return List.of();
}

or is there an advantage to having the two fields?

@schase-es schase-es merged commit 20d144c into elastic:main Dec 5, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. v9.3.0

3 participants