Publish() should backoff if Elasticsearch returns 429 HTTP rate limiting responses #36926

@rodrigc

Describe the enhancement:

Under high load, Elasticsearch sometimes returns HTTP 429 responses to indicate rate limiting.
Beats should back off when it detects an HTTP 429 response.

Looking at the code here, it does not appear to do that; it seems to just re-send, per this code:

// L500
if itemStatus == http.StatusTooManyRequests {
	stats.fails++
	stats.tooMany++
	return true
}

// L553
func (stats bulkResultStats) reportToObserver(ob outputs.Observer) {
	ob.AckedEvents(stats.acked)
	ob.RetryableErrors(stats.fails) // 👈 retries no back off
	ob.PermanentErrors(stats.nonIndexable)
	ob.DuplicateEvents(stats.duplicates)
	ob.DeadLetterEvents(stats.deadLetter)

	ob.ErrTooMany(stats.tooMany)
}

Describe a specific use case for the enhancement or feature:

I have an Elasticsearch cluster, and in my network I have deployed 1500 elastic-agents.
They are sending lots of logs to Elasticsearch, and I am routinely getting HTTP 429 errors.

On one hand, I am trying to scale the resources on the Elasticsearch server side.

However, it would be good if Beats and elastic-agent could back off when they detect HTTP 429 errors. Right now they seem to keep hammering Elasticsearch even while it returns HTTP 429 errors.
