Batch series in streaming ingester based on message sizes. #3015

pstibrany · 2020-08-11T15:48:15Z

What this PR does: This PR modifies batching in the blocks-based ingester when it streams results back. Instead of batching only by number of series, it also considers how much data is buffered in memory, and flushes the batch if it gets too big. This is to prevent problems with gRPC messages that are too large. "Too big" is hardcoded to 1 MiB, to fit comfortably within the gRPC default limit and also to avoid using too much memory.

Note that if a single series returns too much data, it can still reach the gRPC limit for sending messages. The solution to that would be splitting a single time series into multiple messages, but the querier doesn't support that at the moment.

Which issue(s) this PR fixes:
Fixes #2945

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pracucci

Nice improvement, LGTM! Should we do it for the chunks storage too, to keep it specular? :)

pkg/ingester/ingester_v2.go

pstibrany · 2020-08-13T15:42:45Z

Here is the result of the benchmark before and after this change.

benchstat before.txt after.txt
name                      old time/op    new time/op    delta
Ingester_v2QueryStream-4    6.31ms ± 2%    6.46ms ± 1%  +2.38%  (p=0.001 n=10+8)

name                      old alloc/op   new alloc/op   delta
Ingester_v2QueryStream-4    3.45MB ± 0%    3.45MB ± 0%  +0.00%  (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
Ingester_v2QueryStream-4     4.45k ± 0%     4.45k ± 0%  +0.02%  (p=0.000 n=10+10)

pracucci · 2020-08-13T15:59:00Z

Thanks @pstibrany! LGTM

> Should we do it for the chunks storage too, to keep it specular? :)

Could you open an issue for that, please?

pstibrany · 2020-08-13T16:03:45Z

> Could you open an issue for that, please?

I will first take a look at whether we can make it part of this PR (if it's a small change).

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

pstibrany requested a review from pracucci August 11, 2020 15:48

pull-request-size bot added the size/L label Aug 11, 2020

pracucci approved these changes Aug 12, 2020

pstibrany mentioned this pull request Aug 14, 2020

Implement batching series in streaming chunks ingester based on message size. #3033

Closed

pstibrany added 7 commits August 14, 2020 11:55

Batch series in streaming ingester based on message sizes.

28feeda

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

CHANGELOG.md

66c79cd

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Renamed var.

5043102

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Fix whitespace.

15f3d8b

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Added benchmark for v2QueryStream

ae8bd70

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Don't use grpc server, but call method directly.

6c832f6

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Push CI

2f3b89b

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

pstibrany merged commit e8a6686 into cortexproject:master Aug 14, 2020
