Skip to content

Batch series in streaming ingester based on message sizes. #3015

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 14, 2020
Merged

Batch series in streaming ingester based on message sizes. #3015

merged 7 commits into from
Aug 14, 2020

Conversation

pstibrany
Copy link
Contributor

What this PR does: This PR modifies batching in blocks-based ingester, when it is streaming results back. Instead of using only number of series for batching, it also considers how much data is buffered in memory, and flushes batch if it get too big. This is to prevent problems with too large gRPC messages. "Too big" is hardcoded to 1 MiB, to comfortably fit into grpc default, and also to avoid using too much memory.

Note that if single series returns too much data, it can still reach gRPC limit for sending messages. Solution to that would be splitting single timeseries into multiple messages, but querier doesn't support that at the moment.

Which issue(s) this PR fixes:
Fixes #2945

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement, LGTM! Should we do it for the chunks storage too, to keep it specular? :)

@pstibrany
Copy link
Contributor Author

Here is a result of benchmark before and after this change.

name                      old time/op    new time/op    deltanchstat before.txt after.txt 
Ingester_v2QueryStream-4    6.31ms ± 2%    6.46ms ± 1%  +2.38%  (p=0.001 n=10+8)

name                      old alloc/op   new alloc/op   delta
Ingester_v2QueryStream-4    3.45MB ± 0%    3.45MB ± 0%  +0.00%  (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
Ingester_v2QueryStream-4     4.45k ± 0%     4.45k ± 0%  +0.02%  (p=0.000 n=10+10)
@pracucci
Copy link
Contributor

Thanks @pstibrany! LGTM

Should we do it for the chunks storage too, to keep it specular? :)

Could you open an issue for that, please?

@pstibrany
Copy link
Contributor Author

Could you open an issue for that, please?

I will take a look first if we can make it part of this PR (if it's small change).

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@pstibrany pstibrany merged commit e8a6686 into cortexproject:master Aug 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 participants