Fixed max global series per user/metric limit when shuffle sharding is enabled #3369

Conversation

pracucci
Contributor

What this PR does:
The max global series per user/metric limit doesn't work correctly when shuffle sharding and shard-by-all-labels are both enabled because it considers series to be distributed across all ingesters, while actually series are only distributed across a subset of ingesters (the shard size).
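As a rough illustration of the bug, here is a minimal Go sketch of the global-to-local conversion. The function and parameter names (localLimit, healthyIngesters, shardSize, replicationFactor) are illustrative and not the actual Cortex identifiers; it only shows why the divisor must be capped at the shard size when shuffle sharding confines a tenant's series to a subset of ingesters.

package main

import "fmt"

// localLimit converts a per-tenant global series limit into a per-ingester
// limit. Without shuffle sharding, series spread across all healthy ingesters;
// with shuffle sharding and shard-by-all-labels enabled, they land only on
// min(shardSize, healthyIngesters) ingesters, so the global limit must be
// divided by that smaller number.
func localLimit(globalLimit, healthyIngesters, shardSize, replicationFactor int) int {
	if globalLimit == 0 || healthyIngesters == 0 {
		return 0 // 0 means the limit is not enforced locally
	}

	ingestersHoldingSeries := healthyIngesters
	if shardSize > 0 && shardSize < healthyIngesters {
		ingestersHoldingSeries = shardSize
	}

	return int(float64(globalLimit) / float64(ingestersHoldingSeries) * float64(replicationFactor))
}

func main() {
	// Global limit 150000, 30 healthy ingesters, replication factor 3, shard size 3.
	// Dividing by all 30 ingesters gives a local limit of 15000; since each of the
	// 3 shard ingesters holds every series of the tenant, the tenant is effectively
	// capped at ~15000 series instead of the configured 150000.
	fmt.Println(localLimit(150000, 30, 0, 3)) // 15000: divisor is all healthy ingesters
	fmt.Println(localLimit(150000, 30, 3, 3)) // 150000: divisor is the shard size
}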

Which issue(s) this PR fixes:
Fixes #3367

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
@pracucci
Contributor Author

pracucci commented Oct 20, 2020

I found a general issue with integration tests (which I believe is the reason why they're failing). I'm going to address it in a separate PR and, once that's merged, I will rebase this PR on master. Done

@pracucci pracucci force-pushed the fix-global-limits-when-shuffle-sharding-is-enabled branch from de8185a to 604ba5b on October 20, 2020 14:34

- `max_series_per_query` / `-ingester.max-series-per-query`
- `max_samples_per_query` / `-ingester.max-samples-per-query`

Limits on the number of timeseries and samples returned by a single ingester during a query.

- `max_metadata_per_user` / `-ingester.max-metadata-per-user`
Contributor Author

Metadata limits were not documented, so I've added them.

}

func (f *flushTransferer) Flush() {}
func (f *flushTransferer) TransferOut(ctx context.Context) error {
Contributor Author

Unless I'm missing anything, the logic of this flushTransferer was not used, so I've removed it in favour of the tiny nopFlushTransferer
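For context, a no-op implementation of the two methods visible in the diff above could look like the following sketch. This is inferred from the visible signatures, not copied from the PR; the actual nopFlushTransferer may differ.

package main

import "context"

// nopFlushTransferer is a hypothetical sketch of a no-op flush transferer.
type nopFlushTransferer struct{}

// Flush does nothing.
func (t *nopFlushTransferer) Flush() {}

// TransferOut does nothing and reports success.
func (t *nopFlushTransferer) TransferOut(_ context.Context) error {
	return nil
}

// Compile-time check that nopFlushTransferer provides both methods.
var _ interface {
	Flush()
	TransferOut(context.Context) error
} = &nopFlushTransferer{}

func main() {}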

Contributor

@ranton256 ranton256 left a comment

LGTM, thanks for the fast turnaround.

-	if numIngesters > 0 {
-		return int((float64(globalLimit) / float64(numIngesters)) * float64(l.replicationFactor))
+	if numIngesters == 0 {
+		return 0
Contributor

So now local limit is 0 if num healthy ingesters is zero, right? I think that's probably okay, but the comment above is confusing now, since that's not what "ignore the global limit" makes me think.

Contributor

0 means unlimited.

Contributor

I see, thanks.

Contributor Author

Let me also point out that the logic hasn't changed: if the ring client doesn't see any ingesters (0), the global limit was temporarily disabled even before this change.
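For readers following the thread, a local limit of 0 is treated as "not enforced", which is why returning 0 when no healthy ingesters are visible only lifts the limit temporarily rather than rejecting everything. A hypothetical sketch (the function name is illustrative):

package main

import "fmt"

// exceedsLimit sketches how a zero local limit disables enforcement.
func exceedsLimit(currentSeries, localLimit int) bool {
	if localLimit <= 0 {
		return false // 0 means the limit is not enforced
	}
	return currentSeries >= localLimit
}

func main() {
	fmt.Println(exceedsLimit(1000, 0))   // false: limit disabled
	fmt.Println(exceedsLimit(1000, 500)) // true: over the local limit
}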

@@ -17,65 +18,124 @@ func TestSeriesLimit_maxSeriesPerMetric(t *testing.T) {
		maxLocalSeriesPerMetric  int
		maxGlobalSeriesPerMetric int
		ringReplicationFactor    int
		ringZoneAwarenessEnabled bool
Contributor

Would it be possible to refactor limiter code to avoid the same repeated tests? (In a follow-up PR, if at all).

Contributor Author

I think it could be possible. Added to my TODO list for a separate PR.
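One possible shape for such a refactor, purely as a hypothetical sketch (type, field, and helper names are illustrative, not the ones used in the Cortex tests): a single table of cases plus a helper that both the per-user and per-metric limit tests call.

package main

import "testing"

// limitTestCase is a shared test-case table entry.
type limitTestCase struct {
	maxGlobalLimit    int
	replicationFactor int
	healthyIngesters  int
	expectedLocal     int
}

// runLimitTests runs the same table against any global-to-local conversion,
// so the per-user and per-metric variants can reuse one set of cases.
func runLimitTests(t *testing.T, convert func(global, rf, ingesters int) int, cases map[string]limitTestCase) {
	for name, c := range cases {
		c := c
		t.Run(name, func(t *testing.T) {
			if got := convert(c.maxGlobalLimit, c.replicationFactor, c.healthyIngesters); got != c.expectedLocal {
				t.Fatalf("expected local limit %d, got %d", c.expectedLocal, got)
			}
		})
	}
}

func main() {}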

Contributor

@pstibrany pstibrany left a comment

LGTM, good job!

Contributor

@ranton256 ranton256 left a comment

Thanks for the fix, and also for updating the docs for max_metadata_per_user and max_metadata_per_metric.

…s enabled

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci force-pushed the fix-global-limits-when-shuffle-sharding-is-enabled branch from 43c3a84 to 5df9485 on October 21, 2020 16:41
@pracucci pracucci merged commit 85942c5 into cortexproject:master Oct 22, 2020
@pracucci pracucci deleted the fix-global-limits-when-shuffle-sharding-is-enabled branch October 22, 2020 07:17