Add a counter to track chunks which were de-duplicated at store time. #2485

slim-bean · 2020-04-19T18:22:28Z

Signed-off-by: Ed Welch edward.welch@grafana.com

What this PR does:

At the moment it's a bit tricky to understand how effectively chunks are being de-duplicated at the store with the cache based de-dupe code.

This PR adds a counter which explicitly increments when a chunk was not stored because it was already in the cache.

This can be combined with cortex_ingester_flush_reasons to get a ratio of chunks flushed vs deduped to better tell how well the de-duplication is working.

Which issue(s) this PR fixes:
Fixes #

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pracucci · 2020-04-20T10:57:42Z

I think we could assume this fixes #2353. /cc @bboreham

pracucci

LGTM. Could you mention it as ENHANCEMENT in the CHANGELOG please?

pracucci · 2020-04-20T10:59:37Z

pkg/chunk/series_store.go

@@ -61,6 +61,11 @@ var (
 		// For 100k series for 7 week, could be 1.2m - 10*(8^(7-1)) = 2.6m.
 		Buckets: prometheus.ExponentialBuckets(10, 8, 7),
 	})
+	dedupedChunksTotal = promauto.NewCounter(prometheus.CounterOpts{


This correlate with cortex_chunk_store_stored_chunks_total which is per user. @gouthamve are we OK having this metric non per-user?

I don't know of a case where I would want to query this metric with a user dimension, that is, I'm not sure it's ever meaningful to know if one user has more de-duped chunks than another.

However if you were already querying cortex_chunk_store_stored_chunks_total with a user dimension and wanted to compare accurately with this metric then it would also need a user dimension.

My general feeling is there isn't a use case for checking de-duplication with a user and you would query both cortex_chunk_store_stored_chunks_total and this metric without a user label.

You need it per-user if you're trying to charge users based on how much of the store they used.

Is a count metric useful for this? Is there a good way to convert number of chunks flushed to the size of those chunks?

It's all a little fuzzy since we don't include metadata in cortex_ingester_chunk_stored_bytes_total and don't have any user-specific counters for labels.

I'll modify #2353 to mention the bytes metric too.

bboreham · 2020-04-20T12:53:17Z

Is this different to the dedupe cache hit rate?

slim-bean · 2020-04-20T13:37:37Z

Is this different to the dedupe cache hit rate?

If I followed all this correctly, the cache used for checking de-dupes is the chunks cache which is also used when FetchChunks is called, so the hit rate on this cache is affected by queries (if you have a lot of queries for really old data they would drive up the miss rate on this cache)

The index writeDedupeCache is only checked after the chunks cache is checked, so if the chunks cache returned a result, the writeDedupeCache is never checked and therefore the stats for it won't reflect successful de-dupes

bboreham · 2020-04-20T13:44:39Z

OK, is this different to the dedupe cache hit rate exposed by ingesters?
(ingesters never call FetchChunks)

slim-bean · 2020-04-20T14:18:08Z

OK, is this different to the dedupe cache hit rate exposed by ingesters?
(ingesters never call FetchChunks)

Interesting I hadn't though about the runtime separation.

The problem here would be anyone running a singlebinary, which is possible to do with caching and distributed stores like dynamo/s3. I know this probably wouldn't be advised but we see this a lot with Loki Helm users.

Also a recent change in Loki will allow ingesters to query the store directly (part of how the persisting BoltDB to the object store implementation works), which would result in ingesters calling FetchChunks and impacting using cache stats for this purpose.

For these reasons I still see value in a separate metric for this.

Signed-off-by: Ed Welch <edward.welch@grafana.com>

gouthamve

I think this is a good addition, the post-deduped metrics required in #2353 are already implemented imo.

This is to check if the deduping is effective or not.

pull-request-size bot added the size/XS label Apr 19, 2020

pracucci approved these changes Apr 20, 2020

View reviewed changes

slim-bean added 2 commits April 21, 2020 07:56

Add a counter to track chunks which were de-duplicated at store time.

c653149

Signed-off-by: Ed Welch <edward.welch@grafana.com>

update changelog

32a2bfb

Signed-off-by: Ed Welch <edward.welch@grafana.com>

slim-bean force-pushed the dedupe-chunk-counter branch from 81b0081 to 32a2bfb Compare April 21, 2020 11:57

fix white noise lint

e946621

Signed-off-by: Ed Welch <edward.welch@grafana.com>

gouthamve reviewed Apr 22, 2020

View reviewed changes

gouthamve approved these changes Apr 22, 2020

View reviewed changes

pracucci merged commit 681e16d into cortexproject:master Apr 22, 2020

jtlisi mentioned this pull request Mar 2, 2021

Stream chunks from blocks-ingester to querier #3889

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add a counter to track chunks which were de-duplicated at store time. #2485

Add a counter to track chunks which were de-duplicated at store time. #2485

Uh oh!

slim-bean commented Apr 19, 2020 •

edited

Loading

pracucci commented Apr 20, 2020 •

edited

Loading

pracucci left a comment

pracucci Apr 20, 2020

slim-bean Apr 20, 2020

bboreham Apr 20, 2020

slim-bean Apr 20, 2020

bboreham Apr 20, 2020

bboreham commented Apr 20, 2020

slim-bean commented Apr 20, 2020

bboreham commented Apr 20, 2020

slim-bean commented Apr 20, 2020

gouthamve left a comment

Add a counter to track chunks which were de-duplicated at store time. #2485

Add a counter to track chunks which were de-duplicated at store time. #2485

Uh oh!

Conversation

slim-bean commented Apr 19, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

pracucci commented Apr 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

pracucci left a comment

Choose a reason for hiding this comment

pracucci Apr 20, 2020

Choose a reason for hiding this comment

slim-bean Apr 20, 2020

Choose a reason for hiding this comment

bboreham Apr 20, 2020

Choose a reason for hiding this comment

slim-bean Apr 20, 2020

Choose a reason for hiding this comment

bboreham Apr 20, 2020

Choose a reason for hiding this comment

bboreham commented Apr 20, 2020

slim-bean commented Apr 20, 2020

bboreham commented Apr 20, 2020

slim-bean commented Apr 20, 2020

gouthamve left a comment

Choose a reason for hiding this comment

slim-bean commented Apr 19, 2020 •

edited

Loading

pracucci commented Apr 20, 2020 •

edited

Loading