Add a counter to track chunks which were de-duplicated at store time. #2485
Conversation
LGTM. Could you mention it as an ENHANCEMENT in the CHANGELOG please?
@@ -61,6 +61,11 @@ var (
	// For 100k series for 7 week, could be 1.2m - 10*(8^(7-1)) = 2.6m.
	Buckets: prometheus.ExponentialBuckets(10, 8, 7),
})
dedupedChunksTotal = promauto.NewCounter(prometheus.CounterOpts{
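The diff above only shows where the counter is registered. As a rough, stdlib-only sketch of the store-time logic such a counter feeds (hypothetical type and field names; the real implementation uses the chunks cache and Prometheus counters), the increment happens when a chunk's key is already known to the cache, so the chunk is not written again:

```go
package main

import "fmt"

// chunkStore is a hypothetical, stripped-down stand-in for a store that
// skips writing chunks whose keys are already in its dedupe cache.
type chunkStore struct {
	cache              map[string]bool // stands in for the chunks cache
	storedChunksTotal  int             // analogous to cortex_chunk_store_stored_chunks_total
	dedupedChunksTotal int             // the new counter this PR adds
}

// Put writes the chunk unless the cache says it was already stored.
func (s *chunkStore) Put(key string) {
	if s.cache[key] {
		// Already stored once: count the de-dupe instead of writing again.
		s.dedupedChunksTotal++
		return
	}
	s.cache[key] = true
	s.storedChunksTotal++
}

func main() {
	s := &chunkStore{cache: map[string]bool{}}
	for _, k := range []string{"a", "b", "a", "c", "b", "a"} {
		s.Put(k)
	}
	fmt.Println(s.storedChunksTotal, s.dedupedChunksTotal) // 3 3
}
```

In the real code the two counts would come from the Prometheus counters rather than plain ints; the point is only that stored and deduped increments are mutually exclusive per Put.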
This correlates with cortex_chunk_store_stored_chunks_total, which is per-user. @gouthamve are we OK having this metric non-per-user?
I don't know of a case where I would want to query this metric with a user dimension; that is, I'm not sure it's ever meaningful to know whether one user has more de-duped chunks than another.
However, if you were already querying cortex_chunk_store_stored_chunks_total with a user dimension and wanted to compare it accurately against this metric, then this metric would also need a user dimension.
My general feeling is there isn't a use case for checking de-duplication per user, and you would query both cortex_chunk_store_stored_chunks_total and this metric without a user label.
You need it per-user if you're trying to charge users based on how much of the store they used.
Is a count metric useful for this? Is there a good way to convert number of chunks flushed to the size of those chunks?
It's all a little fuzzy since we don't include metadata in cortex_ingester_chunk_stored_bytes_total and don't have any user-specific counters for labels.
I'll modify #2353 to mention the bytes metric too.
Is this different to the dedupe cache hit rate?
If I followed all this correctly, the cache used for checking de-dupes is the chunks cache, which is also used when the index
OK, is this different to the dedupe cache hit rate exposed by ingesters?
Interesting, I hadn't thought about the runtime separation. The problem here would be anyone running a single binary, which is possible to do with caching and distributed stores like dynamo/s3. I know this probably wouldn't be advised, but we see this a lot with Loki Helm users. Also, a recent change in Loki will allow ingesters to query the store directly (part of how the persisting-BoltDB-to-the-object-store implementation works), which would result in ingesters calling FetchChunks and would impact using cache stats for this purpose. For these reasons I still see value in a separate metric for this.
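The objection above can be illustrated with a toy stdlib-only sketch (all names hypothetical): a shared chunks cache counts hits from both the read path and the store-time dedupe check, so its hit rate conflates the two, whereas a dedicated counter isolates the de-dupes:

```go
package main

import "fmt"

// sharedCache is a toy stand-in for the chunks cache, which serves both
// the read path (FetchChunks) and the store-time dedupe check.
type sharedCache struct {
	data map[string]bool
	hits int // aggregate hit counter, like a generic cache hit metric
}

// contains reports whether key is cached, counting every hit regardless
// of which code path asked.
func (c *sharedCache) contains(key string) bool {
	if c.data[key] {
		c.hits++
		return true
	}
	return false
}

func main() {
	c := &sharedCache{data: map[string]bool{"a": true}}
	dedupedChunksTotal := 0

	// Store path: a hit here is a real de-dupe, so count it separately.
	if c.contains("a") {
		dedupedChunksTotal++
	}
	// Read path (e.g. an ingester querying the store directly): this hit
	// inflates the cache hit count but is not a de-dupe.
	c.contains("a")

	fmt.Println(c.hits, dedupedChunksTotal) // 2 1
}
```

Once read-path lookups land in the same cache, the hit metric alone can no longer tell you how many store-time de-dupes occurred, which is the argument for the separate counter.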
Force-pushed from 81b0081 to 32a2bfb
Signed-off-by: Ed Welch <edward.welch@grafana.com>
I think this is a good addition; the post-dedupe metrics required in #2353 are already implemented, imo.
This is to check whether the deduping is effective or not.
Signed-off-by: Ed Welch <edward.welch@grafana.com>
What this PR does:
At the moment it's a bit tricky to understand how effectively chunks are being de-duplicated at the store with the cache-based de-dupe code.
This PR adds a counter which explicitly increments when a chunk is not stored because it was already in the cache.
This can be combined with cortex_ingester_flush_reasons to get a ratio of chunks flushed vs deduped, to better tell how well the de-duplication is working.

Which issue(s) this PR fixes:
Fixes #
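For reference, the flushed-vs-deduped comparison described above could be expressed as a Prometheus query roughly like the following. This is a sketch: the exact name of the metric added by this PR is an assumption, and cortex_ingester_flush_reasons is taken from the description above.

```promql
# fraction of chunks that were de-duplicated at store time (metric name assumed)
sum(rate(cortex_chunk_store_deduped_chunks_total[5m]))
/
sum(rate(cortex_ingester_flush_reasons[5m]))
```

A value near 1 would suggest most flushed chunks were already stored; near 0 would suggest the cache-based de-dupe is rarely firing.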
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]