Skip to content

[release-2.8.x] caching: do not try to fill the gap in log results cache when the new query interval does not overlap the cached query interval #9783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: release-2.8.x
Choose a base branch
from

Conversation

grafanabot
Copy link
Collaborator

Backport 3e1f2fc from #9757


What this PR does / why we need it:
Currently, when we find a relevant cached negative response for a logs query, we do the following:

  • If the cached query completely covers the new query:
    • return back an empty response.
  • else:
    • fill the gaps on either/both sides of the cached query.

The problem with filling the gaps is that when the cached query does not overlap at all with the new query, we have to extend the query beyond what the query requests for. However, with the logs query, we have a limit on the number of lines we can send back in the response. So, this could result in the query response having logs which were not requested by the query, which then get filtered out by the response extractor, unexpectedly resulting in an empty response. For example, if the query was cached for start=15, end=20 and we get a backwards query for start=5, end=10. To fill the gap, the query would be executed for start=5, end=15. Now, if we have logs more than the query limit in the range 10-15, we would filter out all the data in the response extractor and send back an empty response to the user.

This PR fixes the issue by doing the following changes when handling cache hit:

  • If the cached query completely covers the new query:
    • return back an empty response[existing].
  • else if the cached query does not overlap with the new query:
    • do the new query as requested.
    • If the new query results in an empty response and has a higher interval than the cached query:
      • update the cache
  • else:
    • query the data for missing intervals on both/either side[existing]
    • update the cache with extended intervals if the new queries resulted in an empty response[existing]

Special notes for your reviewer:
We could do further improvements in the handling of queries not overlapping with cached query by selectively extending the queries based on query direction and cached query lying before/after the new query. For example, if the new query is doing backwards query and the cachedQuery.End < newQuery.Start, it should be okay to extend the query and do cachedQuery.End to newQuery.End to fill the cache since query would first fill the most relevant data before hitting the limits. I did not want to complicate the fix so went without implementing this approach. We can revisit later if we feel we need to improve our caching.

Checklist

  • Tests updated
  • CHANGELOG.md updated
… query interval does not overlap the cached query interval (#9757)

**What this PR does / why we need it**:
Currently, when we find a relevant cached negative response for a logs
query, we do the following:
* If the cached query completely covers the new query:
  * return back an empty response.
* else:
  * fill the gaps on either/both sides of the cached query.

The problem with filling the gaps is that when the cached query does not
overlap at all with the new query, we have to extend the query beyond
what the query requests for. However, with the logs query, we have a
limit on the number of lines we can send back in the response. So, this
could result in the query response having logs which were not requested
by the query, which then get filtered out by the [response
extractor](https://github.com/grafana/loki/blob/b78d3f05525d8bcab13e621bc2e5851aadc8fc91/pkg/querier/queryrange/log_result_cache.go#L299),
unexpectedly resulting in an empty response. For example, if the query
was cached for start=15, end=20 and we get a `backwards` query for
start=5, end=10. To fill the gap, the query would be executed for
start=5, end=15. Now, if we have logs more than the query `limit` in the
range 10-15, we would filter out all the data in the response extractor
and send back an empty response to the user.

This PR fixes the issue by doing the following changes when handling
cache hit:
* If the cached query completely covers the new query:
  * return back an empty response[_existing_].
* else if the cached query does not overlap with the new query:
  * do the new query as requested.
* If the new query results in an empty response and has a higher
interval than the cached query:
    * update the cache
* else:
  * query the data for missing intervals on both/either side[_existing_]
* update the cache with extended intervals if the new queries resulted
in an empty response[_existing_]

**Special notes for your reviewer**:
We could do further improvements in the handling of queries not
overlapping with cached query by selectively extending the queries based
on query direction and cached query lying before/after the new query.
For example, if the new query is doing `backwards` query and the
`cachedQuery.End` < `newQuery.Start`, it should be okay to extend the
query and do `cachedQuery.End` to `newQuery.End` to fill the cache since
query would first fill the most relevant data before hitting the limits.
I did not want to complicate the fix so went without implementing this
approach. We can revisit later if we feel we need to improve our
caching.

**Checklist**
- [x] Tests updated
- [x] `CHANGELOG.md` updated

---------

Co-authored-by: Travis Patterson <travis.patterson@grafana.com>
(cherry picked from commit 3e1f2fc)
@grafanabot grafanabot requested a review from a team as a code owner June 23, 2023 19:57
@grafanabot grafanabot added backport size/L type/bug Somehing is not working as expected labels Jun 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport size/L type/bug Somehing is not working as expected
3 participants