Skip to content

Conversation

@fabriziomello
Copy link
Member

@fabriziomello fabriziomello commented Aug 28, 2025

Cap the refresh window at the hypertable invalidation threshold to don't produce unnecessary batches for refresh.

Disable-check: force-changelog-file

@github-actions
Copy link

@melihmutlu, @mkindahl: please review this pull request.

Powered by pull-review

@fabriziomello fabriziomello force-pushed the cagg_consider_hypertable_invalidation_threshold_splitting_ranges branch from 061587a to 445141b Compare August 28, 2025 23:28
@fabriziomello fabriziomello force-pushed the cagg_consider_hypertable_invalidation_threshold_splitting_ranges branch 2 times, most recently from 55dd0f2 to 09e04a2 Compare August 29, 2025 13:52
@codecov
Copy link

codecov bot commented Aug 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.49%. Comparing base (78efcf1) to head (7d78190).
⚠️ Report is 139 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8560      +/-   ##
==========================================
+ Coverage   82.41%   82.49%   +0.08%     
==========================================
  Files         248      248              
  Lines       47676    47650      -26     
  Branches    12117    12113       -4     
==========================================
+ Hits        39290    39309      +19     
- Misses       3497     3508      +11     
+ Partials     4889     4833      -56     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@fabriziomello fabriziomello force-pushed the cagg_consider_hypertable_invalidation_threshold_splitting_ranges branch 6 times, most recently from 99f5313 to 039cfb7 Compare August 29, 2025 19:36
Copy link
Member

@kpan2034 kpan2034 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fabriziomello fabriziomello force-pushed the cagg_consider_hypertable_invalidation_threshold_splitting_ranges branch from 039cfb7 to a7b4dc8 Compare September 4, 2025 13:20
@fabriziomello
Copy link
Member Author

@mkindahl @kpan2034 can u folks please re-review this PR since I've changed the cagg_policy_incremental regression test to get rid of the scheduler mock cause it is so flaky.

@fabriziomello fabriziomello added the force-auto-backport Automatically backport this PR or fix of this issue, even if it's not marked as "bug" label Sep 4, 2025
@fabriziomello fabriziomello force-pushed the cagg_consider_hypertable_invalidation_threshold_splitting_ranges branch from a7b4dc8 to 0fb5144 Compare September 4, 2025 17:57
Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if you don't want to cap the beginning as well, and have tests for that.

Comment on lines +1045 to +1047
int64 invalidation_threshold =
invalidation_threshold_get(cagg->data.raw_hypertable_id, refresh_window.type);
refresh_window.end = MIN(invalidation_threshold, refresh_window.end);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to cap the refresh window at the beginning as well? In theory, the refresh window could be entirely after the threshold, in which case you will cap the end, but not the beginning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I need to think more about it... probably what is necessary here is cap at the MAX(time_dimension) of the hypertable and not the threshold. The idea here is to optimize the batches creation to don't create unnecessary batches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I followed that, but my point is that if you have a refresh window 10:00-11:00 and you have a threshold at 09:00, if you just cap the refresh window at the end you will get the range 10:00-09:00, which is not a valid window at all and could lead to strange behavior later in the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct but what I meant is that the invalidation threshold should not be the best information for this optmization. This is more or less what I want to optimize:

[-------------------) - refresh window
     [---------]      - original hypertable data
|XXXX|....|....|XXXX| - batches produced without align refresh window with ht data
                        we can see the first and last batch will do nothing
     [---------)      - align new refresh window within hypertable data
     |....|....|      - produce batches (1..N buckets) in the aligned refresh window

This PR requires more work because the original hypertable can be another CAgg in case of hierarchical and it can be realtime :-/

Cap the refresh window at the hypertable invalidation threshold to
don't produce unnecessary batches for refresh.
@fabriziomello fabriziomello force-pushed the cagg_consider_hypertable_invalidation_threshold_splitting_ranges branch from 0fb5144 to 7d78190 Compare September 7, 2025 22:22
@github-actions
Copy link

This pull request has been automatically marked as stale due to lack of activity. This pull request will be closed in 30 days.

@github-actions github-actions bot added the Stale label Oct 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Continuous Aggregate force-auto-backport Automatically backport this PR or fix of this issue, even if it's not marked as "bug" Stale

3 participants