Skip to content

fix: Fix lock expiration metric#18492

Open
linliu-code wants to merge 1 commit into
apache:masterfrom
linliu-code:fix_wrong_metric
Open

fix: Fix lock expiration metric#18492
linliu-code wants to merge 1 commit into
apache:masterfrom
linliu-code:fix_wrong_metric

Conversation

@linliu-code

@linliu-code linliu-code commented Apr 11, 2026

Copy link
Copy Markdown
Collaborator

Describe the issue this Pull Request addresses

#18493
After a successful lock renewal in StorageBasedLockProvider, the lock.expiration.deadline metric was computed using oldExpirationMs (the expiration of the previous lease) instead of lockExpirationMs (the expiration of the newly renewed lease). This caused the metric to report the remaining time on the old lease rather than the new one, making it appear that the lock was about to expire even though it had just been extended.

The adjacent log message correctly uses oldExpirationMs — its purpose is to report how much buffer remained before the old lease would have expired, which is a heartbeat health signal. The metric, on the other hand, should reflect the current state of the lock after renewal.

Summary and Changelog

Use lockExpirationMs to calculate the lock.expiration.deadline metric.

Impact

The lock.expiration.deadline metric now accurately reports how much time remains until the lock actually expires after renewal. No behavioral change to locking logic — only the reported metric value is corrected.

Risk Level

Low. Single-line change affecting only metric reporting. No change to lock acquisition, renewal, or release logic.

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable
@github-actions github-actions Bot added the size:XS PR with lines of changes in <= 10 label Apr 12, 2026
@linliu-code linliu-code marked this pull request as ready for review April 12, 2026 03:59

@yihua yihua left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

LGTM — this is a clean, minimal fix that corrects the metric to reflect the newly renewed lease expiration rather than the old one. The change is consistent with how the same metric is populated in the initial lock acquisition path, and the log message being left on oldExpirationMs is intentional and correct.

@yihua yihua left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

LGTM — clean, precise fix that correctly reports the renewed lock's expiration time in the metric rather than the old (nearly-expired) lease's deadline. The updated log message also nicely captures both the heartbeat health signal and the new TTL in one line.

@linliu-code linliu-code force-pushed the fix_wrong_metric branch 2 times, most recently from dc464e2 to 7bda459 Compare June 9, 2026 13:17
@github-actions github-actions Bot added size:S PR with lines of changes in (10, 100] and removed size:XS PR with lines of changes in <= 10 labels Jun 9, 2026
@linliu-code linliu-code requested a review from yihua June 9, 2026 16:02

@hudi-agent hudi-agent left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the fix! This corrects the lock.expiration.deadline metric to reflect the newly renewed lease rather than the old one, and the regression test cleanly demonstrates the distinction. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One minor naming nit in the new test — successData/successLockFile could mirror oldData/oldLockFile more explicitly.

cc @yihua

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 68.27%. Comparing base (f35b69c) to head (a375e27).
⚠️ Report is 174 commits behind head on master.

Files with missing lines Patch % Lines
...ent/transaction/lock/StorageBasedLockProvider.java 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18492      +/-   ##
============================================
- Coverage     68.85%   68.27%   -0.58%     
- Complexity    28414    29510    +1096     
============================================
  Files          2475     2542      +67     
  Lines        136425   142604    +6179     
  Branches      16569    17785    +1216     
============================================
+ Hits          93929    97362    +3433     
- Misses        34981    37235    +2254     
- Partials       7515     8007     +492     
Flag Coverage Δ
common-and-other-modules 44.78% <50.00%> (+0.29%) ⬆️
hadoop-mr-java-client 44.71% <0.00%> (-0.07%) ⬇️
spark-client-hadoop-common 48.05% <0.00%> (-0.37%) ⬇️
spark-java-tests 48.78% <0.00%> (-0.66%) ⬇️
spark-scala-tests 44.85% <0.00%> (-0.44%) ⬇️
utilities 37.26% <0.00%> (-0.79%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ent/transaction/lock/StorageBasedLockProvider.java 88.23% <50.00%> (-2.33%) ⬇️

... and 449 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
(int) (oldExpirationMs - getCurrentEpochMs())));
logger.info("Owner {}: Lock renewal successful. The renewal completes {} ms before expiration for lock {}.",
ownerId, oldExpirationMs - getCurrentEpochMs(), lockFilePath);
(int) (lockExpirationMs - getCurrentEpochMs())));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line above sets the held lock from the returned file (setLock(currentLock.getRight().get())), but the metric is sourced from the locally computed lockExpirationMs instead. The acquisition path derives this same metric from the returned file's getValidUntilMs() right after its own setLock, so the two paths now disagree on where the deadline comes from. The values are equal today because the storage clients echo back the written StorageLockData, so this is readability, not correctness: consider hoisting currentLock.getRight().get() into a local and using its getValidUntilMs() for both setLock and the metric, matching the acquisition path.

@linliu-code linliu-code force-pushed the fix_wrong_metric branch 2 times, most recently from 0d2f786 to b0c6add Compare July 1, 2026 07:21
@hudi-bot

hudi-bot commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S PR with lines of changes in (10, 100]

6 participants