Fix race in libbeat pipeline shutdown to ensure all events are gone #50625
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Walkthrough
This PR fixes a race in the libbeat pipeline client's shutdown. client.Close now performs a brief mutex lock/unlock after onClosing to serialize with concurrent Publish calls, so in-flight publishes register their pending events before Close signals and waits for ACKs. A regression test (TestCloseWaitsForInFlightPublish) verifies that Close blocks until the in-flight event is ACKed. A changelog fragment documents the bug fix.
Pre-merge checks: ✅ 2 passed
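The lock/unlock barrier described in the walkthrough can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `sketchClient`, its fields, the `hook` parameter, and `runDemo` are hypothetical names invented for illustration and are not the libbeat client's actual types or API. The model assumes `Publish` holds the mutex for the whole call, and `runDemo` mirrors the shape of a "Close waits for the in-flight publish" check.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// sketchClient is a hypothetical, simplified stand-in for a pipeline client.
type sketchClient struct {
	mu      sync.Mutex     // held by Publish for the whole call
	isOpen  atomic.Bool    // cleared by Close
	pending sync.WaitGroup // one count per event still awaiting an ACK
	acked   atomic.Int64   // number of events ACKed so far
}

func newSketchClient() *sketchClient {
	c := &sketchClient{}
	c.isOpen.Store(true)
	return c
}

// Publish checks isOpen and then registers the event as pending.
// hook (hypothetical) runs between the two steps to widen the race window.
func (c *sketchClient) Publish(hook func()) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.isOpen.Load() {
		return false
	}
	if hook != nil {
		hook()
	}
	c.pending.Add(1)
	go func() {
		time.Sleep(10 * time.Millisecond) // simulated output latency
		c.acked.Add(1)
		c.pending.Done() // the ACK arrives
	}()
	return true
}

// Close marks the client closed, then waits for all pending events.
func (c *sketchClient) Close() {
	if !c.isOpen.Swap(false) {
		return // already closed
	}
	// Barrier: a Publish that passed the isOpen check still holds mu,
	// so this blocks until it has registered its pending event.
	c.mu.Lock()
	c.mu.Unlock()
	c.pending.Wait()
}

// runDemo starts a Publish, calls Close while that Publish sits between
// the isOpen check and registration, and returns the ACK count seen
// right after Close returns.
func runDemo() int64 {
	c := newSketchClient()
	inWindow := make(chan struct{})
	go c.Publish(func() {
		close(inWindow)                   // past the isOpen check
		time.Sleep(20 * time.Millisecond) // Close starts meanwhile
	})
	<-inWindow
	c.Close()
	return c.acked.Load()
}

func main() {
	fmt.Println("events ACKed before Close returned:", runDemo())
	// prints "events ACKed before Close returned: 1"
}
```

In this model, removing the two barrier lines would let `Close` reach `pending.Wait()` while the counter is still zero, which is the same window the PR describes.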
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@Mergifyio backport 9.4 9.3 8.19
✅ Backports have been created
* upstream:
  x-pack/filebeat/input/entityanalytics/provider/okta: add minimal-state provider (elastic#50685)
  Fix flakiness on TestQueueDoesNotReplayLastEventAfterRestart (elastic#50675)
  Fix race in libbeat pipeline shutdown to ensure all events are gone (elastic#50625)
…e all events are gone (#50677)
Acquire mutex before setting isOpen to ensure any in-progress Publish call has finished before closing. Without this, signalClose could observe zero pending events and complete shutdown immediately while a Publish was still in-flight between the isOpen check and AddEvent.
(cherry picked from commit 3e6578e)
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
…50625) (#50678)
Acquire mutex before setting isOpen to ensure any in-progress Publish call has finished before closing. Without this, signalClose could observe zero pending events and complete shutdown immediately while a Publish was still in-flight between the isOpen check and AddEvent.
(cherry picked from commit 3e6578e)
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
…50625) (#50679)
Acquire mutex before setting isOpen to ensure any in-progress Publish call has finished before closing. Without this, signalClose could observe zero pending events and complete shutdown immediately while a Publish was still in-flight between the isOpen check and AddEvent.
(cherry picked from commit 3e6578e)
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
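The window named in these commit messages, between the isOpen check and AddEvent, can be reproduced deterministically in a toy model. The sketch below is an assumption-laden illustration, not the libbeat code: `windowClient`, `observedPending`, the `barrier` flag, and the `inWindow` callback are all hypothetical. It compares what a close path would observe as the pending count with and without the mutex barrier.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// windowClient is a hypothetical model, not the libbeat client.
type windowClient struct {
	mu      sync.Mutex
	isOpen  atomic.Bool
	pending atomic.Int64
}

// Publish checks isOpen, runs inWindow, and only then registers the event.
func (c *windowClient) Publish(inWindow func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.isOpen.Load() {
		return
	}
	inWindow() // stands for the gap between the check and registration
	c.pending.Add(1)
}

// Close returns the pending count it observes at the point where a real
// close path would decide whether there is anything left to wait for.
func (c *windowClient) Close(barrier bool) int64 {
	c.isOpen.Store(false)
	if barrier {
		// Wait for any Publish that already passed the isOpen check.
		c.mu.Lock()
		c.mu.Unlock()
	}
	return c.pending.Load()
}

// observedPending parks a Publish inside the window, calls Close, and
// reports the pending count that Close saw.
func observedPending(barrier bool) int64 {
	c := &windowClient{}
	c.isOpen.Store(true)
	entered, release, done := make(chan struct{}), make(chan struct{}), make(chan struct{})
	go func() {
		defer close(done)
		c.Publish(func() { close(entered); <-release })
	}()
	<-entered // Publish is now inside the window
	if barrier {
		// Close will block on the mutex, so release Publish from a timer.
		time.AfterFunc(20*time.Millisecond, func() { close(release) })
	}
	seen := c.Close(barrier)
	if !barrier {
		close(release) // Close already returned; let Publish finish
	}
	<-done
	return seen
}

func main() {
	fmt.Println("without barrier:", observedPending(false)) // prints "without barrier: 0"
	fmt.Println("with barrier:", observedPending(true))     // prints "with barrier: 1"
}
```

Without the barrier the close path sees zero pending events even though one is about to be registered; with it, the registration is always visible first.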
Proposed commit message
Acquire and release the client mutex in `Close` before calling `signalClose` to ensure any in-progress `Publish` call has finished incrementing the pending event counter. Without this, `signalClose` could observe zero pending events and complete shutdown immediately while a `Publish` was still in-flight between the `isOpen` check and `AddEvent`.
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files
… `stresstest.sh` script to run them under stress conditions and race detector to verify their stability.
… `./changelog/fragments` using the changelog tool.
Disruptive User Impact
None
Related issues