
Fix race in libbeat pipeline shutdown to ensure all events are gone#50625

Merged
blakerouse merged 5 commits into elastic:main from blakerouse:fix-pipeline-client-shutdown on May 13, 2026


@blakerouse (Contributor) commented May 12, 2026

Proposed commit message

Acquire and release the client mutex in Close before calling signalClose to ensure any in-progress Publish call has finished incrementing the pending event counter. Without this, signalClose could observe zero pending events and complete shutdown immediately while a Publish was still in-flight between the isOpen check and AddEvent.
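A minimal Go sketch of the ordering described above, assuming (as the description implies) that Publish performs both the isOpen check and AddEvent while holding the client mutex. Only Publish, Close, isOpen, signalClose and AddEvent come from this PR; the struct fields, the sync.WaitGroup standing in for the pending-event counter, and the runScenario helper are hypothetical. This is an illustration of the technique, not the actual libbeat code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// client is a hypothetical, heavily simplified model of the pipeline client.
// isOpen, Publish, Close, signalClose and AddEvent are names from the PR text;
// the fields, counters and the ack goroutine are assumed for illustration.
type client struct {
	mutex    sync.Mutex
	isOpen   atomic.Bool
	pending  sync.WaitGroup // stands in for the pending-event counter
	accepted atomic.Int64
	acked    atomic.Int64
}

// Publish does the isOpen check and AddEvent under the same mutex, so Close
// can wait out any call that is between those two steps.
func (c *client) Publish() {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	if !c.isOpen.Load() {
		return // client is closing: drop the event
	}
	c.AddEvent()
}

// AddEvent registers a pending event and simulates an asynchronous ACK.
func (c *client) AddEvent() {
	c.pending.Add(1)
	c.accepted.Add(1)
	go func() {
		c.acked.Add(1)
		c.pending.Done()
	}()
}

func (c *client) Close() {
	c.isOpen.Store(false) // later Publish calls return early
	// The fix: acquire and release the mutex. Any Publish that already saw
	// isOpen == true has finished AddEvent by the time Lock returns.
	c.mutex.Lock()
	c.mutex.Unlock()
	c.signalClose()
}

// signalClose waits until every pending event has been ACKed.
func (c *client) signalClose() { c.pending.Wait() }

// runScenario races n Publish calls against Close and reports, at the moment
// Close returns, how many events were accepted and how many were ACKed.
func runScenario(n int) (accepted, acked int64) {
	c := &client{}
	c.isOpen.Store(true)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); c.Publish() }()
	}
	c.Close()
	accepted, acked = c.accepted.Load(), c.acked.Load()
	wg.Wait()
	return accepted, acked
}

func main() {
	accepted, acked := runScenario(1000)
	fmt.Println("every accepted event was ACKed before Close returned:", accepted == acked)
}
```

Removing the Lock/Unlock pair from Close reopens the window: a Publish can pass the isOpen check, Close can then reach signalClose with a zero counter and return, and the late pending.Add would also break the sync.WaitGroup rule that a positive Add on a zero counter must happen before Wait.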

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and the race detector to verify their stability.
  • [x] I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

None

Related issues

blakerouse and others added 2 commits May 12, 2026 10:08
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@blakerouse blakerouse self-assigned this May 12, 2026
@blakerouse blakerouse requested a review from a team as a code owner May 12, 2026 14:12
@blakerouse blakerouse requested review from AndersonQ and orestisfl May 12, 2026 14:12
@blakerouse blakerouse added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-active-all Automated backport with mergify to all the active branches labels May 12, 2026
@botelastic botelastic Bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 12, 2026
@infra-vault-gh-plugin-prod


Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@botelastic botelastic Bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 12, 2026
@github-actions

Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /test : Run the Buildkite pipeline.
@coderabbitai

coderabbitai Bot commented May 12, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 4a399ed6-09d8-4422-95c2-bd4cdce445df

📥 Commits

Reviewing files that changed from the base of the PR and between 15c34c0 and 7f6e050.

📒 Files selected for processing (1)
  • libbeat/publisher/pipeline/client.go

📝 Walkthrough


This PR fixes a race in the libbeat pipeline client's shutdown. client.Close now performs a brief mutex lock/unlock after onClosing to serialize with concurrent Publish calls, so in-flight publishes register their pending events before Close calls signalClose and waits for ACKs. A regression test (TestCloseWaitsForInFlightPublish) verifies that Close blocks until the in-flight event is ACKed. A changelog fragment documents the bug fix.
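The regression test itself is not reproduced on this page. As a hypothetical illustration of what TestCloseWaitsForInFlightPublish checks, the sketch below pauses a Publish inside the window between the isOpen check and AddEvent using two test-only channels (reached, resume), then verifies that Close does not return until the in-flight event is ACKed. The client type, the channels, the event log, and closeWaitsForInFlightPublish are assumptions, not libbeat code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Hypothetical minimal client. The reached/resume channels exist only to
// force the interleaving that a regression test needs.
type client struct {
	mutex   sync.Mutex
	isOpen  atomic.Bool
	pending sync.WaitGroup // pending-event counter (assumed representation)

	reached chan struct{} // closed when Publish is between isOpen check and AddEvent
	resume  chan struct{} // Publish waits on this before calling AddEvent

	logMu sync.Mutex
	log   []string
}

func (c *client) record(s string) {
	c.logMu.Lock()
	defer c.logMu.Unlock()
	c.log = append(c.log, s)
}

func (c *client) Publish() {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	if !c.isOpen.Load() {
		return
	}
	close(c.reached) // pause here, inside the race window
	<-c.resume
	c.pending.Add(1) // AddEvent: register the pending event
	go func() {
		c.record("acked")
		c.pending.Done()
	}()
}

func (c *client) Close() {
	c.isOpen.Store(false)
	c.mutex.Lock() // blocks until the in-flight Publish releases the mutex
	c.mutex.Unlock()
	c.pending.Wait() // signalClose: wait for pending events to be ACKed
	c.record("closed")
}

// closeWaitsForInFlightPublish reports whether Close stayed blocked while a
// Publish was paused in the race window, plus the order of recorded events.
func closeWaitsForInFlightPublish() (bool, []string) {
	c := &client{reached: make(chan struct{}), resume: make(chan struct{})}
	c.isOpen.Store(true)

	go c.Publish()
	<-c.reached

	closeDone := make(chan struct{})
	go func() { c.Close(); close(closeDone) }()

	blocked := true
	select {
	case <-closeDone:
		blocked = false // would mean Close ignored the in-flight Publish
	case <-time.After(50 * time.Millisecond):
	}

	close(c.resume)
	<-closeDone

	c.logMu.Lock()
	defer c.logMu.Unlock()
	return blocked, append([]string(nil), c.log...)
}

func main() {
	blocked, order := closeWaitsForInFlightPublish()
	fmt.Println(blocked, order) // prints "true [acked closed]"
}
```

The 50 ms timeout only bounds how long the sketch waits to confirm that Close is still blocked; Close cannot return during that window because it is waiting on the mutex held by the paused Publish.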

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Linked Issues check: ✅ Passed. The PR addresses all objectives from issue #49390: it synchronizes Close with Publish via mutex acquisition, prevents the race between the isOpen check and the counter increment, and ensures all in-flight events are waited for before shutdown.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to fixing the pipeline client shutdown race: mutex synchronization in client.go, a regression test for the specific race condition, and a changelog fragment documenting the fix.


@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label May 12, 2026
@infra-vault-gh-plugin-prod


Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert pierrehilbert requested a review from leehinman May 12, 2026 16:39
Comment thread libbeat/publisher/pipeline/client.go Outdated
@blakerouse blakerouse merged commit 3e6578e into elastic:main May 13, 2026
200 of 203 checks passed
@blakerouse blakerouse deleted the fix-pipeline-client-shutdown branch May 13, 2026 20:54
@github-actions

Contributor

@Mergifyio backport 9.4 9.3 8.19

@mergify

mergify Bot commented May 13, 2026

Contributor
v1v added a commit to v1v/beats that referenced this pull request May 14, 2026
* upstream:
  x-pack/filebeat/input/entityanalytics/provider/okta: add minimal-state provider (elastic#50685)
  Fix flakiness on TestQueueDoesNotReplayLastEventAfterRestart (elastic#50675)
  Fix race in libbeat pipeline shutdown to ensure all events are gone (elastic#50625)
belimawr added a commit that referenced this pull request May 15, 2026
…e all events are gone (#50677)

Acquire mutex before setting isOpen to ensure any in-progress Publish call has finished before closing. Without this, signalClose could observe zero pending events and complete shutdown immediately while a Publish was still in-flight between the isOpen check and AddEvent.
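The backport message describes the ordering slightly differently: take the mutex, flip isOpen, then release it. A hypothetical sketch of that variant follows, again assuming Publish does the isOpen check and AddEvent under the mutex; because isOpen is only read and written under the mutex here, a plain bool is enough. The fields, the nextID counter, and runScenario are assumed for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical model of the backport's ordering: Close takes the mutex,
// sets isOpen to false, then releases it before signalling close.
type client struct {
	mutex   sync.Mutex
	isOpen  bool           // guarded by mutex
	nextID  int            // guarded by mutex
	pending sync.WaitGroup // pending-event counter (assumed representation)

	ackMu sync.Mutex
	acked []int // guarded by ackMu
}

func (c *client) Publish() bool {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	if !c.isOpen {
		return false
	}
	c.nextID++
	c.pending.Add(1) // AddEvent, while still holding the mutex
	go c.ack(c.nextID)
	return true
}

// ack simulates the output ACKing an event asynchronously.
func (c *client) ack(id int) {
	c.ackMu.Lock()
	c.acked = append(c.acked, id)
	c.ackMu.Unlock()
	c.pending.Done()
}

func (c *client) Close() {
	c.mutex.Lock() // waits for any in-progress Publish to finish
	c.isOpen = false
	c.mutex.Unlock()
	c.pending.Wait() // signalClose: no new events can be added past this point
}

// runScenario races n Publish calls against Close and returns how many events
// were published and how many were ACKed at the moment Close returned.
func runScenario(n int) (published, ackedAtClose int) {
	c := &client{isOpen: true}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); c.Publish() }()
	}
	c.Close()
	c.ackMu.Lock()
	ackedAtClose = len(c.acked)
	c.ackMu.Unlock()
	c.mutex.Lock()
	published = c.nextID
	c.mutex.Unlock()
	wg.Wait()
	return published, ackedAtClose
}

func main() {
	published, acked := runScenario(1000)
	fmt.Println("all published events ACKed at Close:", published == acked)
}
```

Either ordering closes the same window, provided Publish holds the mutex from the isOpen check through AddEvent.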

(cherry picked from commit 3e6578e)

---------

Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
belimawr added a commit that referenced this pull request May 15, 2026
…50625) (#50678)

Acquire mutex before setting isOpen to ensure any in-progress Publish call has finished before closing. Without this, signalClose could observe zero pending events and complete shutdown immediately while a Publish was still in-flight between the isOpen check and AddEvent.

(cherry picked from commit 3e6578e)

Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
belimawr added a commit that referenced this pull request May 18, 2026
…50625) (#50679)

Acquire mutex before setting isOpen to ensure any in-progress Publish call has finished before closing. Without this, signalClose could observe zero pending events and complete shutdown immediately while a Publish was still in-flight between the isOpen check and AddEvent.

(cherry picked from commit 3e6578e)

Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>

Labels

backport-active-all Automated backport with mergify to all the active branches Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

4 participants