[libbeat/output/kafka]: fix panic on Publish after Close #46446

Merged
AndersonQ merged 14 commits into elastic:main from AndersonQ:46109-kafka-output-panic on Sep 19, 2025

@AndersonQ
Member

Proposed commit message

A race condition during client shutdown could cause a panic.

If a `Publish` call was in-flight while the client was closing, the underlying Kafka producer's input channel could be closed before the call finished sending all messages in the batch. The subsequent attempt to send the remaining messages on this closed channel would result in a panic.
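To make the failure mode concrete, here is a hypothetical, stripped-down reproduction in Go. It is not the client code or the test from this PR. `sendAll` plays the role of the pre-fix `Publish` loop, sending every message of the batch unconditionally, and `producer` plays a Kafka producer that shuts down (closing its input channel) after taking only the first message. Both names are illustrative assumptions.

```go
package main

import "fmt"

// sendAll mirrors the problematic pattern: it sends every message of a batch
// on the producer's input channel with no shutdown check at all.
func sendAll(input chan<- string, batch []string) (sent int, err error) {
	defer func() {
		// Turn the runtime panic into an error so the demo can report it.
		if r := recover(); r != nil {
			err = fmt.Errorf("panic after %d of %d sends: %v", sent, len(batch), r)
		}
	}()
	for _, msg := range batch {
		input <- msg
		sent++
	}
	return sent, nil
}

// producer is a hypothetical stand-in for the Kafka producer: it consumes one
// message, then shuts down and closes its input channel mid-batch.
func producer(input chan string) {
	<-input
	close(input)
}

func main() {
	input := make(chan string) // unbuffered: each send waits for the producer
	go producer(input)
	sent, err := sendAll(input, []string{"a", "b", "c"})
	fmt.Println(sent, err)
}
```

Because the channel is unbuffered, the panic deterministically hits the second send: either the sender is already blocked when the channel is closed, or it reaches the send afterwards. In Go, both cases panic with `send on closed channel`.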

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

  • n/a

How to test this PR locally

Related issues

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 8, 2025
@github-actions
Contributor

github-actions bot commented Sep 8, 2025

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Contributor

mergify bot commented Sep 8, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @AndersonQ? 🙏
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.
@AndersonQ AndersonQ force-pushed the 46109-kafka-output-panic branch from 2ba2950 to f82b26e Compare September 8, 2025 15:47
@mergify
Contributor

mergify bot commented Sep 8, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 46109-kafka-output-panic upstream/46109-kafka-output-panic
git merge upstream/main
git push upstream 46109-kafka-output-panic
@AndersonQ AndersonQ added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Sep 9, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 9, 2025
@AndersonQ AndersonQ added needs_team Indicates that the issue/PR needs a Team:* label bugfix labels Sep 9, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 9, 2025
@AndersonQ AndersonQ force-pushed the 46109-kafka-output-panic branch from 71a33cc to cf2cb99 Compare September 9, 2025 09:57
@mergify
Contributor

mergify bot commented Sep 9, 2025

This pull request is now in conflicts. Could you fix it? 🙏
@AndersonQ AndersonQ requested a review from Copilot September 9, 2025 14:01
Contributor

Copilot AI left a comment

Pull Request Overview

Fixes a race condition in the Kafka output client that could cause a panic during shutdown when Publish calls were in-flight while the client was closing.

  • Adds proper channel close detection in the Publish method to prevent sending on closed channels
  • Implements graceful event dropping with logging when the client is shutting down
  • Adds a comprehensive test to reproduce and verify the fix for the race condition

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Files changed:
  • libbeat/outputs/kafka/client.go: Adds select statement to handle closed channel during publish operations
  • libbeat/outputs/kafka/client_test.go: Adds test case to reproduce the shutdown panic scenario
  • CHANGELOG.next.asciidoc: Documents the bug fix in the changelog

@AndersonQ AndersonQ force-pushed the 46109-kafka-output-panic branch from cf2cb99 to f45f415 Compare September 9, 2025 14:03
@AndersonQ AndersonQ force-pushed the 46109-kafka-output-panic branch from f45f415 to f6a6e31 Compare September 9, 2025 14:04
@AndersonQ AndersonQ changed the title [wip] libbeat/output/kafka: fix panic on Publish after Close Sep 9, 2025
@AndersonQ AndersonQ marked this pull request as ready for review September 9, 2025 16:13
@AndersonQ AndersonQ requested a review from a team as a code owner September 9, 2025 16:13
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz cmacknz added the backport-active-all Automated backport with mergify to all the active branches label Sep 9, 2025
@AndersonQ AndersonQ requested a review from Copilot September 15, 2025 08:28
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


@mergify
Contributor

mergify bot commented Sep 15, 2025

This pull request is now in conflicts. Could you fix it? 🙏
Contributor

@andrzej-stencel andrzej-stencel left a comment

Thanks!

@AndersonQ AndersonQ requested a review from a team as a code owner September 19, 2025 09:29
@mergify
Copy link
Contributor

mergify bot commented Sep 19, 2025

This pull request is now in conflicts. Could you fix it? 🙏
@AndersonQ AndersonQ enabled auto-merge (squash) September 19, 2025 11:18
@AndersonQ AndersonQ merged commit bc05117 into elastic:main Sep 19, 2025
208 checks passed
@github-actions
Contributor

@Mergifyio backport 8.18 8.19 9.0 9.1

@mergify
Contributor

mergify bot commented Sep 19, 2025

mergify bot pushed a commit that referenced this pull request Sep 19, 2025
* libbeat/output/kafka: fix panic on Publish after Close

(cherry picked from commit bc05117)
mergify bot pushed a commit that referenced this pull request Sep 19, 2025
* libbeat/output/kafka: fix panic on Publish after Close

(cherry picked from commit bc05117)
mergify bot pushed a commit that referenced this pull request Sep 19, 2025
* libbeat/output/kafka: fix panic on Publish after Close

(cherry picked from commit bc05117)
mergify bot pushed a commit that referenced this pull request Sep 19, 2025
* libbeat/output/kafka: fix panic on Publish after Close

(cherry picked from commit bc05117)
AndersonQ added a commit that referenced this pull request Sep 22, 2025
…after Close (#46710)

(cherry picked from commit bc05117)

* fix changelog

---------

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
AndersonQ added a commit that referenced this pull request Sep 22, 2025
…after Close (#46709)

(cherry picked from commit bc05117)

* fix changelog
* fix test

---------

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
AndersonQ added a commit that referenced this pull request Sep 22, 2025
…fter Close (#46712)

(cherry picked from commit bc05117)

* fix changelog

---------

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
AndersonQ added a commit that referenced this pull request Sep 22, 2025
…fter Close (#46711)

(cherry picked from commit bc05117)

* fix changelog
* fix test

---------

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
Labels

  • backport-active-all: Automated backport with mergify to all the active branches
  • bugfix
  • Team:Elastic-Agent-Data-Plane: Label for the Agent Data Plane team

6 participants