Fix Filebeat's race condition on shutdown #46331

Merged
belimawr merged 4 commits into elastic:main from belimawr:fix-shutdown-deadlock
Sep 4, 2025

@belimawr
Contributor

@belimawr belimawr commented Aug 29, 2025

Note for reviewers

The detailed description of the issue this PR solves can be found at #45034 (comment)

Proposed commit message

When Filebeat fails to start because of an unknown input, it can get into a deadlock state and never exit. This commit attempts to fix it.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

The problem is non-deterministic, so testing means running Filebeat over and over again and checking that it never hangs during shutdown.

Follow the reproduction steps listed in the linked issue below. To run Filebeat over and over again, use the following script:

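#!/bin/bash

# Run the given command in an endless loop, printing the number of
# completed runs. A hang shows up as the counter no longer advancing.
# Alternative that stops at the first non-zero exit:
# while "$@"; do :; done
counter=0
while true; do
    "$@"
    counter=$((counter + 1))
    echo "$counter"
done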

Assuming the script is saved as ~/bin/rununtilfail, you can run:

~/bin/rununtilfail ./filebeat -c ./filebeat-test.yml -e

Let it run for a few minutes. If Filebeat never hangs, the issue has very likely been solved.

In my tests, it usually took fewer than 100 runs for Filebeat to hang; in more extreme cases it took a couple of thousand tries.
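
The loop above depends on someone noticing that the counter has stopped advancing. As a rough sketch, the same idea can be wrapped in a time limit so that a hang is flagged automatically. This assumes GNU coreutils `timeout` is available; the function name `run_until_hang`, the run limit, and the time budget are illustrative choices and not part of this PR.

```shell
#!/bin/bash

# Hypothetical helper (not part of this PR): run a command up to max_runs
# times and report the first run that exceeds limit seconds.
# Assumes GNU coreutils `timeout`.
run_until_hang() {
    local max_runs="$1"
    local limit="$2"
    shift 2
    local i=1
    local status
    while [ "$i" -le "$max_runs" ]; do
        # Send SIGTERM after $limit seconds, then SIGKILL 5 seconds later
        # in case the process is stuck and does not react to SIGTERM.
        # The command's own output is discarded to keep the report short.
        timeout -k 5 "$limit" "$@" >/dev/null 2>&1
        status=$?
        # 124: the command timed out; 137: it had to be killed with SIGKILL.
        if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
            echo "hang on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hang in $max_runs runs"
}
```

For example, `run_until_hang 2000 30 ./filebeat -c ./filebeat-test.yml -e` would stop at the first run that takes longer than 30 seconds. The 30-second budget is a guess and should comfortably exceed the time Filebeat normally needs to start and fail. Exit status 137 can also mean something else sent SIGKILL, so a reported hang is worth confirming by hand.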

Related issues

When Filebeat fails to start because of an unknown input, it can get
into a deadlock state and never exit. This commit attempts to fix it.
belimawr self-assigned this Aug 29, 2025
belimawr added the Team:Elastic-Agent-Data-Plane and bugfix labels Aug 29, 2025
botelastic bot added and then removed the needs_team label Aug 29, 2025
@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Contributor

mergify bot commented Aug 29, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.
@mauri870
Member

mauri870 commented Sep 4, 2025

Reminder for me to check if this somehow fixes #43137.

belimawr changed the title from "[PoC] Fix Filebeat's race condition on shutdown" to "Fix Filebeat's race condition on shutdown" Sep 4, 2025
belimawr added the backport-active-all label Sep 4, 2025
belimawr marked this pull request as ready for review September 4, 2025 18:49
belimawr requested a review from a team as a code owner September 4, 2025 18:49
belimawr requested a review from leehinman September 4, 2025 18:49
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

belimawr merged commit 3ee3201 into elastic:main Sep 4, 2025
49 of 52 checks passed
@github-actions
Contributor

github-actions bot commented Sep 4, 2025

@Mergifyio backport 8.17 8.18 8.19 9.0 9.1

@mergify
Contributor

mergify bot commented Sep 4, 2025

backport 8.17 8.18 8.19 9.0 9.1

✅ Backports have been created

mergify bot pushed 5 commits that referenced this pull request Sep 4, 2025, each with the message:
When Filebeat fails to start because of an unknown input, it can get
into a deadlock state and never exit. This commit attempts to fix it.

(cherry picked from commit 3ee3201)
belimawr added a commit that referenced this pull request Sep 5, 2025
…6394)

When Filebeat fails to start because of an unknown input, it can get
into a deadlock state and never exit. This commit attempts to fix it.

(cherry picked from commit 3ee3201)

---------

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
belimawr added a commit that referenced this pull request Sep 5, 2025
…6395)

When Filebeat fails to start because of an unknown input, it can get
into a deadlock state and never exit. This commit attempts to fix it.

(cherry picked from commit 3ee3201)

---------

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
belimawr added a commit that referenced this pull request Sep 5, 2025
)

When Filebeat fails to start because of an unknown input, it can get
into a deadlock state and never exit. This commit attempts to fix it.

(cherry picked from commit 3ee3201)

---------

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
belimawr added a commit that referenced this pull request Sep 5, 2025
When Filebeat fails to start because of an unknown input, it can get
into a deadlock state and never exit. This commit attempts to fix it.

(cherry picked from commit 3ee3201)

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
mauri870 added a commit to mauri870/beats that referenced this pull request Oct 16, 2025
This test was skipped in elastic#42780
due to elastic#43137.

Since elastic#46331 fixed the underlying issue,
we can re-enable the test.
mauri870 added a commit that referenced this pull request Oct 22, 2025
)

This test was skipped in #42780
due to #43137.

Since #46331 fixed the underlying issue,
we can re-enable the test.
mergify bot pushed 4 commits that referenced this pull request Oct 22, 2025, each with the message:
)

This test was skipped in #42780
due to #43137.

Since #46331 fixed the underlying issue,
we can re-enable the test.

(cherry picked from commit e744a82)
mauri870 added a commit that referenced this pull request Oct 22, 2025
) (#47269)

This test was skipped in #42780
due to #43137.

Since #46331 fixed the underlying issue,
we can re-enable the test.

(cherry picked from commit e744a82)

Co-authored-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
mauri870 added a commit that referenced this pull request Oct 22, 2025
) (#47270)

This test was skipped in #42780
due to #43137.

Since #46331 fixed the underlying issue,
we can re-enable the test.

(cherry picked from commit e744a82)

Co-authored-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
mauri870 added a commit that referenced this pull request Oct 22, 2025
) (#47268)

This test was skipped in #42780
due to #43137.

Since #46331 fixed the underlying issue,
we can re-enable the test.

(cherry picked from commit e744a82)

Co-authored-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
mauri870 added a commit that referenced this pull request Oct 22, 2025
) (#47271)

This test was skipped in #42780
due to #43137.

Since #46331 fixed the underlying issue,
we can re-enable the test.

(cherry picked from commit e744a82)

Co-authored-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
andrzej-stencel pushed a commit to andrzej-stencel/beats that referenced this pull request Dec 1, 2025
…stic#47163)

This test was skipped in elastic#42780
due to elastic#43137.

Since elastic#46331 fixed the underlying issue,
we can re-enable the test.

Labels

backport-active-all, bugfix, Team:Elastic-Agent-Data-Plane

4 participants