Skip to content

x-pack/filebeat/input/streaming: apply configured timeout to OAuth2 token requests#50492

Merged
efd6 merged 1 commit into
elastic:mainfrom
efd6:49988-streaming
May 6, 2026
Merged

x-pack/filebeat/input/streaming: apply configured timeout to OAuth2 token requests#50492
efd6 merged 1 commit into
elastic:mainfrom
efd6:49988-streaming

Conversation

@efd6

@efd6 efd6 commented May 5, 2026

Copy link
Copy Markdown
Contributor

Proposed commit message

x-pack/filebeat/input/streaming: apply configured timeout to OAuth2 token requests

The CrowdStrike FalconHose input constructed the OAuth2 HTTP client
with only the Transport from the auth client, dropping the Timeout
set by httpcommon.HTTPTransportSettings. Token endpoint requests
could hang indefinitely instead of respecting the configured timeout.

Apply the timeout via context.WithTimeout in rateLimitTransport
rather than http.Client.Timeout to avoid the deprecated
oauth2.Transport.CancelRequest path.

Also fix flaky test.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@efd6 efd6 self-assigned this May 5, 2026
@efd6 efd6 added Filebeat Filebeat bugfix Team:Security-Service Integrations Security Service Integrations Team backport-8.19 Automated backport to the 8.19 branch backport-9.3 Automated backport to the 9.3 branch backport-9.4 labels May 5, 2026
@botelastic botelastic Bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels May 5, 2026
@github-actions

github-actions Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /test : Run the Buildkite pipeline.
@efd6

efd6 commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

/test

1 similar comment
@efd6

efd6 commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

/test

@github-actions

github-actions Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

TL;DR

The failing Buildkite job is caused by a timing-sensitive unit test, not an implementation regression: TestConcurrentRequestsExceedHighWater intermittently allows the second request through (200) before the first request has pushed inFlight above the high-water threshold on Windows. The immediate fix is to remove time.Sleep-based synchronization in this test and gate the second request on a deterministic signal.

Remediation

  • Replace the fixed delay in x-pack/filebeat/input/http_endpoint/handler_test.go:871-874 (time.Sleep(150 * time.Millisecond)) with deterministic coordination (e.g., channel/barrier from slowReader.Read after N bytes, or polling h.inFlight.Load() until >= c.HighWaterInFlight with timeout).
  • Keep the assertion at x-pack/filebeat/input/http_endpoint/handler_test.go:892, but ensure the second request is only sent after high-water is actually exceeded.
Investigation details

Root Cause

TestConcurrentRequestsExceedHighWater assumes this fixed timing guarantee:

  • first request started (handler_test.go:861-869)
  • sleep 150ms (handler_test.go:873)
  • second request sent (handler_test.go:877-885)

Admission control rejects only when current >= highWaterInFlight in ServeHTTP (x-pack/filebeat/input/http_endpoint/handler.go:127-154, specifically handler.go:136). On Windows CI scheduling, the first goroutine may not have accumulated enough bytes by the time the second request enters ServeHTTP, so it stays in accepting mode and returns 200.

Evidence

=== FAIL: x-pack/filebeat/input/http_endpoint TestConcurrentRequestsExceedHighWater (0.72s)
    handler_test.go:892:
        Error:       Not equal:
                     expected: 503
                     actual  : 200
  • Relevant code:
    • x-pack/filebeat/input/http_endpoint/handler_test.go:871-874
    • x-pack/filebeat/input/http_endpoint/handler_test.go:892
    • x-pack/filebeat/input/http_endpoint/handler.go:136

Verification

  • Reproduced analysis from Buildkite logs.
  • Linux local run (go test -run TestConcurrentRequestsExceedHighWater -count=10 ./input/http_endpoint) passed; this is consistent with a platform/timing-sensitive test race rather than a deterministic logic failure.

Follow-up

If you want to harden this further, mirror the deterministic pattern already used in x-pack/filebeat/input/http_endpoint/input_test.go:599-722 (TestConcurrentExceedMaxInFlight), which coordinates overlap explicitly instead of depending on wall-clock sleeps.

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

What is this? | From workflow: PR Buildkite Detective

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

@efd6 efd6 force-pushed the 49988-streaming branch from ab4ce85 to 3cdcaf7 Compare May 5, 2026 23:09
@efd6 efd6 marked this pull request as ready for review May 6, 2026 01:40
@efd6 efd6 requested a review from a team as a code owner May 6, 2026 01:40
@infra-vault-gh-plugin-prod

Copy link
Copy Markdown

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: d4ab1c2c-7cc9-4b80-b6b3-f1caa0617e5c

📥 Commits

Reviewing files that changed from the base of the PR and between 3cdcaf7 and 045eaf1.

📒 Files selected for processing (5)
  • changelog/fragments/1777958918-49988-streaming.yaml
  • x-pack/filebeat/input/http_endpoint/handler_test.go
  • x-pack/filebeat/input/streaming/crowdstrike.go
  • x-pack/filebeat/input/streaming/crowdstrike_ratelimit.go
  • x-pack/filebeat/input/streaming/crowdstrike_test.go
✅ Files skipped from review due to trivial changes (1)
  • changelog/fragments/1777958918-49988-streaming.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • x-pack/filebeat/input/streaming/crowdstrike.go

📝 Walkthrough

Walkthrough

Adds a changelog fragment documenting a bug fix for CrowdStrike streaming input in Filebeat. Implements per-request timeout handling in the CrowdStrike rate-limit transport and wires the configured auth client timeout into follower initialization. Adds a unit test that verifies OAuth2 token requests respect the configured timeout and updates an existing HTTP endpoint test to wait on an in-flight counter instead of using a fixed sleep.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed The PR fully addresses issue #49988 by adding timeout support to OAuth2 requests in CrowdStrike FalconHose input via rateLimitTransport context-based timeout.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing CrowdStrike OAuth2 timeout handling. Handler test fix for concurrent requests is a related flaky test fix mentioned in PR context.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • 🛠️ Update Documentation: Commit on current branch
  • 🛠️ Update Documentation: Create PR

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
x-pack/filebeat/input/http_endpoint/handler_test.go (1)

660-660: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Pipeline failing: missing //nolint:errcheck on existing (*handler) type assertions.

Line 846 (new code) has the nolint comment, but the identical pattern at line 660 (and lines 684, 709, 737, 763, 791) does not, causing the CI errcheck failure. Add the same comment to all of them.

📝 Example fix (apply to all occurrences)
-		handler := newHandler(ctx, c, nil, pub.Publish, nil, logp.NewLogger("test"), metrics).(*handler)
+		handler := newHandler(ctx, c, nil, pub.Publish, nil, logp.NewLogger("test"), metrics).(*handler) //nolint:errcheck // newHandler is statically known to return a *handler.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@x-pack/filebeat/input/http_endpoint/handler_test.go` at line 660, The CI
errcheck failure is caused by unchecked errors in the type assertion using
"(*handler)" after calls to newHandler; update each occurrence (the call pattern
"handler := newHandler(ctx, c, nil, pub.Publish, nil, logp.NewLogger(\"test\"),
metrics).(*handler)" at lines like 660, 684, 709, 737, 763, 791) to append the
nolint directive for errcheck (i.e., add "//nolint:errcheck" to the same line)
so the type assertion is exempted from the errcheck linter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@changelog/fragments/1777958918-49988-streaming.yaml`:
- Line 1: Replace the YAML entry key value "kind: feature" with "kind: bug-fix"
so the changelog fragment correctly classifies this PR as a bug fix; locate the
fragment that currently contains the string "kind: feature" and update it to
"kind: bug-fix" (ensure the rest of the YAML formatting remains unchanged).

---

Outside diff comments:
In `@x-pack/filebeat/input/http_endpoint/handler_test.go`:
- Line 660: The CI errcheck failure is caused by unchecked errors in the type
assertion using "(*handler)" after calls to newHandler; update each occurrence
(the call pattern "handler := newHandler(ctx, c, nil, pub.Publish, nil,
logp.NewLogger(\"test\"), metrics).(*handler)" at lines like 660, 684, 709, 737,
763, 791) to append the nolint directive for errcheck (i.e., add
"//nolint:errcheck" to the same line) so the type assertion is exempted from the
errcheck linter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: f847eddd-d0a3-45f1-96df-3a2d5c4ea346

📥 Commits

Reviewing files that changed from the base of the PR and between 9702d0a and 3cdcaf7.

📒 Files selected for processing (5)
  • changelog/fragments/1777958918-49988-streaming.yaml
  • x-pack/filebeat/input/http_endpoint/handler_test.go
  • x-pack/filebeat/input/streaming/crowdstrike.go
  • x-pack/filebeat/input/streaming/crowdstrike_ratelimit.go
  • x-pack/filebeat/input/streaming/crowdstrike_test.go
Comment thread changelog/fragments/1777958918-49988-streaming.yaml Outdated
…oken requests

The CrowdStrike FalconHose input constructed the OAuth2 HTTP client
with only the Transport from the auth client, dropping the Timeout
set by httpcommon.HTTPTransportSettings. Token endpoint requests
could hang indefinitely instead of respecting the configured timeout.

Apply the timeout via context.WithTimeout in rateLimitTransport
rather than http.Client.Timeout to avoid the deprecated
oauth2.Transport.CancelRequest path.

Also fix flaky test.
@efd6 efd6 merged commit fd8ae04 into elastic:main May 6, 2026
30 of 33 checks passed
efd6 added a commit that referenced this pull request May 6, 2026
…oken requests (#50492) (#50533)

The CrowdStrike FalconHose input constructed the OAuth2 HTTP client
with only the Transport from the auth client, dropping the Timeout
set by httpcommon.HTTPTransportSettings. Token endpoint requests
could hang indefinitely instead of respecting the configured timeout.

Apply the timeout via context.WithTimeout in rateLimitTransport
rather than http.Client.Timeout to avoid the deprecated
oauth2.Transport.CancelRequest path.

Also fix flaky test.

(cherry picked from commit fd8ae04)

Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
efd6 added a commit that referenced this pull request May 6, 2026
…oken requests (#50492) (#50531)

The CrowdStrike FalconHose input constructed the OAuth2 HTTP client
with only the Transport from the auth client, dropping the Timeout
set by httpcommon.HTTPTransportSettings. Token endpoint requests
could hang indefinitely instead of respecting the configured timeout.

Apply the timeout via context.WithTimeout in rateLimitTransport
rather than http.Client.Timeout to avoid the deprecated
oauth2.Transport.CancelRequest path.

Also fix flaky test.

(cherry picked from commit fd8ae04)

Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
efd6 added a commit that referenced this pull request May 6, 2026
…oken requests (#50492) (#50532)

The CrowdStrike FalconHose input constructed the OAuth2 HTTP client
with only the Transport from the auth client, dropping the Timeout
set by httpcommon.HTTPTransportSettings. Token endpoint requests
could hang indefinitely instead of respecting the configured timeout.

Apply the timeout via context.WithTimeout in rateLimitTransport
rather than http.Client.Timeout to avoid the deprecated
oauth2.Transport.CancelRequest path.

Also fix flaky test.

(cherry picked from commit fd8ae04)

Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-8.19 Automated backport to the 8.19 branch backport-9.3 Automated backport to the 9.3 branch backport-9.4 bugfix Filebeat Filebeat Team:Security-Service Integrations Security Service Integrations Team

2 participants