Skip to content

Fix stderr leak in Docker ES process detection#140701

Merged
ebarlas merged 3 commits intoelastic:mainfrom
ebarlas:fix-docker-tests-polling-bug
Jan 16, 2026
Merged

Fix stderr leak in Docker ES process detection#140701
ebarlas merged 3 commits intoelastic:mainfrom
ebarlas:fix-docker-tests-polling-bug

Conversation

@ebarlas
Copy link
Contributor

@ebarlas ebarlas commented Jan 15, 2026

Fixes a bug in the Docker packaging test infrastructure where bash error messages leak into process detection output, causing the wait loop to exit early with a false positive.

This bug is related to the following issues, but could also relate to previously muted docker tests:

The Problem

Test test072RunEsAsDifferentUserAndGroup was failing periodically because the certs directory wasn't found - but the real issue was that Elasticsearch never started at all.

The Clue

The test logs showed something suspicious:

[15:39:02,369] Ran: [...] exitCode = [0] stdout = [bash: line 1: /proc/95/cmdline: No such file or directory
0]

The script found a Java process (PID 95), but it subsequently failed to read /proc/95/cmdline. Bash printed an error, and the script returned 0 processes found.

But then the test immediately proceeded as if ES had started. Why?

The Bug

The wait loop checks:

if (psOutput.contains("1")) {
    isElasticsearchRunning = true;
    break;
}

The error message bash: line 1: ... contains "1". This matched the check, causing the loop to exit after 4 seconds instead of waiting 45.

Root Cause

The script uses input redirection:

cmdline=$(tr "\0" " " < /proc/$pid/cmdline 2>/dev/null)

When the file doesn't exist, bash opens it and prints the error. The 2>/dev/null only applies to tr, not to bash's redirection. With docker exec --tty, stderr mixes into the output.

The Fix

Use cat instead:

cmdline=$(cat /proc/$pid/cmdline 2>/dev/null | tr "\0" " ")

Now cat opens the file, so its 2>/dev/null properly suppresses the error.

@ebarlas ebarlas added the :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts label Jan 15, 2026
@ebarlas ebarlas closed this Jan 15, 2026
@ebarlas ebarlas reopened this Jan 15, 2026
Use cat to read the cmd file instead of input redirection.
@ebarlas ebarlas force-pushed the fix-docker-tests-polling-bug branch from 9f420ca to eefc3a0 Compare January 15, 2026 22:44
@ebarlas ebarlas closed this Jan 15, 2026
@ebarlas ebarlas reopened this Jan 15, 2026
@ebarlas ebarlas added the >bug label Jan 15, 2026
@ebarlas ebarlas self-assigned this Jan 15, 2026
@elasticsearchmachine
Copy link
Collaborator

Hi @ebarlas, I've created a changelog YAML for you.

@ebarlas ebarlas added auto-backport Automatically create backport pull requests when merged v9.3.1 v8.19.11 Team:Delivery Meta label for Delivery team labels Jan 15, 2026
@ebarlas ebarlas marked this pull request as ready for review January 15, 2026 23:08
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ebarlas ebarlas merged commit af3eebb into elastic:main Jan 16, 2026
36 checks passed
ebarlas added a commit to ebarlas/elasticsearch that referenced this pull request Jan 16, 2026
Use cat to read the cmd file instead of input redirection
ebarlas added a commit to ebarlas/elasticsearch that referenced this pull request Jan 16, 2026
Use cat to read the cmd file instead of input redirection
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.3
8.19
@ebarlas ebarlas added the v9.2.5 label Jan 16, 2026
@ebarlas
Copy link
Contributor Author

ebarlas commented Jan 16, 2026

Missed the label for 9.2 backport. Doing that via tool.

@ebarlas
Copy link
Contributor Author

ebarlas commented Jan 16, 2026

💚 All backports created successfully

Status Branch Result
9.2

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Jan 16, 2026
Use cat to read the cmd file instead of input redirection
ebarlas added a commit that referenced this pull request Jan 16, 2026
Use cat to read the cmd file instead of input redirection
@ebarlas
Copy link
Contributor Author

ebarlas commented Jan 20, 2026

💚 All backports created successfully

Status Branch Result
9.1

Questions ?

Please refer to the Backport tool documentation

ebarlas added a commit to ebarlas/elasticsearch that referenced this pull request Jan 20, 2026
Use cat to read the cmd file instead of input redirection

(cherry picked from commit af3eebb)
ebarlas added a commit that referenced this pull request Jan 20, 2026
Use cat to read the cmd file instead of input redirection
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Jan 21, 2026
Use cat to read the cmd file instead of input redirection
jozala added a commit to jozala/elasticsearch that referenced this pull request Jan 22, 2026
Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in elastic#140701 and ready for
re-evaluation.
jozala added a commit that referenced this pull request Jan 23, 2026
…41098)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in #140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.
jozala added a commit to jozala/elasticsearch that referenced this pull request Jan 26, 2026
…astic#141098)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in elastic#140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.

(cherry picked from commit bb91f1e)

# Conflicts:
#	muted-tests.yml
jozala added a commit to jozala/elasticsearch that referenced this pull request Jan 26, 2026
…astic#141098)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in elastic#140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.

(cherry picked from commit bb91f1e)

# Conflicts:
#	muted-tests.yml
jozala added a commit to jozala/elasticsearch that referenced this pull request Jan 26, 2026
…astic#141098)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in elastic#140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.

(cherry picked from commit bb91f1e)

# Conflicts:
#	muted-tests.yml
jozala added a commit to jozala/elasticsearch that referenced this pull request Jan 26, 2026
…astic#141098)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in elastic#140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.

(cherry picked from commit bb91f1e)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Jan 26, 2026
…41098) (#141293)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in #140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.

(cherry picked from commit bb91f1e)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Jan 26, 2026
…41098) (#141294)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in #140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.

(cherry picked from commit bb91f1e)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Jan 26, 2026
…41098) (#141295)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in #140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.

(cherry picked from commit bb91f1e)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Jan 26, 2026
…41098) (#141296)

* Unmute DockerTests in muted-tests.yml

Removed muting for multiple Docker-related integration tests in
`muted-tests.yml`, as they are likely resolved in #140701 and ready for
re-evaluation.

* Refactor authentication retry logic in ServerUtils

Removed unnecessary boolean flag for retrying on authentication failure

* Fix Docker log dumps

Fixed the problem with running `docker logs null` after test failure.
The problem was caused by calling the `removeContainer()` before
`dumpDebug()` because `dumpDebug()` is called from the @rule on the
base class and `removeContainer()` from @after method on
the child class.

* Fix conditional logic in ServerUtils for cluster health check

Adjusted logic to properly set the `started` flag only after a successful cluster health API response.

* Make the logging less verbose

This warning log with exception was very loud and sometimes hide the
real issue.
It has been changed to only log the exception message.

(cherry picked from commit bb91f1e)

# Conflicts:
#	muted-tests.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts Team:Delivery Meta label for Delivery team v8.19.11 v9.1.11 v9.2.5 v9.3.1 v9.4.0

3 participants