Skip to content

Convert BytesTransportResponse when proxying response from/to local node#135873

Merged
javanna merged 12 commits intoelastic:mainfrom
javanna:fix/proxy_direct_response_convert
Oct 6, 2025
Merged

Convert BytesTransportResponse when proxying response from/to local node#135873
javanna merged 12 commits intoelastic:mainfrom
javanna:fix/proxy_direct_response_convert

Conversation

@javanna
Copy link
Contributor

@javanna javanna commented Oct 2, 2025

#127112 introduced BytesTransportResponse to be used in search batched execution, so that NodeQueryResponse could be written as BytesTransportResponse as opposed to materializing the response object on heap.

When a proxy node acts as a proxy to query its local data, and the coordinating node is on a different version than the proxy node, the response will fail to deserialize in the coord node because it was written with the version of the proxy node as opposed to that of the coord (target) node. This is because DirectResponseChannel skips the step of reading and writing back such response, which would lead to it being converted to the right format.

This commit attempts to fix this problem by tracking the version used to write the response, and conditionally converting it in the ProxyRequestHandler.

There may be better ways to fix this problem, let's use this draft to align on the problem and agree on possible solutions. I am happy to iterate on it, as this currently blocks merging #135549 due to the serialization change it introduces.

Note that search batched executed is still under a feature flag, and disabled by default, so this bug does not currently affect our users.

execution, to a BytesTransportResponse as opposed to materializing the
response object on heap.

When a proxy node acts as a proxy to query its local data, and the coordinating
node is on a different version than the proxy node, the response will fail
to deserialize in the coord node because it was written with the version of the
proxy node as opposed to that of the coord (target) node. This is because
DirectResponseChannel does not read and write such response, which would lead
to it being converted to the right format.

This commit attempts to fix this problem by tracking the version used to
write the response, and conditionally converting it in the ProxyRequestHandler.
@javanna javanna added >bug :Distributed/Network Http and internode communication implementations :Search Foundations/Search Catch all for Search Foundations v9.3.0 labels Oct 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@javanna javanna removed the :Search Foundations/Search Catch all for Search Foundations label Oct 2, 2025
@javanna javanna marked this pull request as ready for review October 3, 2025 10:09
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. label Oct 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves some tests in TransportActionProxyTests showing that we do/don't re-serialize the response in these cases.

javanna and others added 2 commits October 3, 2025 13:09
…onProxy.java

Co-authored-by: David Turner <david.turner@elastic.co>
try {
channel.sendResponse(convertedResponse);
} finally {
convertedResponse.decRef();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd initially forgotten this, can you double check that this is what I need to be doing please?

convertedResponse.decRef();
}
} catch (IOException e) {
throw new UncheckedIOException(e);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I throw or send the exception as a response? I am assuming that exceptions thrown are already caught and sent as responses, but good to double check.

public void writeTo(StreamOutput out) throws IOException {
out.writeString(targetNode);
if (out.getTransportVersion().supports(transportVersion1)) {
out.writeBoolean(true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reproduces the problem in the tests I added, which is fixed with the conversion this change introduces.

"internal:test",
cancellable,
// For a proxy node proxying to itself, the response is sent directly, without it being read by the proxy layer
r -> { throw new AssertionError(); },
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this diff is misleading! I only changed this line in testSendLocalRequest

@javanna
Copy link
Contributor Author

javanna commented Oct 3, 2025

Thanks for the suggestion, I added tests @DaveCTurner.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@javanna javanna merged commit cebc1a8 into elastic:main Oct 6, 2025
34 checks passed
@javanna javanna deleted the fix/proxy_direct_response_convert branch October 6, 2025 12:01
@javanna javanna added v8.19.6 and removed v8.19.6 labels Oct 6, 2025
benchaplin pushed a commit to benchaplin/elasticsearch that referenced this pull request Nov 14, 2025
…ode (elastic#135873)

elastic#127112 introduced `BytesTransportResponse` to be used in search batched execution, so that `NodeQueryResponse` could be written as `BytesTransportResponse` as opposed to materializing the response object on heap.

When a proxy node acts as a proxy to query its local data, and the coordinating node is on a different version than the proxy node, the response will fail to deserialize in the coord node because it was written with the version of the proxy node as opposed to that of the coord (target) node. This is because `DirectResponseChannel` skips the step of reading and writing back such response, which would lead to it being converted to the right format.

This commit attempts to fix this problem by tracking the version used to write the binary response, and conditionally converting it in the `ProxyRequestHandler` when the version don't align.
benchaplin pushed a commit to benchaplin/elasticsearch that referenced this pull request Nov 14, 2025
…ode (elastic#135873)

elastic#127112 introduced `BytesTransportResponse` to be used in search batched execution, so that `NodeQueryResponse` could be written as `BytesTransportResponse` as opposed to materializing the response object on heap.

When a proxy node acts as a proxy to query its local data, and the coordinating node is on a different version than the proxy node, the response will fail to deserialize in the coord node because it was written with the version of the proxy node as opposed to that of the coord (target) node. This is because `DirectResponseChannel` skips the step of reading and writing back such response, which would lead to it being converted to the right format.

This commit attempts to fix this problem by tracking the version used to write the binary response, and conditionally converting it in the `ProxyRequestHandler` when the version don't align.
benchaplin added a commit that referenced this pull request Nov 14, 2025
…ocal node (#135873) (#138070)

#127112 introduced `BytesTransportResponse` to be used in search batched execution, so that `NodeQueryResponse` could be written as `BytesTransportResponse` as opposed to materializing the response object on heap.

When a proxy node acts as a proxy to query its local data, and the coordinating node is on a different version than the proxy node, the response will fail to deserialize in the coord node because it was written with the version of the proxy node as opposed to that of the coord (target) node. This is because `DirectResponseChannel` skips the step of reading and writing back such response, which would lead to it being converted to the right format.

This commit attempts to fix this problem by tracking the version used to write the binary response, and conditionally converting it in the `ProxyRequestHandler` when the version don't align.

---------

Co-authored-by: Luca Cavanna <javanna@apache.org>
benchaplin added a commit that referenced this pull request Nov 17, 2025
…ocal node (#135873) (#138071)

#127112 introduced `BytesTransportResponse` to be used in search batched execution, so that `NodeQueryResponse` could be written as `BytesTransportResponse` as opposed to materializing the response object on heap.

When a proxy node acts as a proxy to query its local data, and the coordinating node is on a different version than the proxy node, the response will fail to deserialize in the coord node because it was written with the version of the proxy node as opposed to that of the coord (target) node. This is because `DirectResponseChannel` skips the step of reading and writing back such response, which would lead to it being converted to the right format.

This commit attempts to fix this problem by tracking the version used to write the binary response, and conditionally converting it in the `ProxyRequestHandler` when the version don't align.

---------

Co-authored-by: Luca Cavanna <javanna@apache.org>
@benchaplin
Copy link
Contributor

I've backported this to 9.1/9.2 to support backports of additional transport versions for BytesTransportResponse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed/Network Http and internode communication implementations Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. v9.1.8 v9.2.2 v9.3.0

4 participants