[Transform] Fix transform validation to reject PUT and _start when user lacks remote index permissions#142403
Conversation
|
Pinging @elastic/ml-core (Team:ML) |
There was a problem hiding this comment.
Pull request overview
Fixes transform source validation so that unauthorized cross-cluster (remote) source indices cause PUT _transform and _start to fail consistently (matching _preview), with clearer diagnostics.
Changes:
- Add
SourceAccessDiagnosticsto detect security-related CCS/shard failures and return actionable error messages. - Update
preview()andvalidateQuery()to use diagnostics when aggregations arenull. - Expand multi-cluster security YAML tests and add unit tests for the new diagnostics behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/common/SourceAccessDiagnostics.java | Adds centralized diagnostics for permission vs missing-index cases when aggregations are null. |
| x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/common/AbstractCompositeAggFunction.java | Uses diagnostics in preview() and adds null-aggregation validation in validateQuery(). |
| x-pack/plugin/transform/src/test/java/org/elasticsearch/xpack/transform/transforms/common/SourceAccessDiagnosticsTests.java | Adds unit coverage for cluster/shard security failures and fallback behavior. |
| x-pack/plugin/transform/qa/multi-cluster-tests-with-security/src/test/resources/rest-api-spec/test/multi_cluster/80_transform.yml | Updates/extends integration tests to assert PUT and _start reject unauthorized remote sources. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...c/main/java/org/elasticsearch/xpack/transform/transforms/common/SourceAccessDiagnostics.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/elasticsearch/xpack/transform/transforms/common/SourceAccessDiagnostics.java
Outdated
Show resolved
Hide resolved
...t/java/org/elasticsearch/xpack/transform/transforms/common/SourceAccessDiagnosticsTests.java
Show resolved
Hide resolved
|
Here's the PR comment text: Historical context: PR #95318 and its revert #95562While investigating this fix, I discovered that the same null-aggregation check in What happened: PR #95318 added an unconditional The key insight is that How this PR avoids the same regression: Instead of unconditionally failing on null aggregations, Open question: Whether this approach catches all unauthorized-remote-index scenarios depends on how cc @elastic/fleet |
|
Hi @valeriy42, I've created a changelog YAML for you. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (ShardSearchFailure failure : cluster.getFailures()) { | ||
| if (isSecurityFailure(failure)) { | ||
| return "User lacks the required permissions to read source indices on cluster [" + alias + "]."; | ||
| } | ||
| } |
There was a problem hiding this comment.
cluster.getFailures() can be null for clusters without recorded failures, which would throw an NPE during iteration. Add a null/empty check before iterating (or iterate over an empty list when failures is null) to make this safe for all SearchResponse.Cluster instances.
| if (response.getAggregations() == null) { | ||
| String diagnosis = SourceAccessDiagnostics.diagnoseSourceAccessFailure(response); | ||
| if (diagnosis.equals(SourceAccessDiagnostics.SOURCE_INDICES_MISSING) == false) { | ||
| listener.onFailure(new ValidationException().addValidationError(diagnosis)); | ||
| return; | ||
| } | ||
| } |
There was a problem hiding this comment.
Control flow is determined by comparing the returned message string to SOURCE_INDICES_MISSING, which is brittle (message wording becomes part of the API contract). Consider changing SourceAccessDiagnostics to return a structured result (e.g., Optional<String> for a detected security message, or an enum + message), so validateQuery() can branch on an explicit signal rather than string equality.
...c/main/java/org/elasticsearch/xpack/transform/transforms/common/SourceAccessDiagnostics.java
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
...ter-tests-with-security/src/test/resources/rest-api-spec/test/multi_cluster/80_transform.yml
Show resolved
Hide resolved
Co-authored-by: Pat Whelan <pat.whelan@elastic.co>
💔 Backport failed
You can use sqren/backport to manually backport by running |
…er lacks remote index permissions (elastic#142403) When a transform is configured with a remote (cross-cluster) source index and the user lacks permissions to access it, the _preview API correctly fails -- but PUT _transform and _start silently succeed, allowing unauthorized transforms to be created and started. The root cause is that validateQuery in AbstractCompositeAggFunction only checks the response status code, which is OK even when IndicesOptions.LENIENT_EXPAND_OPEN causes unauthorized indices to be silently ignored. The search returns null aggregations in this case, but unlike preview(), validateQuery() never checks for that condition. This PR introduces a SourceAccessDiagnostics class that inspects the SearchResponse for security-related failures at both the CCS cluster level (SKIPPED/FAILED clusters with ElasticsearchSecurityException) and the shard level (FORBIDDEN/UNAUTHORIZED status). A null-aggregation check is added to validateQuery(), but -- critically -- it only rejects the request when a security failure is positively identified. When no security failure is found, validation passes through silently. This distinction avoids the regression that caused PR elastic#95318 to be reverted in elastic#95562: that earlier change unconditionally failed on null aggregations, which broke integrations (such as Elastic Defend) that create and start transforms with wildcard source patterns before any matching indices exist. Since defer_validation only defers from PUT to _start, there was no way for those integrations to bypass the check. Our approach preserves backward compatibility for the empty-indices case while catching the unauthorized-remote-index case. The preview() method also delegates to the same diagnostics class, so all three APIs now produce consistent, actionable error messages when a security failure is detected, falling back to the original generic message otherwise. The multi-cluster YAML integration tests are updated to verify that both PUT _transform and _start now reject unauthorized remote transforms. A new test case creates a transform with defer_validation: true and confirms that _start catches the permission issue. Unit tests for SourceAccessDiagnostics cover cluster-level SKIPPED/FAILED scenarios, shard-level security exceptions, FORBIDDEN/UNAUTHORIZED status codes, and the fallback to the generic message for non-security failures. Fixes elastic#95367
…er lacks remote index permissions (elastic#142403) When a transform is configured with a remote (cross-cluster) source index and the user lacks permissions to access it, the _preview API correctly fails -- but PUT _transform and _start silently succeed, allowing unauthorized transforms to be created and started. The root cause is that validateQuery in AbstractCompositeAggFunction only checks the response status code, which is OK even when IndicesOptions.LENIENT_EXPAND_OPEN causes unauthorized indices to be silently ignored. The search returns null aggregations in this case, but unlike preview(), validateQuery() never checks for that condition. This PR introduces a SourceAccessDiagnostics class that inspects the SearchResponse for security-related failures at both the CCS cluster level (SKIPPED/FAILED clusters with ElasticsearchSecurityException) and the shard level (FORBIDDEN/UNAUTHORIZED status). A null-aggregation check is added to validateQuery(), but -- critically -- it only rejects the request when a security failure is positively identified. When no security failure is found, validation passes through silently. This distinction avoids the regression that caused PR elastic#95318 to be reverted in elastic#95562: that earlier change unconditionally failed on null aggregations, which broke integrations (such as Elastic Defend) that create and start transforms with wildcard source patterns before any matching indices exist. Since defer_validation only defers from PUT to _start, there was no way for those integrations to bypass the check. Our approach preserves backward compatibility for the empty-indices case while catching the unauthorized-remote-index case. The preview() method also delegates to the same diagnostics class, so all three APIs now produce consistent, actionable error messages when a security failure is detected, falling back to the original generic message otherwise. The multi-cluster YAML integration tests are updated to verify that both PUT _transform and _start now reject unauthorized remote transforms. A new test case creates a transform with defer_validation: true and confirms that _start catches the permission issue. Unit tests for SourceAccessDiagnostics cover cluster-level SKIPPED/FAILED scenarios, shard-level security exceptions, FORBIDDEN/UNAUTHORIZED status codes, and the fallback to the generic message for non-security failures. Fixes elastic#95367 (cherry picked from commit 0e44984) # Conflicts: # x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/common/AbstractCompositeAggFunction.java
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…er lacks remote index permissions (elastic#142403) When a transform is configured with a remote (cross-cluster) source index and the user lacks permissions to access it, the _preview API correctly fails -- but PUT _transform and _start silently succeed, allowing unauthorized transforms to be created and started. The root cause is that validateQuery in AbstractCompositeAggFunction only checks the response status code, which is OK even when IndicesOptions.LENIENT_EXPAND_OPEN causes unauthorized indices to be silently ignored. The search returns null aggregations in this case, but unlike preview(), validateQuery() never checks for that condition. This PR introduces a SourceAccessDiagnostics class that inspects the SearchResponse for security-related failures at both the CCS cluster level (SKIPPED/FAILED clusters with ElasticsearchSecurityException) and the shard level (FORBIDDEN/UNAUTHORIZED status). A null-aggregation check is added to validateQuery(), but -- critically -- it only rejects the request when a security failure is positively identified. When no security failure is found, validation passes through silently. This distinction avoids the regression that caused PR elastic#95318 to be reverted in elastic#95562: that earlier change unconditionally failed on null aggregations, which broke integrations (such as Elastic Defend) that create and start transforms with wildcard source patterns before any matching indices exist. Since defer_validation only defers from PUT to _start, there was no way for those integrations to bypass the check. Our approach preserves backward compatibility for the empty-indices case while catching the unauthorized-remote-index case. The preview() method also delegates to the same diagnostics class, so all three APIs now produce consistent, actionable error messages when a security failure is detected, falling back to the original generic message otherwise. The multi-cluster YAML integration tests are updated to verify that both PUT _transform and _start now reject unauthorized remote transforms. A new test case creates a transform with defer_validation: true and confirms that _start catches the permission issue. Unit tests for SourceAccessDiagnostics cover cluster-level SKIPPED/FAILED scenarios, shard-level security exceptions, FORBIDDEN/UNAUTHORIZED status codes, and the fallback to the generic message for non-security failures. Fixes elastic#95367 (cherry picked from commit 0e44984) # Conflicts: # x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/common/AbstractCompositeAggFunction.java
…hen user lacks remote index permissions (#142403) (#142454) * [Transform] Fix transform validation to reject PUT and _start when user lacks remote index permissions (#142403) When a transform is configured with a remote (cross-cluster) source index and the user lacks permissions to access it, the _preview API correctly fails -- but PUT _transform and _start silently succeed, allowing unauthorized transforms to be created and started. The root cause is that validateQuery in AbstractCompositeAggFunction only checks the response status code, which is OK even when IndicesOptions.LENIENT_EXPAND_OPEN causes unauthorized indices to be silently ignored. The search returns null aggregations in this case, but unlike preview(), validateQuery() never checks for that condition. This PR introduces a SourceAccessDiagnostics class that inspects the SearchResponse for security-related failures at both the CCS cluster level (SKIPPED/FAILED clusters with ElasticsearchSecurityException) and the shard level (FORBIDDEN/UNAUTHORIZED status). A null-aggregation check is added to validateQuery(), but -- critically -- it only rejects the request when a security failure is positively identified. When no security failure is found, validation passes through silently. This distinction avoids the regression that caused PR #95318 to be reverted in #95562: that earlier change unconditionally failed on null aggregations, which broke integrations (such as Elastic Defend) that create and start transforms with wildcard source patterns before any matching indices exist. Since defer_validation only defers from PUT to _start, there was no way for those integrations to bypass the check. Our approach preserves backward compatibility for the empty-indices case while catching the unauthorized-remote-index case. The preview() method also delegates to the same diagnostics class, so all three APIs now produce consistent, actionable error messages when a security failure is detected, falling back to the original generic message otherwise. The multi-cluster YAML integration tests are updated to verify that both PUT _transform and _start now reject unauthorized remote transforms. A new test case creates a transform with defer_validation: true and confirms that _start catches the permission issue. Unit tests for SourceAccessDiagnostics cover cluster-level SKIPPED/FAILED scenarios, shard-level security exceptions, FORBIDDEN/UNAUTHORIZED status codes, and the fallback to the generic message for non-security failures. Fixes #95367 (cherry picked from commit 0e44984) # Conflicts: # x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/common/AbstractCompositeAggFunction.java * Fix SearchResponse.Cluster constructor arity in SourceAccessDiagnosticsTests * checkstyle
…hen user lacks remote index permissions (#142403) (#142430) * [Transform] Fix transform validation to reject PUT and _start when user lacks remote index permissions (#142403) When a transform is configured with a remote (cross-cluster) source index and the user lacks permissions to access it, the _preview API correctly fails -- but PUT _transform and _start silently succeed, allowing unauthorized transforms to be created and started. The root cause is that validateQuery in AbstractCompositeAggFunction only checks the response status code, which is OK even when IndicesOptions.LENIENT_EXPAND_OPEN causes unauthorized indices to be silently ignored. The search returns null aggregations in this case, but unlike preview(), validateQuery() never checks for that condition. This PR introduces a SourceAccessDiagnostics class that inspects the SearchResponse for security-related failures at both the CCS cluster level (SKIPPED/FAILED clusters with ElasticsearchSecurityException) and the shard level (FORBIDDEN/UNAUTHORIZED status). A null-aggregation check is added to validateQuery(), but -- critically -- it only rejects the request when a security failure is positively identified. When no security failure is found, validation passes through silently. This distinction avoids the regression that caused PR #95318 to be reverted in #95562: that earlier change unconditionally failed on null aggregations, which broke integrations (such as Elastic Defend) that create and start transforms with wildcard source patterns before any matching indices exist. Since defer_validation only defers from PUT to _start, there was no way for those integrations to bypass the check. Our approach preserves backward compatibility for the empty-indices case while catching the unauthorized-remote-index case. The preview() method also delegates to the same diagnostics class, so all three APIs now produce consistent, actionable error messages when a security failure is detected, falling back to the original generic message otherwise. The multi-cluster YAML integration tests are updated to verify that both PUT _transform and _start now reject unauthorized remote transforms. A new test case creates a transform with defer_validation: true and confirms that _start catches the permission issue. Unit tests for SourceAccessDiagnostics cover cluster-level SKIPPED/FAILED scenarios, shard-level security exceptions, FORBIDDEN/UNAUTHORIZED status codes, and the fallback to the generic message for non-security failures. Fixes #95367 * Fix SearchResponse.Cluster constructor arity in SourceAccessDiagnosticsTests * checkstyle
…er lacks remote index permissions (elastic#142403) When a transform is configured with a remote (cross-cluster) source index and the user lacks permissions to access it, the _preview API correctly fails -- but PUT _transform and _start silently succeed, allowing unauthorized transforms to be created and started. The root cause is that validateQuery in AbstractCompositeAggFunction only checks the response status code, which is OK even when IndicesOptions.LENIENT_EXPAND_OPEN causes unauthorized indices to be silently ignored. The search returns null aggregations in this case, but unlike preview(), validateQuery() never checks for that condition. This PR introduces a SourceAccessDiagnostics class that inspects the SearchResponse for security-related failures at both the CCS cluster level (SKIPPED/FAILED clusters with ElasticsearchSecurityException) and the shard level (FORBIDDEN/UNAUTHORIZED status). A null-aggregation check is added to validateQuery(), but -- critically -- it only rejects the request when a security failure is positively identified. When no security failure is found, validation passes through silently. This distinction avoids the regression that caused PR elastic#95318 to be reverted in elastic#95562: that earlier change unconditionally failed on null aggregations, which broke integrations (such as Elastic Defend) that create and start transforms with wildcard source patterns before any matching indices exist. Since defer_validation only defers from PUT to _start, there was no way for those integrations to bypass the check. Our approach preserves backward compatibility for the empty-indices case while catching the unauthorized-remote-index case. The preview() method also delegates to the same diagnostics class, so all three APIs now produce consistent, actionable error messages when a security failure is detected, falling back to the original generic message otherwise. The multi-cluster YAML integration tests are updated to verify that both PUT _transform and _start now reject unauthorized remote transforms. A new test case creates a transform with defer_validation: true and confirms that _start catches the permission issue. Unit tests for SourceAccessDiagnostics cover cluster-level SKIPPED/FAILED scenarios, shard-level security exceptions, FORBIDDEN/UNAUTHORIZED status codes, and the fallback to the generic message for non-security failures. Fixes elastic#95367
…when user lacks remote index permissions (#142403) (#142455) * [Transform] Fix transform validation to reject PUT and _start when user lacks remote index permissions (#142403) When a transform is configured with a remote (cross-cluster) source index and the user lacks permissions to access it, the _preview API correctly fails -- but PUT _transform and _start silently succeed, allowing unauthorized transforms to be created and started. The root cause is that validateQuery in AbstractCompositeAggFunction only checks the response status code, which is OK even when IndicesOptions.LENIENT_EXPAND_OPEN causes unauthorized indices to be silently ignored. The search returns null aggregations in this case, but unlike preview(), validateQuery() never checks for that condition. This PR introduces a SourceAccessDiagnostics class that inspects the SearchResponse for security-related failures at both the CCS cluster level (SKIPPED/FAILED clusters with ElasticsearchSecurityException) and the shard level (FORBIDDEN/UNAUTHORIZED status). A null-aggregation check is added to validateQuery(), but -- critically -- it only rejects the request when a security failure is positively identified. When no security failure is found, validation passes through silently. This distinction avoids the regression that caused PR #95318 to be reverted in #95562: that earlier change unconditionally failed on null aggregations, which broke integrations (such as Elastic Defend) that create and start transforms with wildcard source patterns before any matching indices exist. Since defer_validation only defers from PUT to _start, there was no way for those integrations to bypass the check. Our approach preserves backward compatibility for the empty-indices case while catching the unauthorized-remote-index case. The preview() method also delegates to the same diagnostics class, so all three APIs now produce consistent, actionable error messages when a security failure is detected, falling back to the original generic message otherwise. The multi-cluster YAML integration tests are updated to verify that both PUT _transform and _start now reject unauthorized remote transforms. A new test case creates a transform with defer_validation: true and confirms that _start catches the permission issue. Unit tests for SourceAccessDiagnostics cover cluster-level SKIPPED/FAILED scenarios, shard-level security exceptions, FORBIDDEN/UNAUTHORIZED status codes, and the fallback to the generic message for non-security failures. Fixes #95367 (cherry picked from commit 0e44984) # Conflicts: # x-pack/plugin/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/common/AbstractCompositeAggFunction.java * fix compilation error * Add diagnostics for remote CCS clusters with zero shards Enhance the SourceAccessDiagnostics class to identify remote CCS clusters that return zero shards due to permission issues. This update includes a new method to check for such scenarios and updates the documentation accordingly. Additionally, new unit tests are added to verify the correct behavior when accessing remote clusters with insufficient permissions, ensuring that appropriate error messages are returned. This change improves the clarity of diagnostics related to security exceptions in cross-cluster searches. * fix unit test specifics for 8.19 * Update transform configuration in multi-cluster test to include defer_validation and modify description
When a transform is configured with a remote (cross-cluster) source index and the user lacks permissions to access it, the
_previewAPI correctly fails -- butPUT _transformand_startsilently succeed, allowing unauthorized transforms to be created and started. The root cause is thatvalidateQueryinAbstractCompositeAggFunctiononly checks the response status code, which isOKeven whenIndicesOptions.LENIENT_EXPAND_OPENcauses unauthorized indices to be silently ignored. The search returns null aggregations in this case, but unlikepreview(),validateQuery()never checks for that condition.This PR introduces a
SourceAccessDiagnosticsclass that inspects theSearchResponsefor security-related failures at both the CCS cluster level (SKIPPED/FAILED clusters withElasticsearchSecurityException) and the shard level (FORBIDDEN/UNAUTHORIZEDstatus). A null-aggregation check is added tovalidateQuery(), but -- critically -- it only rejects the request when a security failure is positively identified. When no security failure is found, validation passes through silently. This distinction avoids the regression that caused PR #95318 to be reverted in #95562: that earlier change unconditionally failed on null aggregations, which broke integrations (such as Elastic Defend) that create and start transforms with wildcard source patterns before any matching indices exist. Sincedefer_validationonly defers from PUT to_start, there was no way for those integrations to bypass the check. Our approach preserves backward compatibility for the empty-indices case while catching the unauthorized-remote-index case. Thepreview()method also delegates to the same diagnostics class, so all three APIs now produce consistent, actionable error messages when a security failure is detected, falling back to the original generic message otherwise.The multi-cluster YAML integration tests are updated to verify that both
PUT _transformand_startnow reject unauthorized remote transforms. A new test case creates a transform withdefer_validation: trueand confirms that_startcatches the permission issue. Unit tests forSourceAccessDiagnosticscover cluster-level SKIPPED/FAILED scenarios, shard-level security exceptions, FORBIDDEN/UNAUTHORIZED status codes, and the fallback to the generic message for non-security failures.Fixes #95367