Do not serialize EsIndex in plan#119580
Conversation
|
Hi @idegtiarenko, I've created a changelog YAML for you. |
| assertThat(plan, instanceOf(EsRelation.class)); | ||
| EsRelation esRelation = (EsRelation) plan; | ||
| assertThat(esRelation.output(), equalTo(NO_FIELDS)); | ||
| assertTrue(esRelation.index().mapping().isEmpty()); |
There was a problem hiding this comment.
Not sure if it is possible to replace this assertion in assertEmptyEsRelation with any alternative?
There was a problem hiding this comment.
I think the assertion on output above is sufficient. Asserting the empty mapping here looks like this was over constraining, I think.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/EsRelation.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsQueryExec.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsSourceExec.java
costin
left a comment
There was a problem hiding this comment.
Great find - left a question around the usage of IndexModes map; not clear why that has to be serialized?
| EsIndex index, | ||
| String indexName, | ||
| IndexMode indexMode, | ||
| Map<String, IndexMode> indexNameWithModes, |
There was a problem hiding this comment.
It is used in LocalExecutionPlanner:
There was a problem hiding this comment.
If this is the only thing that requires that: I believe this check could happen much earlier. Potentially already in the analyzer on the coordinator node. We shouldn't need to wait until the local planning to determine that an index isn't a lookup index.
Could you maybe check if it's feasible to move that to an earlier place and, thus, remove the indexNameWithModes map altogether from the EsRelation and related classes?
Happy to assist with moving that to an earlier stage!
There was a problem hiding this comment.
There already is an index mode check in the Analyzer:
There was a problem hiding this comment.
I think I got it. The check is indeed already performed and could be removed - it's just there out of defensiveness as far as I can see.
What cannot be removed is the line
var entry = indicesWithModes.entrySet().iterator().next();
which, in case of aliases, contains the actual lookup index name in the entry's key.
Looks to me like we should add a comment and leave this as-is. (Well, and maybe remove the defensive double validation, but it also doesn't hurt.)
I'll confirm with @idegtiarenko if this is what we want to do.
There was a problem hiding this comment.
Correct. IndexName could actually be alias or pattern, while map entries include resolved index names
There was a problem hiding this comment.
The map seems similar to EsIndex - it's detailed information that isn't required.
Nit: doing it in a separate PR will require a transport bump; maybe consider this complete but not merge, do the follow-up PR and then merge them at the same time to have only one increment?
There was a problem hiding this comment.
I am not sure I would be able to pick up followup immediately.
Also keeping this updated required multiple conflict resolutions.
I would really prefer to merge this earlier.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/EsRelation.java
Outdated
Show resolved
Hide resolved
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
| * with a single root field that has many children, grandchildren etc. | ||
| */ | ||
| public void testDeeplyNestedFieldsKeepOnlyOne() throws IOException { | ||
| ByteSizeValue expected = ByteSizeValue.ofBytes(9425804); |
There was a problem hiding this comment.
This test file beautifully demonstrates the improvements from the change. Awesome, in this particularly bad case we got a reduction by 99.996%!
| EsIndex index, | ||
| String indexName, | ||
| IndexMode indexMode, | ||
| Map<String, IndexMode> indexNameWithModes, |
There was a problem hiding this comment.
If this is the only thing that requires that: I believe this check could happen much earlier. Potentially already in the analyzer on the coordinator node. We shouldn't need to wait until the local planning to determine that an index isn't a lookup index.
Could you maybe check if it's feasible to move that to an earlier place and, thus, remove the indexNameWithModes map altogether from the EsRelation and related classes?
Happy to assist with moving that to an earlier stage!
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
| public EsRelation(Source source, EsIndex index, List<Attribute> attributes, IndexMode indexMode, boolean frozen) { | ||
| public EsRelation( | ||
| Source source, | ||
| String indexName, |
There was a problem hiding this comment.
In many cases, there will not be a concrete index name, but an index pattern. I think it's slightly misleading to call this indexName for this reason, as it gives the impression as if this was one singular index. E.g. FROM logs* will have logs* as the index name, but actually that's not the name of any index.
| String indexName, | |
| String indexPattern, |
| EsIndex index, | ||
| String indexName, | ||
| IndexMode indexMode, | ||
| Map<String, IndexMode> indexNameWithModes, |
There was a problem hiding this comment.
I think I got it. The check is indeed already performed and could be removed - it's just there out of defensiveness as far as I can see.
What cannot be removed is the line
var entry = indicesWithModes.entrySet().iterator().next();
which, in case of aliases, contains the actual lookup index name in the entry's key.
Looks to me like we should add a comment and leave this as-is. (Well, and maybe remove the defensive double validation, but it also doesn't hurt.)
I'll confirm with @idegtiarenko if this is what we want to do.
| public EsRelation( | ||
| Source source, | ||
| String indexName, | ||
| IndexMode indexMode, |
There was a problem hiding this comment.
I believe the indexMode attribute has become obsolete. It's required in the UnresolvedRelation, but if we keep track of indexNameWithModes, the latter has more complete information obtained from actual field caps calls and should be preferred.
I think we only use this indexMode to check if a relation corresponds to a lookup index. I think we could just have a helper function instead isLookupIndex which checks the indexNameWithModes.
There was a problem hiding this comment.
Let me try to replace its usages with the map
There was a problem hiding this comment.
We also have 2 checks for TIME_SERIES:
that is originally configured in:
I am thinking of deferring that to a separate change
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
| EsIndex index, | ||
| String indexName, | ||
| IndexMode indexMode, | ||
| Map<String, IndexMode> indexNameWithModes, |
There was a problem hiding this comment.
The map seems similar to EsIndex - it's detailed information that isn't required.
Nit: doing it in a separate PR will require a transport bump; maybe consider this complete but not merge, do the follow-up PR and then merge them at the same time to have only one increment?
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java
@idegtiarenko and I went over this, and the map is not redundant; compared to Removing this, or at least reducing it to the absolutely required minimum, is desired but more effort (quite a bit of disentangling different usages there); and we'd like to postpone this effort. The present PR delivers on the goal of reducing the redundantly serialized data greatly in many cases. So I'd prefer getting this PR in, live with the additional transport version bump, and enjoy the benefit of leaner serialization already. |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneColumns.java
alex-spies
left a comment
There was a problem hiding this comment.
Thanks a ton, @idegtiarenko ! I think this is a great improvement.
I have only very minor comments; this can be merged as-is, please treat the comments as optional.
When backporting, I think it's important we run the PR with test-full-bwc to make sure that we don't accidentally break bwc with 8.17, 8.16 etc. On the off chance that we break bwc with e.g. 8.15, we might need to skip a test or so based on a capability.
To ensure this, maybe it's better to remove the auto-backport label and do the backport manually.
| } | ||
|
|
||
| public EsSourceExec(Source source, EsIndex index, List<Attribute> attributes, QueryBuilder query, IndexMode indexMode) { | ||
| public EsSourceExec( |
There was a problem hiding this comment.
super nit: we could make the constructor's parameter order here consistent with that of EsQueryExec's, in that the attributes come before the query.
| public EsStatsQueryExec( | ||
| Source source, | ||
| EsIndex index, | ||
| String indexName, |
There was a problem hiding this comment.
nit: consistency
| String indexName, | |
| String indexPattern, |
If you decide to align this, let's not forget about the nodeString as well.
| assertThat(plan, instanceOf(EsRelation.class)); | ||
| EsRelation esRelation = (EsRelation) plan; | ||
| assertThat(esRelation.output(), equalTo(NO_FIELDS)); | ||
| assertTrue(esRelation.index().mapping().isEmpty()); |
There was a problem hiding this comment.
I think the assertion on output above is sufficient. Asserting the empty mapping here looks like this was over constraining, I think.
| /** | ||
| * Test the size of serializing a plan like | ||
| * FROM index* | LIMIT 10 | KEEP one_single_field | ||
| * with an index pattern pointing to a hundred actual indices with rather long names | ||
| */ | ||
| public void testIndexPatternTargetingMultipleIndices() throws IOException { |
There was a problem hiding this comment.
++, thank you for the test case!
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
💚 Backport successful
|
Certain plan classes (such as EsRelation, EsSourceExec, EsQueryExec) contain and serialize the entire EsIndex instance. This instance might contain huge mapping that is never used in plan. This change replaces EsIndex usage with indexPattern and indexNameWithModes to minimize the size of the serialized plan.
Certain plan classes (such as EsRelation, EsSourceExec, EsQueryExec) contain and serialize the entire
EsIndexinstance.This instance might contain huge
mappingthat is never used in plan. This change replacesEsIndexusage withnameandindexNameWithModesto minimize the size of the serialized plan.Closes: #112998