Skip to content

Propagate converted fields through projections#137923

Merged
MattAlp merged 22 commits intoelastic:mainfrom
MattAlp:propagate-converted-fields-thru-projections
Jan 16, 2026
Merged

Propagate converted fields through projections#137923
MattAlp merged 22 commits intoelastic:mainfrom
MattAlp:propagate-converted-fields-thru-projections

Conversation

@MattAlp
Copy link
Contributor

@MattAlp MattAlp commented Nov 11, 2025

Addresses #137187.

Reviewers should check

  • Transformation rule correctness
  • Comprehensiveness of the analyzer test
  • If we should add something to the logical plan optimization tests
@MattAlp MattAlp added >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL Team:ES|QL labels Nov 11, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @MattAlp, I've created a changelog YAML for you.

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @MattAlp ! Can we add some LogicalPlanOptimizerTests and CsvTests to validate this change? The original issue errors out in logical plan's verifier, it will be helpful to validate the plan after logical planner is as expected and we get expected results. I tried this query below, it failed on main, and run to completion on the branch, however it seems like we get an extra field returned.

+ curl -u elastic:password -v -H 'Content-Type: application/json' '127.0.0.1:9200/_query?format=txt' -d '{
    "query": "from sample_data_* | keep @timestamp, message | mv_expand @timestamp | eval message = message::keyword"
}
'
       @timestamp       |$$message$converted_to$keyword|       message       
------------------------+------------------------------+---------------------
2023-10-23T12:15:03.360Z|Connected to 10.1.0.3         |Connected to 10.1.0.3
2023-10-23T12:27:28.948Z|Connected to 10.1.0.2         |Connected to 10.1.0.2
2023-10-23T13:33:34.937Z|Disconnected                  |Disconnected         
2023-10-23T13:51:54.732Z|Connection error              |Connection error     
2023-10-23T13:52:55.015Z|Connection error              |Connection error     
2023-10-23T13:53:55.832Z|Connection error              |Connection error     
2023-10-23T13:55:01.543Z|Connected to 10.1.0.1         |Connected to 10.1.0.1
2023-10-23T12:15:03.360Z|Connected to 10.1.0.3         |Connected to 10.1.0.3
2023-10-23T12:27:28.948Z|Connected to 10.1.0.2         |Connected to 10.1.0.2
2023-10-23T13:33:34.937Z|Disconnected                  |Disconnected         
2023-10-23T13:51:54.732Z|Connection error              |Connection error     
2023-10-23T13:52:55.015Z|Connection error              |Connection error     
2023-10-23T13:53:55.832Z|Connection error              |Connection error     
2023-10-23T13:55:01.543Z|Connected to 10.1.0.1         |Connected to 10.1.0.1
Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we get the right solution, we should add integration tests too: adding to union-types.csv-spec might be the easiest (there are also yaml ones in 160_union_types.yml).

}
return esr;
});
res = res.transformUp(Project.class, p -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that just adding the missing synthetic attribute won't be enough.
What happens is that the Project still contains FieldAttribute(s) containing an InvalidMappedField that the UnionTypesCleanup rule converts to an UnsupportedAttribute. Because of this, the resulting plan is not resolved(). Which means that the rule will not add a top Project to drop the synthetic attributes. So we end up outputting more attributes than we need to.

I think we will need to:

  • [either] not only add the synthetic attributes to the Project (as done so far), but drop the original attribute corresponding to to the synthetic one. This should be safe, since no other command is allowed to work with this attribute before conversion; but I'm not sure if the verification of the plan will remain safe then and we'll put out the correct error message to the user, in case the query does touch the attribute before conversion (the existing tests should reveal that); an alternative would be to:
  • [or] add a Project before the converting EVAL that drops the original attribute corresponding to to the synthetic one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this, the resulting plan is not resolved()

Wow, that'd be horrific! But actually, it's not the case, but it's rightfully confusing. The only plan where an UnsupportedAttribute can occur in the expressions should be EsqlProject; and that one specifically considers UnsupportedAttributes okay:

if (projection.resolved() == false && projection instanceof UnsupportedAttribute == false) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not only add the synthetic attributes to the Project (as done so far), but drop the original attribute corresponding to to the synthetic one

Hm, we originally left the original attributes with the mapping conflicts untouched, so we don't mess with the validation in the verifier; removing the attribute will give an inconsistent plan to the verifier, which is why adding the projection on top should be safer and easier.

I think just adding the synth attribute to the projection should work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving a note here regarding some test cases in which one does want to retain unsupported values: #135547

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bpintea introduced the super nice ResolvingProjects in a recent PR. These represent projects that aren't finalized yet, and keep track of patterns; e.g. KEEP foo* may right now resolve to Project[foo1, foo2], but adding a foo3 upstream would not have an effect because foo3 would be thrown away. A ResolvingProject[foo*], however, gets resolved at the end of the analyzer run and becomes a Project[foo1, foo2, foo3] if foo3 was added upstream.

These ResolvingProjects could also solve this problem here if we ever end up being unhappy with the current solution. A Project[mapping_conflict_field] could become a ResolvingProject[mapping_conflict_field, $$mapping_conflict_field$converted_to$*].

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solution looks correct to me, thanks @MattAlp !

You got some suggestions on added tests - I agree these make sense adding before merging!

@bpintea 's comment here is important. I'm quite convinced that we're fine, but that's a great rabbit hole to dig into to understand why we're, indeed, fine. It'd be cool if you could have a look at this, and add some comments in places that would make this easier to understand for our future selves. Also, there are existing csv tests for union types that ensure that the synthetic attributes don't slip through - maybe have a look at them if you want to run the debugger and confirm what's happening.

What I can say so far is:

  • plans must be resolved after passing the verifier, otherwise we cannot e.g. call .datatype() on expressions and that blows up the optimizer.
  • UnsupportedAttribute is actually Unresolvable. Generally, that means that plans and expressions containing an UnsupportedAttribute will be consider unresolved.
    This makes it so that using it in expressions will generally trigger an error message from the verifier; that's what we want.
  • Using UnsupportedAttributes is okay in EsqlProject, and the verifier has an exception to allow it. That's because it's fine to use a field with mapping conflicts in KEEP and DROP (but not elsewhere, and not in RENAME).
IndexResolution resolution = IndexResolution.valid(index);
Analyzer analyzer = analyzer(resolution);

String query = "FROM union_index_combined | KEEP id, foo | MV_EXPAND foo | EVAL id = id::keyword";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: the analyzer doesn't push down projections, yet, so the MV_EXPAND shouldn't be required for making a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it in to be faithful to the original issue & for the future, but good shout!

MattAlp and others added 7 commits December 22, 2025 12:02
…analysis/AnalyzerTests.java

Co-authored-by: Bogdan Pintea <sig11@mailbox.org>
…analysis/AnalyzerTests.java

Co-authored-by: Bogdan Pintea <sig11@mailbox.org>
Keep synthetics if they aren't aliased, keep unsupported attributed if they're explicitly retained, drop either if shadowing (with optional casting) occurs
@github-actions
Copy link
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

@MattAlp
Copy link
Contributor Author

MattAlp commented Dec 23, 2025

I haven't added many comments inline as part of this PR - I'd actually like to add them to the internal & external docs alongside some other notes I've been taking while working on planner PRs incl. this one.

@MattAlp MattAlp marked this pull request as ready for review December 23, 2025 18:25
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@MattAlp
Copy link
Contributor Author

MattAlp commented Dec 23, 2025

INLINE STATS behavior is covered - need to explore STATS further

Copy link
Contributor

@julian-elastic julian-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, thank you for taking the time to explain the fix to me! I left a few comments about additional tests to make sure we have handled all cases.

@MattAlp MattAlp requested a review from alex-spies January 12, 2026 17:20
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @MattAlp !

}
return esr;
});
res = res.transformUp(Project.class, p -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bpintea introduced the super nice ResolvingProjects in a recent PR. These represent projects that aren't finalized yet, and keep track of patterns; e.g. KEEP foo* may right now resolve to Project[foo1, foo2], but adding a foo3 upstream would not have an effect because foo3 would be thrown away. A ResolvingProject[foo*], however, gets resolved at the end of the analyzer run and becomes a Project[foo1, foo2, foo3] if foo3 was added upstream.

These ResolvingProjects could also solve this problem here if we ever end up being unhappy with the current solution. A Project[mapping_conflict_field] could become a ResolvingProject[mapping_conflict_field, $$mapping_conflict_field$converted_to$*].

}
return esr;
});
res = res.transformUp(Project.class, p -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could check if this transformUp call is even necessary. If no fields were added to the EsRelation, then we don't have any Projects that would need fixing.

@MattAlp MattAlp merged commit 4f57553 into elastic:main Jan 16, 2026
35 checks passed
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Jan 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:ES|QL v9.4.0

6 participants