ESQL: Pragma to load from stored fields by nik9000 · Pull Request #122891 · elastic/elasticsearch

nik9000 · 2025-02-18T19:38:57Z

This creates a pragma you can use to request that fields load from a stored field rather than doc values. It implements that pragma for keyword and number fields.

We expect that, for some disk configurations and some number of fields, it's faster to load those fields from _source or stored fields than it is to use doc values. Our default is doc values and on my laptop it's always faster to use doc values. But we don't ship my laptop to every cluster.

This will let us experiment and debug slow queries by trying to load fields a different way.

You access this pragma with:

curl -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d '{
    "query": "FROM foo",
    "pragma": {
        "field_extract_preference": "STORED"
    }
}'

On a release build you'll need to add "accept_pragma_risks": true.
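
For anyone scripting this from Java rather than curl, here is a minimal sketch of the same request with the release-build flag included. `PragmaRequestSketch` and its methods are hypothetical helpers, not part of this PR. Placing `accept_pragma_risks` at the top level of the body, next to `pragma`, is an assumption. The preference value is passed in so you can use whichever name your build accepts.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the PR: builds the same request as the curl example.
final class PragmaRequestSketch {

    // Builds the JSON body. No escaping is done, so keep the query free of quotes and backslashes.
    static String body(String query, String preference, boolean acceptPragmaRisks) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"query\":\"").append(query).append("\",");
        json.append("\"pragma\":{\"field_extract_preference\":\"").append(preference).append("\"}");
        if (acceptPragmaRisks) {
            // Assumption: the flag sits at the top level of the body, next to "pragma".
            json.append(",\"accept_pragma_risks\":true");
        }
        return json.append("}").toString();
    }

    // Wraps the body in a POST to the local node used in the curl example.
    static HttpRequest request(String body) {
        return HttpRequest.newBuilder(URI.create("http://localhost:9200/_query?pretty"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: PragmaRequestSketch <field_extract_preference>");
            return;
        }
        String body = body("FROM foo", args[0], true);
        HttpRequest request = request(body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

The sketch only builds and prints the request. Sending it is left to `java.net.http.HttpClient` or whatever client you already use.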

elasticsearchmachine · 2025-02-18T19:39:32Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

lkts

LGTM

lkts · 2025-02-18T21:34:15Z

test/framework/src/main/java/org/elasticsearch/index/mapper/BlockLoaderTestCase.java

+        List<Object[]> args = new ArrayList<>();
+        for (boolean syntheticSource : new boolean[] { false, true }) {
+            for (MappedFieldType.FieldExtractPreference preference : PREFERENCES) {
+                args.add(new Object[] { new Params(syntheticSource, preference) });


It seems to work pretty well really!

nik9000 · 2025-02-18T21:43:24Z

Thanks @lkts!

I'd like to have a review from @craigtaverner before merging.

elasticsearchmachine · 2025-02-24T14:51:41Z

Hi @nik9000, I've created a changelog YAML for you.

nik9000 · 2025-03-06T21:14:36Z

Found a fun thing - if we pragma a field over to loading from _source it'll load nested fields when it really shouldn't. Subfields of nested fields shouldn't really work in ESQL.

I don't think that needs to block merging this as pragmas can change behavior - but it is a bug we should fix.

nik9000 · 2025-03-11T12:27:28Z

If we're going to engage this stuff by default we need to figure out the load-from-_source problem with nested - right now if you load from _source you get the nested sub-fields. We don't expect that to happen.

craigtaverner

Nice feature. But I did have some comments I thought worth considering.

craigtaverner · 2025-03-11T13:10:43Z

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java

                return BlockSourceReader.lookupFromNorms(name());
            }
-            if (isIndexed() || isStored()) {
+            if (hasDocValues() == false && (isIndexed() || isStored())) {


I understand that this check is because previously we would never reach this code if hasDocValues was true, and now we might. I'm just wondering why we care to check this. If the request was to prefer_stored, then surely the existence of doc-values should not affect the decision taken on this line?

nik9000

This is a preflight check so we don't try to load from _source if the field isn't present. We only go ahead if the field_names field has the field, except that we never write the field_names field when doc values are enabled.

I'll leave a comment.
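
To make the reply above concrete, here is a minimal sketch of the lookup choice as described in this thread. The class, enum, and method names are hypothetical, and the final `MATCH_ALL` fallback is an assumption. Only the norms check and the `hasDocValues() == false && (isIndexed() || isStored())` condition come from the diff.

```java
// Hypothetical model of the preflight lookup discussed above, not the real mapper code.
final class SourceLookupSketch {

    // Ways to decide which documents are worth loading from _source.
    enum Lookup { FROM_NORMS, FROM_FIELD_NAMES, MATCH_ALL }

    static Lookup choose(boolean hasNorms, boolean hasDocValues, boolean indexed, boolean stored) {
        if (hasNorms) {
            // Norms already tell us which documents have the field.
            return Lookup.FROM_NORMS;
        }
        if (hasDocValues == false && (indexed || stored)) {
            // The field names field is only written when doc values are disabled,
            // so it is only a usable preflight check in that case.
            return Lookup.FROM_FIELD_NAMES;
        }
        // Assumption: with no usable preflight check, every document is tried.
        return Lookup.MATCH_ALL;
    }

    public static void main(String[] args) {
        // A keyword field with doc values, loaded from _source because of the pragma.
        System.out.println(choose(false, true, true, false));
    }
}
```

Without the `hasDocValues == false` guard, a field with doc values would consult a field names entry that was never written, which is the case the extra condition in the diff avoids.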

craigtaverner · 2025-03-11T13:12:33Z

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

+         * loading many fields. The {@link MappedFieldType} can chose a different
+         * method to load the field if it needs to.
+         */
+        PREFER_STORED;


I feel like the prefix PREFER_ duplicates the enum name. None of the other enum values have PREFER_ as a prefix, so perhaps this should not. The fact that this is a preference is already in the FieldExtractPreference.

nik9000

Sure. I feel like the other "preference"s were more like requirements. But I'll rename.

craigtaverner

The others were intended to be preferences. The previous default behaviour of loading doc-values if they exist, otherwise source, was swapped around for spatial types, and the DOC_VALUES preference was to swap that back, for performance reasons, but we could still load from source (just slower). Also the bounds extraction preference is just a performance optimization, because if we don't do that we still get the right answer, just much slower.
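
The "preference, not requirement" idea in this thread can be sketched as an ordered fallback. Everything below is hypothetical: the enum constants are simplified, and the concrete orderings are assumptions chosen only to show that a preference changes which option is tried first, while the field still loads some other way if the preferred option is unavailable.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a preference acting as a hint with fallbacks.
final class PreferenceHintSketch {

    enum Preference { NONE, DOC_VALUES, STORED }

    enum LoadFrom { DOC_VALUES, STORED_FIELD, SOURCE }

    static LoadFrom resolve(Preference preference, Set<LoadFrom> available) {
        // Assumed orderings: STORED tries stored fields and _source before doc values,
        // everything else keeps doc values first.
        List<LoadFrom> order = preference == Preference.STORED
            ? List.of(LoadFrom.STORED_FIELD, LoadFrom.SOURCE, LoadFrom.DOC_VALUES)
            : List.of(LoadFrom.DOC_VALUES, LoadFrom.STORED_FIELD, LoadFrom.SOURCE);
        for (LoadFrom candidate : order) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("field cannot be loaded");
    }

    public static void main(String[] args) {
        Set<LoadFrom> docValuesAndSource = EnumSet.of(LoadFrom.DOC_VALUES, LoadFrom.SOURCE);
        System.out.println(resolve(Preference.NONE, docValuesAndSource));
        System.out.println(resolve(Preference.STORED, docValuesAndSource));
    }
}
```

With doc values and _source both available, the sketch picks doc values by default and _source when `STORED` is preferred. Either way the same values come back, which is the property that makes these preferences rather than requirements.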

craigtaverner · 2025-03-11T13:27:36Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/FieldExtractExec.java

+        Source source,
+        PhysicalPlan child,
+        List<Attribute> attributesToExtract,
+        MappedFieldType.FieldExtractPreference defaultPreference


This constructor should not have the field-extract preference, since that is local-physical planning only. It should set the default NONE on the call to this() below.

nik9000

Just tried to apply this and we do need it, at least with how things are shaped now. InsertFieldExtraction wants to make a FieldExtractExec on the data nodes with this already set.

craigtaverner · 2025-03-11T13:28:42Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/FieldExtractExec.java

            in.readNamedWriteable(PhysicalPlan.class),
-            in.readNamedWriteableCollectionAsList(Attribute.class)
+            in.readNamedWriteableCollectionAsList(Attribute.class),
+            MappedFieldType.FieldExtractPreference.NONE


Delete this line, and have the constructor set the default. See the comments on the lines a couple of lines below for the pattern we used for which parameters can be set during local physical node planning, versus those that happen in the coordinator node.
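
The pattern being asked for here can be sketched with simplified, hypothetical types: one constructor for coordinator-side planning and deserialization that fills in the default itself, and one that accepts the preference for local physical planning on the data node. `FieldExtractSketch` is not the real `FieldExtractExec`, and the merged code may be shaped differently.

```java
import java.util.List;

// Hypothetical stand-in for a plan node with a locally planned parameter.
final class FieldExtractSketch {

    enum Preference { NONE, DOC_VALUES, STORED }

    private final List<String> attributesToExtract;
    private final Preference defaultPreference;

    // Coordinator-side and deserialization path: callers never mention the preference.
    FieldExtractSketch(List<String> attributesToExtract) {
        this(attributesToExtract, Preference.NONE);
    }

    // Local physical planning path: the data node already knows the preference.
    FieldExtractSketch(List<String> attributesToExtract, Preference defaultPreference) {
        this.attributesToExtract = List.copyOf(attributesToExtract);
        this.defaultPreference = defaultPreference;
    }

    List<String> attributesToExtract() {
        return attributesToExtract;
    }

    Preference defaultPreference() {
        return defaultPreference;
    }

    public static void main(String[] args) {
        System.out.println(new FieldExtractSketch(List.of("a")).defaultPreference());
        System.out.println(new FieldExtractSketch(List.of("a"), Preference.STORED).defaultPreference());
    }
}
```

With this shape, serialization code and tests can keep calling the shorter constructor, while code that builds the node on a data node can still pass the preference explicitly.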

craigtaverner · 2025-03-11T13:30:10Z

...test/java/org/elasticsearch/xpack/esql/plan/physical/FieldExtractExecSerializationTests.java

        PhysicalPlan child = randomChild(depth);
        List<Attribute> attributesToExtract = randomFieldAttributes(1, 4, false);
-        return new FieldExtractExec(source, child, attributesToExtract);
+        return new FieldExtractExec(source, child, attributesToExtract, MappedFieldType.FieldExtractPreference.NONE);


We could revert this change if we use the pattern described in the comments on FieldExtractPreference.

nik9000 added 2 commits February 18, 2025 11:35

Merge branch 'main' into esql_pragma_load_source

87c3446

nik9000 requested review from craigtaverner and lkts February 18, 2025 19:38

nik9000 added :Analytics/ES|QL AKA ESQL v9.1.0 labels Feb 18, 2025

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 18, 2025

lkts approved these changes Feb 18, 2025

nik9000 added 2 commits February 18, 2025 16:40

Merge branch 'main' into esql_pragma_load_source

bebaa4f

uFormat

2a86568

nik9000 added 3 commits February 19, 2025 09:37

Merge branch 'main' into esql_pragma_load_source

d84457e

Fix one

77408d1

Merge branch 'main' into esql_pragma_load_source

7f99566

nik9000 added the >enhancement label Feb 24, 2025

Update docs/changelog/122891.yaml

33a1cc1

nik9000 added 8 commits February 24, 2025 11:48

Merge branch 'main' into esql_pragma_load_source

eb03188

Merge remote-tracking branch 'nik9000/esql_pragma_load_source' into esql_pragma_load_source

263f5ca

Merge branch 'main' into esql_pragma_load_source

b35c5e8

Merge branch 'main' into esql_pragma_load_source

0438a1e

Merge branch 'main' into esql_pragma_load_source

949177a

Merge branch 'main' into esql_pragma_load_source

26e38f7

Merge remote-tracking branch 'nik9000/esql_pragma_load_source' into esql_pragma_load_source

2e2ad48

Merge branch 'main' into esql_pragma_load_source

3de7dc4

nik9000 added 2 commits March 7, 2025 10:14

Fix test

0e74c74

Merge branch 'main' into esql_pragma_load_source

997fc97

Compile

c4bcbe8

nik9000 added 2 commits March 11, 2025 08:39

Merge branch 'main' into esql_pragma_load_source

34d0c42

Fixup test

d5c948e

craigtaverner approved these changes Mar 11, 2025

nik9000 added 2 commits March 11, 2025 15:59

Merge branch 'main' into esql_pragma_load_source

f8e03c4

Update

f352d41

nik9000 merged commit 50aaa1c into elastic:main Mar 12, 2025
17 checks passed

alex-spies mentioned this pull request Mar 14, 2025

ESQL: FieldExtractorIT failing release tests #124903

Closed

lkts mentioned this pull request Mar 18, 2025

Use FallbackSyntheticSourceBlockLoader for shape and geo_shape #124927

Merged

nik9000 merged 23 commits into elastic:main from nik9000:esql_pragma_load_source