Use sub keyword block loader with ignore_above for text fields by dnhatn · Pull Request #140622 · elastic/elasticsearch

dnhatn · 2026-01-13T21:49:57Z

Today, we do not use the block loader of the sub-keyword field when loading the text field if ignore_above is set. When ignore_above is configured, values exceeding the threshold are not stored in the keyword field's doc_values, which is why we cannot load from the sub-keyword field alone. However, if no document in a segment has a value exceeding the threshold, we can safely load values from doc_values instead of stored fields. If some documents exceed the threshold, we should load values from doc_values for those within the threshold and from stored fields for those above it.
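To make the constraint concrete, here is a minimal Java sketch of what happens at index time for a text field with a keyword sub-field that sets ignore_above. It models one segment with plain in-memory maps; `IndexTimeModel`, `index`, and the field name `message.keyword` are hypothetical and not Elasticsearch APIs. A value longer than the threshold stays available via stored fields or _source, never reaches the keyword doc_values, and the sub-field's name is recorded in _ignored instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one segment; none of these names are Elasticsearch APIs.
class IndexTimeModel {
    final int ignoreAbove;
    // docId -> value in the keyword sub-field's doc_values (absent when ignored)
    final Map<Integer, String> keywordDocValues = new HashMap<>();
    // docId -> original value, standing in for stored fields or _source
    final Map<Integer, String> storedOrSource = new HashMap<>();
    // docId -> field names recorded in the _ignored metadata field
    final Map<Integer, List<String>> ignored = new HashMap<>();

    IndexTimeModel(int ignoreAbove) {
        this.ignoreAbove = ignoreAbove;
    }

    void index(int docId, String subField, String value) {
        // In this model, the original value always stays available via stored fields or _source.
        storedOrSource.put(docId, value);
        if (value.length() > ignoreAbove) {
            // Too long: skip doc_values and record the sub-field in _ignored.
            ignored.computeIfAbsent(docId, k -> new ArrayList<>()).add(subField);
        } else {
            keywordDocValues.put(docId, value);
        }
    }
}

class IgnoreAboveDemo {
    public static void main(String[] args) {
        IndexTimeModel segment = new IndexTimeModel(3);
        segment.index(1, "message.keyword", "abc");   // at the limit: kept in doc_values
        segment.index(2, "message.keyword", "abcd");  // over the limit: ignored
        System.out.println(segment.keywordDocValues); // {1=abc}
        System.out.println(segment.ignored);          // {2=[message.keyword]}
    }
}
```

Doc 2 has no doc_values entry at all, so a loader that only reads the sub-keyword's doc_values would silently return nothing for it.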

This PR leverages the terms dictionary from the _ignored field to prefer loading values from doc_values of the sub-keyword field when possible. For any document where the sub-keyword field appears in the _ignored dictionary, we load from stored fields or _source; otherwise, we use doc_values. This improves performance when loading text fields, especially for logsdb.
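The per-document routing described above can be sketched as follows, assuming the _ignored terms dictionary for one segment is modeled as a map from field name to its postings (doc IDs). `IgnoredAwareLoading`, `ignoredDocs`, and `load` are hypothetical names, and the two `IntFunction`s stand in for the real doc_values and stored-fields/_source readers; this is an illustration of the idea, not the PR's code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch; the _ignored terms dictionary is modeled as
// field name -> postings (doc IDs) for a single segment.
final class IgnoredAwareLoading {
    private IgnoredAwareLoading() {}

    /** Docs in this segment whose sub-keyword value was dropped by ignore_above. */
    static BitSet ignoredDocs(Map<String, int[]> ignoredTerms, String subField) {
        BitSet docs = new BitSet();
        int[] postings = ignoredTerms.get(subField);
        if (postings != null) {
            for (int doc : postings) {
                docs.set(doc);
            }
        }
        return docs;
    }

    /** Loads one value per requested doc, preferring doc_values when possible. */
    static List<String> load(int[] docIds, BitSet ignored,
                             IntFunction<String> docValues,
                             IntFunction<String> storedOrSource) {
        List<String> values = new ArrayList<>(docIds.length);
        for (int doc : docIds) {
            // An empty BitSet (term absent from this segment's _ignored dictionary)
            // means every doc takes the doc_values path.
            values.add(ignored.get(doc) ? storedOrSource.apply(doc) : docValues.apply(doc));
        }
        return values;
    }
}
```

The key property is that the slow path is paid only for docs that appear in the postings for the sub-keyword field; in a segment where the term is missing, nothing falls back.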

There is a known issue with field-level security (FLS): we blindly delegate to the sub-keyword field, even though FLS may hide it. I will address this in a follow-up.

Marking this as a bug fix for performance issues.

elasticsearchmachine · 2026-01-15T02:56:37Z

Hi @dnhatn, I've created a changelog YAML for you.

elasticsearchmachine · 2026-01-15T04:27:27Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

kkrik-es · 2026-01-15T07:54:40Z

server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java

```diff
+        }

+        private BlockLoader nonDelegateBlockLoader(BlockLoaderContext blContext) {
            // 2. check if we can load from a parent field
```


Nit: move this above the function.

Well, the numbering is off. We can just remove it from all comments.

Fixed in 8fb63c3.

kkrik-es

Looks good. Martijn has a better view of the work on text fields, so I'll let him approve.

martijnvg

Thanks Nhat, I like this solution!

In a follow-up, we can look into using a block loader that uses the ignored values stored in binary doc values (in main only) as a fallback, instead of falling back to source.

martijnvg · 2026-01-15T07:59:18Z

server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java

```diff
+     * (under the limit) and doc-2 has the value "bcd..." (exceeds the limit), we can load doc-1 from the doc_values
+     * of keyword field and doc-2 from the slower stored fields.
+     */
+    abstract class ConditionalBlockLoader implements BlockLoader {
```


I like this new construct!
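A rough sketch of the conditional construct, under simplifying assumptions: `SimpleLoader`, `ConditionalLoaderSketch`, and `IgnoredDocsConditionalLoader` are hypothetical types that read one string per doc, not the real `BlockLoader` interface or the PR's `ConditionalBlockLoader`. It shows the shape of the idea: a fast primary loader, a slower fallback, and a per-document predicate defined by subclasses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a block loader: reads one value for a doc. */
interface SimpleLoader {
    String read(int docId);
}

/**
 * Hypothetical sketch of the conditional idea: a fast primary loader and a
 * slower fallback, chosen per document by a predicate that subclasses define.
 */
abstract class ConditionalLoaderSketch implements SimpleLoader {
    private final SimpleLoader primary;
    private final SimpleLoader fallback;

    ConditionalLoaderSketch(SimpleLoader primary, SimpleLoader fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    /** True if the primary loader (e.g. keyword doc_values) has this doc's value. */
    protected abstract boolean canUsePrimary(int docId);

    @Override
    public final String read(int docId) {
        return canUsePrimary(docId) ? primary.read(docId) : fallback.read(docId);
    }

    /** Reads a block of docs in order, routing each one independently. */
    final List<String> readBlock(int[] docIds) {
        List<String> values = new ArrayList<>(docIds.length);
        for (int doc : docIds) {
            values.add(read(doc));
        }
        return values;
    }
}

/** Concrete variant: fall back only for docs known to exceed ignore_above. */
final class IgnoredDocsConditionalLoader extends ConditionalLoaderSketch {
    private final Set<Integer> ignoredDocs;

    IgnoredDocsConditionalLoader(SimpleLoader docValues, SimpleLoader stored, Set<Integer> ignoredDocs) {
        super(docValues, stored);
        this.ignoredDocs = ignoredDocs;
    }

    @Override
    protected boolean canUsePrimary(int docId) {
        return !ignoredDocs.contains(docId);
    }
}
```

Keeping the predicate abstract means the same routing shell could serve other "mostly fast, occasionally slow" cases, which may be why the construct was introduced as a general class rather than inlined in the text field mapper.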

martijnvg · 2026-01-15T08:08:07Z

server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java

```diff
+            return fallbackLoader;
+        }

+        private BlockLoader nonDelegateBlockLoader(BlockLoaderContext blContext) {
```


OK, we can still fall back to using source here. So I think the #140687 backport PR still makes sense?

Yes, the backport of #140687 is still very important.
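The loader choice discussed in this thread can be summarized as a small decision function. This is a simplified, hypothetical sketch (`LoaderSelectionSketch`, `SubKeyword`, and `Choice` are illustrative names, not the TextFieldMapper code) that deliberately omits other conditions the real mapper may check. The `visibleUnderFls` flag models the FLS follow-up mentioned in the description, not something this PR does, and a `null` `ignoreAbove` stands for "not configured".

```java
// Hypothetical sketch of the selection logic; not the TextFieldMapper code.
final class LoaderSelectionSketch {
    private LoaderSelectionSketch() {}

    enum Choice { DELEGATE_TO_SUB_KEYWORD, CONDITIONAL, NON_DELEGATE_FALLBACK }

    /** What a text field might know about its keyword sub-field at load time. */
    record SubKeyword(boolean exists, boolean hasDocValues, Integer ignoreAbove, boolean visibleUnderFls) {}

    static Choice choose(SubKeyword sub) {
        if (!sub.exists() || !sub.hasDocValues() || !sub.visibleUnderFls()) {
            // Nothing usable to delegate to: load from stored fields or _source.
            return Choice.NON_DELEGATE_FALLBACK;
        }
        if (sub.ignoreAbove() == null) {
            // Every value is in doc_values, so delegate for all docs.
            return Choice.DELEGATE_TO_SUB_KEYWORD;
        }
        // Some values may have been dropped: route per doc using _ignored.
        return Choice.CONDITIONAL;
    }
}
```

Under this model, the source fallback remains reachable whenever there is no usable sub-keyword field, which is consistent with the point that the #140687 backport is still needed.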

martijnvg · 2026-01-15T08:09:35Z

server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java

```diff
+        public boolean supportsOrdinals() {
+            return false;
+        }
+
+        @Override
+        public SortedSetDocValues ordinals(LeafReaderContext context) throws IOException {
+            return null;
+        }
```


Maybe these methods can be removed in a follow-up PR? I don't see them being used anymore.

++, will do.


dnhatn · 2026-01-15T16:20:09Z

> In a follow-up, we can look into using a block loader that uses the ignored values stored in binary doc values (in main only) as a fallback, instead of falling back to source.

Yes, I will do that. I also think we need to apply this change to other text-family types.

dnhatn · 2026-01-15T21:04:19Z

@martijnvg @kkrik-es Thanks for reviewing!

elasticsearchmachine · 2026-01-15T21:06:27Z

💔 Backport failed

| Status | Branch | Result |
| --- | --- | --- |
| ❌ | 9.3 | Commit could not be cherrypicked due to conflicts |
| ❌ | 9.2 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 140622`


dnhatn · 2026-01-15T23:46:34Z

💚 All backports created successfully

| Status | Branch | Result |
| --- | --- | --- |
| ✅ | 9.3 | |
| ✅ | 9.2 | |

Questions?

Please refer to the Backport tool documentation


elasticsearchmachine added the v9.4.0 label Jan 13, 2026

dnhatn closed this Jan 13, 2026

dnhatn force-pushed the fallback-reader branch from 9e4799c to de3d1ef January 13, 2026 21:58

dnhatn reopened this Jan 14, 2026

dnhatn force-pushed the fallback-reader branch 10 times, most recently from 6e706dd to a6af71c January 15, 2026 00:56

dnhatn changed the title ~~WIP - loader~~ Jan 15, 2026

dnhatn changed the title ~~Use sub keyword block loader ignore_above for text fields~~ Jan 15, 2026

Prefer using block loader of sub keyword field

2881798

dnhatn force-pushed the fallback-reader branch from a6af71c to 2881798 January 15, 2026 02:43

dnhatn added the :StorageEngine/ES|QL, >bug, v9.3.1, v9.2.5, and auto-backport labels Jan 15, 2026

dnhatn requested a review from martijnvg January 15, 2026 02:56

Update docs/changelog/140622.yaml

9a64ce1

dnhatn requested a review from kkrik-es January 15, 2026 04:26

dnhatn marked this pull request as ready for review January 15, 2026 04:27

dnhatn requested a review from nik9000 January 15, 2026 04:27

elasticsearchmachine added the Team:StorageEngine label Jan 15, 2026

kkrik-es reviewed Jan 15, 2026

martijnvg approved these changes Jan 15, 2026

martijnvg and others added 4 commits January 15, 2026 09:53

Merge remote-tracking branch 'es/main' into fallback-reader

da3439f

comment

8fb63c3

Merge remote-tracking branch 'elastic/main' into fallback-reader

847c445

Merge remote-tracking branch 'dnhatn/fallback-reader' into fallback-r…

31ed311

…eader

dnhatn merged commit f15069a into elastic:main Jan 15, 2026
34 of 35 checks passed

dnhatn deleted the fallback-reader branch January 15, 2026 21:05

elasticsearchmachine added the backport pending label Jan 15, 2026

dnhatn mentioned this pull request Jan 15, 2026

[9.3] Use sub keyword block loader with ignore_above for text fields (#140622) #140787

Merged

dnhatn mentioned this pull request Jan 15, 2026

[9.2] Use sub keyword block loader with ignore_above for text fields (#140622) #140789

Merged

dnhatn removed the backport pending label Jan 21, 2026

dnhatn merged 6 commits into elastic:main from dnhatn:fallback-reader


4 participants
