Skip to content

Fixed geo point block loader slowness#136147

Merged
Kubik42 merged 13 commits intoelastic:mainfrom
Kubik42:kubik-geo-point-block-loader-fix
Oct 21, 2025
Merged

Fixed geo point block loader slowness#136147
Kubik42 merged 13 commits intoelastic:mainfrom
Kubik42:kubik-geo-point-block-loader-fix

Conversation

@Kubik42
Copy link
Contributor

@Kubik42 Kubik42 commented Oct 7, 2025

This addresses #135891

@elasticsearchmachine
Copy link
Collaborator

Hi @Kubik42, I've created a changelog YAML for you.

@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from e7afa0b to 1f168f2 Compare October 8, 2025 03:10
@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 8, 2025

The reason for failing tests is here. Whenever there is no FieldExtractPreference (ie. FieldExtractPreference.NONE, just like in the SDH), geo_points are mapped to BytesRef. This means that when the block loader sanity checker runs, it expects the underlying element type to be BytesRef. However, since we've now started to rely on doc_values, the type is actually LONG.

Perhaps the bigger problem here is that NONE is being assigned when DOC_VALUES should be the one being assigned. This would normally not be an issue since field types generally use the same underlying types for storing values, regardless of whether they're using doc_values vs stored fields. However, geo_points are special and the way we store them changes. More specifically, if doc_values are available, we store them as Longs and when the stored flag is toggled, we store them as BytesRef; although technically its Strings, but more on that later.

@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 8, 2025

In my rabbit hole of understanding how BlockLoaders work, I found a somewhat of a bug thats in main. When a geo_point is specified with store = true and doc_values = false, we end up returning an imprecise result.

For example, the following command fails:

./gradlew ":x-pack:plugin:logsdb:yamlRestTest" --tests "org.elasticsearch.xpack.logsdb.LogsdbTestSuiteIT.test {yaml=/51_esql_synthetic_source/Simple from}" -Dtests.seed=76E814C77C112204 -Dtests.locale=xh -Dtests.timezone=Europe/Saratov -Druntime.java=25

with:

        Expected: "POINT (-74.00600004941225 40.712799984030426)"
             but: was "POINT (-74.006 40.7128)"
            at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
            at org.junit.Assert.assertThat(Assert.java:964)
            at org.junit.Assert.assertThat(Assert.java:930)
            at org.elasticsearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:100)
            at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:68)
            at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:522)
            ... 41 more

Based on what I've read in the code, this precision loss is expected. That being said, its not documented publicly.

@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 8, 2025

There is also a problem with using BlockStoredFieldsReader.BytesFromStringsBlockLoader(name()) even when the field is stored. This is despite the fact that during indexing, the StoredField is created with a String representation of the geo_point; code.

When block loading, I get the error: Unknown geometry type: 825699888, which comes from here.

My guess is that it has something to do with the geo_point being stored as a String, rather than being converted to WKT first. That said, even when I used WellKnownText.toWKT() I still received a similar error - Unknown geometry type: 1414416719.

For now, I've just followed what we do in GeoShapeWithDocValuesFieldMapper and changed the block loader to BlockStoredFieldsReader.BytesFromBytesRefsBlockLoader, which has worked.

This needs further discussion.

@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from 1f168f2 to b586f4c Compare October 9, 2025 01:08
@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 9, 2025

I did a little more investigating and found that the reason we end up with FieldExtractPreference.NONE is because we initialize FieldExtractExec here, where both docValuesAttributes and boundsAttributes are empty sets. This leads to the following check failing, which ultimately returns defaultPreference, which is FieldExtractPreference.NONE.

This got me thinking, why are we setting the preference to NONE for attributes with doc_values? Shouldn't it be DOC_VALUES? If we think about it, NONE implies we can load the values however we want, so we're not restricted to using doc_values for that. This made me notice that inside this function, the attribute itself contains a DataType, which in turns exposes hasDocValues(). So, ultimately, wouldn't it make more sense to just do:

if (docValuesAttributes.contains(attr) || attr.dataType().hasDocValues()) {
    return MappedFieldType.FieldExtractPreference.DOC_VALUES;
}

However, doing so causes some type inequality exceptions.

Nhat pointed out this code, which suggests that doc_values shouldn't be always be used. However, that conflicts with the definition of FieldExtractPreference.NONE, which states that we can use anything.

@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch 4 times, most recently from adc51f9 to 5cedfba Compare October 13, 2025 04:52
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments.

@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch 4 times, most recently from 9f1057d to c24d31e Compare October 14, 2025 00:26
@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 16, 2025

Just adding what I've learned:

The issue is that BlockLoader.bytesRefsFromDocValues() is called differently for single doc Ordinals vs everything else. For everything else, the parameter expectedCount equals the number of documents, while for single docs, its the number of values within the first document.

Perhaps using BlockLoader.bytesRefsFromDocValues() is wrong here since I'm not actually loading BytesRefs from doc_values, but longs, that I then convert to BytesRefs. I will try using BlockLoader.bytesRefs() instead. Nothing in the javadoc screams at me that this is wrong, but at the same time, the javadoc for bytesRefsFromDocValues() isn't explicitly stating that bytesRefsFromDocValues() must be used either.

All of this can be avoided if we just stick to reading longs and not doing conversions. But to facilitate that, we need to be able to correctly set the FieldExtractPreference to be from doc_values whenever synthetic source is enabled and doc_values are available; and I suppose store = false. Afaik, we don't have this extra context when deciding on the preference.

@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 20, 2025

Severless checks are failing bc AWS is experiencing an LSE

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 👍

@Kubik42 Kubik42 merged commit 0dd761b into elastic:main Oct 21, 2025
34 checks passed
@Kubik42 Kubik42 deleted the kubik-geo-point-block-loader-fix branch October 21, 2025 20:46
@Kubik42 Kubik42 added auto-backport Automatically create backport pull requests when merged v9.2.1 v8.19.7 v9.1.7 labels Oct 21, 2025
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)

# Conflicts:
#	test/framework/src/main/java/org/elasticsearch/index/mapper/TestBlock.java
@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 21, 2025

💚 All backports created successfully

Status Branch Result
9.2
9.1
8.19

Questions ?

Please refer to the Backport tool documentation

Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)

# Conflicts:
#	test/framework/src/main/java/org/elasticsearch/index/mapper/TestBlock.java
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)
Kubik42 added a commit that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------


(cherry picked from commit 0dd761b)

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 21, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)
Kubik42 added a commit that referenced this pull request Oct 22, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------


(cherry picked from commit 0dd761b)

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Oct 22, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 0dd761b)
Kubik42 added a commit that referenced this pull request Oct 23, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------


(cherry picked from commit 0dd761b)

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
fzowl pushed a commit to voyage-ai/elasticsearch that referenced this pull request Nov 3, 2025
* Fixed geo point block loader slowness

* Fixed precision issue

* Addressed feedback

* Reverted TestBlock changes

* [CI] Auto commit changes from spotless

* Replaced bytesRefsFromDocValues with bytesRefs

* Fixed geo point block loader slowness

* Revert "Fixed geo point block loader slowness"

This reverts commit cac43c4.

* Fixed failing tests after rebasing against main

* Update docs/changelog/136147.yaml

* Addressed feedback, added an explanation about bytesRefsFromDocValues()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.19.7 v9.1.7 v9.2.1 v9.3.0

3 participants