Address mapping and compute engine runtime field issues#117792
Merged
martijnvg merged 30 commits intoelastic:mainfrom Dec 5, 2024
Merged
Address mapping and compute engine runtime field issues#117792martijnvg merged 30 commits intoelastic:mainfrom
martijnvg merged 30 commits intoelastic:mainfrom
Conversation
This change attempts to solve the following issue: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same `SourceProvider` evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by `SourceProvider`. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position. Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api. Closes elastic#117644
martijnvg
commented
Dec 2, 2024
martijnvg
commented
Dec 2, 2024
| @Override | ||
| public BlockLoader blockLoader(BlockLoaderContext blContext) { | ||
| return new LongScriptBlockDocValuesReader.LongScriptBlockLoader(leafFactory(blContext.lookup())); | ||
| boolean isSyntheticSource = SourceFieldMapper.isSynthetic(blContext.indexSettings()); |
Member
Author
There was a problem hiding this comment.
Should we have two different implementations based on source mode?
martijnvg
commented
Dec 2, 2024
|
|
||
| @Override | ||
| public final StoredFieldsSpec rowStrideStoredFieldSpec() { | ||
| return StoredFieldsSpec.NEEDS_SOURCE; |
Member
Author
There was a problem hiding this comment.
This is needed because otherwise the follow error occurs:
[2024-12-03T04:52:11,002][WARN ][o.e.x.e.a.EsqlResponseListener] [test-cluster-0] Request failed with status [INTERNAL_SERVER_ERROR]: java.lang.IllegalStateException: found row stride readers [[RowStrideReaderWork[reader=ScriptLongs, builder=org.elasticsearch.compute.data.LongBlockBuilder@19f6840e, loader=org.elasticsearch.index.mapper.LongScriptBlockDocValuesReader$RowStrideLongScriptBlockLoader@6f332c9, offset=1], RowStrideReaderWork[reader=ScriptLongs, builder=org.elasticsearch.compute.data.LongBlockBuilder@12a1d61f, loader=org.elasticsearch.index.mapper.LongScriptBlockDocValuesReader$RowStrideLongScriptBlockLoader@7b7a5f6c, offset=2]]] without stored fields [StoredFieldsSpec[requiresSource=false, requiresMetadata=false, requiredStoredFields=[]]]
at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.loadFromSingleLeaf(ValuesSourceReaderOperator.java:254)
at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.process(ValuesSourceReaderOperator.java:143)
at org.elasticsearch.compute.operator.AbstractPageMappingOperator.getOutput(AbstractPageMappingOperator.java:76)
at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1575)
However it isn't really required. Since the SearchLook / SourceLoader will load stored fields and not compute engine.
…te engine and scripting infrastructure to load _source
Member
|
Another issue is the thread switching, as the SyntheticSourceProvider is designed to work with only one thread. I think we need to address this before merging the change. It appears we are loading the source once for each script field, but I think it's okay to address the performance issue later. |
…w SourceLoader for each runtime field used a the query.
…ocid data partitioning and accesses _source (either synthetic or stored source)
Collaborator
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
dnhatn
reviewed
Dec 5, 2024
server/src/main/java/org/elasticsearch/search/lookup/ReinitializingSourceProvider.java
Outdated
Show resolved
Hide resolved
dnhatn
approved these changes
Dec 5, 2024
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Dec 5, 2024
This change addresses the following issues: Fields mapped as runtime fields not getting stored if source mode is synthetic. Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1) Address concurrency issue when runtime fields get pushed down to Lucene. (2) 1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position. Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api. 2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access. Closes elastic#117644
This was referenced Dec 5, 2024
Collaborator
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Dec 5, 2024
This change addresses the following issues: Fields mapped as runtime fields not getting stored if source mode is synthetic. Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1) Address concurrency issue when runtime fields get pushed down to Lucene. (2) 1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position. Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api. 2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access. Closes elastic#117644
elasticsearchmachine
pushed a commit
that referenced
this pull request
Dec 5, 2024
…) (#118048) * Address mapping and compute engine runtime field issues (#117792) This change addresses the following issues: Fields mapped as runtime fields not getting stored if source mode is synthetic. Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1) Address concurrency issue when runtime fields get pushed down to Lucene. (2) 1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position. Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api. 2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access. Closes #117644 * fixed compile error after backporting
martijnvg
added a commit
that referenced
this pull request
Dec 5, 2024
… (#118049) * Address mapping and compute engine runtime field issues (#117792) This change addresses the following issues: Fields mapped as runtime fields not getting stored if source mode is synthetic. Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1) Address concurrency issue when runtime fields get pushed down to Lucene. (2) 1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position. Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api. 2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access. Closes #117644
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Dec 10, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. When multiple threads access `ReinitializingSourceProvider` the simple thread accounting could still result in returned `SourceProvider` being used by multiple threads concurrently. The ReinitializingSourceProvider was introduced via elastic#117792 Closes elastic#118238
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Dec 11, 2024
This reverts the workaround that was introduced in elastic#117792 to avoid EOF error when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. This is now covered by the `ReinitializingSourceProvider` workaround that covers that and the concurrency problem. With this change, the main code for the required workarounds are now in isolated in ReinitializingSourceProvider.
martijnvg
added a commit
that referenced
this pull request
Jan 14, 2025
This reverts the workaround that was introduced in #117792 to avoid EOF error when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. This is now covered by the ReinitializingSourceProvider workaround that covers that and the concurrency problem. With this change, the main code for the required workarounds are now in isolated in ReinitializingSourceProvider. Additional another in `ReinitializingSourceProvider` was fixed, the issue was the lastSeenDoc field was reused overwritten by different threads, the latest commit moves the lastSeenDoc field to PerThreadSourceProvider so that each thread gets its own place to store the last seen docid.
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Jan 14, 2025
This reverts the workaround that was introduced in elastic#117792 to avoid EOF error when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. This is now covered by the ReinitializingSourceProvider workaround that covers that and the concurrency problem. With this change, the main code for the required workarounds are now in isolated in ReinitializingSourceProvider. Additional another in `ReinitializingSourceProvider` was fixed, the issue was the lastSeenDoc field was reused overwritten by different threads, the latest commit moves the lastSeenDoc field to PerThreadSourceProvider so that each thread gets its own place to store the last seen docid.
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Jan 14, 2025
This reverts the workaround that was introduced in elastic#117792 to avoid EOF error when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. This is now covered by the ReinitializingSourceProvider workaround that covers that and the concurrency problem. With this change, the main code for the required workarounds are now in isolated in ReinitializingSourceProvider. Additional another in `ReinitializingSourceProvider` was fixed, the issue was the lastSeenDoc field was reused overwritten by different threads, the latest commit moves the lastSeenDoc field to PerThreadSourceProvider so that each thread gets its own place to store the last seen docid.
martijnvg
added a commit
that referenced
this pull request
Jan 15, 2025
Backporting #118480 to 8.x branch. This reverts the workaround that was introduced in #117792 to avoid EOF error when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. This is now covered by the ReinitializingSourceProvider workaround that covers that and the concurrency problem. With this change, the main code for the required workarounds are now in isolated in ReinitializingSourceProvider. Additional another in `ReinitializingSourceProvider` was fixed, the issue was the lastSeenDoc field was reused overwritten by different threads, the latest commit moves the lastSeenDoc field to PerThreadSourceProvider so that each thread gets its own place to store the last seen docid.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR addresses the following issues:
java.io.EOFExceptionwhen an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1)1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same
SourceProviderevaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used bySourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position.Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api.
2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same
SourceProviderinstance then gets access by multiple threads concurrently.SourceProvidersimplementations are not designed to handle concurrent access.Closes #117644