Skip to content

Add support for merges in ES93BloomFilterStoredFieldsFormat#137622

Merged
fcofdez merged 10 commits intoelastic:mainfrom
fcofdez:bloom-filter-stored-fields-merges
Nov 13, 2025
Merged

Add support for merges in ES93BloomFilterStoredFieldsFormat#137622
fcofdez merged 10 commits intoelastic:mainfrom
fcofdez:bloom-filter-stored-fields-merges

Conversation

@fcofdez
Copy link
Contributor

@fcofdez fcofdez commented Nov 5, 2025

No description provided.

@fcofdez fcofdez requested a review from tlrx November 5, 2025 13:29
@fcofdez fcofdez added WIP :StorageEngine/TSDB You know, for Metrics labels Nov 5, 2025
@fcofdez fcofdez requested a review from burqen November 5, 2025 15:54
Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sorry for the late review @fcofdez . I only left minor comments, feel free to address them or not.


private void addToBloomFilter(FieldInfo info, BytesRef value) {
private int getBloomFilterSizeInBits() {
var bloomFilterSizeInBits = defaultBloomFilterSizeInBitsSupplier.getAsInt();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var bloomFilterSizeInBits = defaultBloomFilterSizeInBitsSupplier.getAsInt();
int bloomFilterSizeInBits = defaultBloomFilterSizeInBitsSupplier.getAsInt();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 07ad165

);
}

private void addToBloomFilter(FieldInfo info, BytesRef value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe just pass the value? We're checking beforehand the field info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 07ad165

this.bitSetSizeInBytes = bitsetSizeInBits / Byte.SIZE;
this.buffer = bigArrays.newByteArray(bitSetSizeInBytes, false);
this.hashes = new int[numHashFunctions];
this.bloomFilterDataOut = directory.createOutput(bloomFilterFileName(segmentInfo, segmentSuffix), context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe ensure bloomFilterDataOut is closed if writing the header fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, done in 07ad165

buffer.set(pos, val);
}
maybeInitializeBloomFilterWriter(info, getBloomFilterSizeInBits());
bloomFilterWriter.addToBloomFilter(info, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would rename to:

Suggested change
bloomFilterWriter.addToBloomFilter(info, value);
bloomFilterWriter.add(value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 07ad165

@Override
public int merge(MergeState mergeState) throws IOException {
// Skip merging the bloom filter for now
if (canOrBloomFilters(mergeState)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if (canOrBloomFilters(mergeState)) {
if (useOptimizedMerge(mergeState)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 07ad165

}
}

private void mergeBloomFiltersWithOr(MergeState mergeState) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void mergeBloomFiltersWithOr(MergeState mergeState) throws IOException {
private void mergeOptimized(MergeState mergeState) throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 07ad165


for (int readerIdx = 0; readerIdx < mergeState.storedFieldsReaders.length; readerIdx++) {
StoredFieldsReader storedFieldsReader = mergeState.storedFieldsReaders[readerIdx];
if (storedFieldsReader instanceof Reader reader) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should throw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 07ad165

bloomFilterFieldReader.checkIntegrity();
IndexInput bloomFilterData = bloomFilterFieldReader.bloomFilterData;

bloomFilterData.prefetch(0, bitSetSizeInBytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

final FieldsProducer f = mergeState.fieldsProducers[readerIndex];

final int maxDoc = mergeState.maxDocs[readerIndex];
if (f != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the FieldsProducer be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory yes, if the segment doesn't have terms stored.

final int maxDoc = mergeState.maxDocs[readerIndex];
if (f != null) {
// This can be somewhat expensive and we'll check the integrity again while merging the terms, should we skip it?
f.checkIntegrity();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to keep it: it should always be used with the synthetic _id postings formats which does nothing in checkIntegrity

@fcofdez fcofdez added >enhancement and removed WIP labels Nov 10, 2025
@fcofdez fcofdez marked this pull request as ready for review November 10, 2025 10:13
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @fcofdez, I've created a changelog YAML for you.

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wait for @martijnvg to approve.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@fcofdez
Copy link
Contributor Author

fcofdez commented Nov 11, 2025

@elasticmachine update branch

@fcofdez
Copy link
Contributor Author

fcofdez commented Nov 11, 2025

@elasticmachine update branch

@fcofdez
Copy link
Contributor Author

fcofdez commented Nov 12, 2025

@elasticmachine update branch

@fcofdez
Copy link
Contributor Author

fcofdez commented Nov 12, 2025

@elasticmachine update branch

@fcofdez fcofdez merged commit 77c5f7a into elastic:main Nov 13, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment