
Add late chunking configuration for JinaAI embedding task settings #137263

Merged
dan-rubinstein merged 7 commits into elastic:main from dan-rubinstein:jina-ai-late-chunking
Nov 19, 2025
Conversation

@dan-rubinstein
Member

Description

This change adds the ability to pass the late_chunking flag as part of the task settings for a JinaAI embeddings endpoint, controlling whether JinaAI performs late chunking for us. As part of our existing chunking process, we batch chunks across inputs into a single request. When late chunking, we need to avoid doing this, because JinaAI assumes that all of the chunks in a single request belong to a single document. This change adds logic to avoid batching chunks across inputs when we are late chunking.

Testing

  • Unit tests
  • Created an embeddings endpoint for JinaAI and tested with late_chunking set to null, true, and false.
  • Tested that overriding the late_chunking flag during an inference call with true and false works.
  • Tested that using late chunking as part of a semantic_text field works.
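
The per-input batching behavior described above can be sketched roughly as follows. This is a minimal illustration only; `ChunkBatcher` and `batchChunks` are hypothetical names for this sketch, not the PR's actual `EmbeddingRequestChunker` API:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkBatcher {
    // Sketch: when lateChunking is true, each input's chunks form their own
    // batch, because JinaAI treats all chunks in one request as belonging to a
    // single document. Otherwise, chunks from all inputs are flattened into
    // shared batches of up to batchSize chunks.
    public static List<List<String>> batchChunks(List<List<String>> chunksPerInput, boolean lateChunking, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        if (lateChunking) {
            for (List<String> inputChunks : chunksPerInput) {
                batches.add(new ArrayList<>(inputChunks));
            }
        } else {
            List<String> current = new ArrayList<>();
            for (List<String> inputChunks : chunksPerInput) {
                for (String chunk : inputChunks) {
                    current.add(chunk);
                    if (current.size() == batchSize) {
                        batches.add(current);
                        current = new ArrayList<>();
                    }
                }
            }
            if (current.isEmpty() == false) {
                batches.add(current);
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = List.of(List.of("a", "b", "c"), List.of("d"), List.of("e", "f"));
        System.out.println(batchChunks(inputs, true, 10).size());  // prints 3: one batch per input
        System.out.println(batchChunks(inputs, false, 10).size()); // prints 1: one shared batch
    }
}
```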
@dan-rubinstein dan-rubinstein added >enhancement :ml Machine learning Team:ML Meta label for the ML team v9.3.0 labels Oct 28, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @dan-rubinstein, I've created a changelog YAML for you.

Contributor

@jonathan-buttner jonathan-buttner left a comment

Looks good, just a few comments.

chunkingSettings
).batchRequestsWithListeners(finalListener);

int expectedNumberOfBatches = batchChunksAcrossInputs ? 1 : 3;
Contributor

Could you leave a comment in the code as to why it will either be 1 or 3 here?

Member Author

Sure, I'll add the following comment:
"There are 3 inputs that generate 8 chunks. If we are allowing batching of chunks across inputs, they will be placed into 1 batch. Otherwise, they will be split into 3 batches (1 per input)."

Contributor

It might be even clearer to use inputs.size() instead of 3, so that it's obvious where the value is coming from.

Contributor

Not strictly related to your changes, but how about we fix this since we're touching the class? Basically, we need to do something like this: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/PutInferenceModelRequestTests.java#L44-L58

Member Author

Good catch, I'll update this while I'm making changes to this file anyways.

0.123,
-0.123
]
if (Boolean.TRUE.equals(model.getTaskSettings().getLateChunking())) {
Contributor

nit: Maybe move the functionality of queuing/choosing the response to a function to reduce the indentation.

Member Author

I'll move this logic to a helper function.

}

public void testBatchChunksAcrossInputsIsFalseAndBatchesLessThanMaxChunkLimit_ThrowsAssertionError() {
int batchSize = randomIntBetween(1, 511);
Contributor

Would it be worth defining this upper bound on batch size based on the batchSize value currently defined in testBatchChunksAcrossInputs()? If the batch size in that method is changed, the test might start failing without it being immediately obvious why.

Member Author

Sure, I'll move the batch size into a variable and reuse it across this and testBatchChunksAcrossInputs().

int expectedNumberOfBatches = batchChunksAcrossInputs ? 1 : 3;
assertThat(batches, hasSize(expectedNumberOfBatches));
if (batchChunksAcrossInputs) {
assertThat(batches.get(0).batch().inputs().get(), hasSize(8));
Contributor

It would be nice if we could tie this value of 8 directly to the input text somehow, since it's not immediately obvious where it's coming from and would become incorrect if the input was changed. Similarly, the "3, 1, 4" in the other branch of this if statement is a little disconnected from the input text. Maybe we could do something like this:

        int maxChunkSize = 10;
        var testSentence = IntStream.range(0, maxChunkSize).mapToObj(i -> "word" + i).collect(Collectors.joining(" ")) + ".";
        var chunkingSettings = new SentenceBoundaryChunkingSettings(maxChunkSize, 0);
        var batchSizes = List.of(3, 1, 4);
        var totalBatchSizes = batchSizes.stream().mapToInt(Integer::intValue).sum();
        List<ChunkInferenceInput> inputs = batchSizes.stream()
            .map(i -> new ChunkInferenceInput(String.join(" ", Collections.nCopies(i, testSentence))))
            .toList();

and use totalBatchSizes instead of 8.

Member Author

Sure, I'll update to using this proposed process.

Contributor

@jonathan-buttner jonathan-buttner left a comment

Thanks for the changes. Left a few suggestions.

public class ChunkerUtils {

public static int countWords(String text) {
BreakIterator wordIterator = BreakIterator.getWordInstance();
Contributor

A question came up previously about the languages this supports. Are we mainly targeting English? Or is there anything additional you are aware of that we can do to make countWords handle more languages?

I'm mainly just checking to see if there's a configuration we can pass to BreakIterator to make it support more languages.

Member Author

From the documentation, it seems that it uses the default locale's language. We usually pass Locale.ROOT into this function in other use cases, so I'll make an update to be consistent here, but I still think this will have the same behavior. If we need the user to be able to control which language it is using, we can consider that in a future change.
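
A locale-explicit version of the word counting discussed above might look like the following sketch. `WordCounter` is a hypothetical class name; the letter-or-digit check is one common way to skip whitespace and punctuation tokens that `BreakIterator` also reports as word boundaries:

```java
import java.text.BreakIterator;
import java.util.Locale;

public class WordCounter {
    // Counts word tokens using a locale-explicit BreakIterator, mirroring the
    // Locale.ROOT convention mentioned in the discussion above.
    public static int countWords(String text) {
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.ROOT);
        wordIterator.setText(text);
        int count = 0;
        int start = wordIterator.first();
        for (int end = wordIterator.next(); end != BreakIterator.DONE; start = end, end = wordIterator.next()) {
            // BreakIterator reports every boundary, including spaces and
            // punctuation, so only count tokens starting with a letter or digit.
            if (Character.isLetterOrDigit(text.charAt(start))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Late chunking batches per input.")); // prints 5
    }
}
```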

MatcherAssert.assertThat(
xContentResult,
is(
Strings.format(
Contributor

nit: If you want to make the expected value prettier (newline etc), I've used XContentHelper.stripWhitespace to clean it up before the comparison.

Contributor

@DonalEvans DonalEvans left a comment

Just a couple of small things, nothing mandatory.

Member

@davidkyle davidkyle left a comment

LGTM

Contributor

@jonathan-buttner jonathan-buttner left a comment

Thanks for the changes ✅

@dan-rubinstein dan-rubinstein merged commit 2743e53 into elastic:main Nov 19, 2025
34 checks passed
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
…lastic#137263)

* Add late chunking configuration for JinaAI embedding task settings

* Update docs/changelog/137263.yaml

* Clean up tests and fix mutateInstance for JinaAIEmbeddingsTaskSettingsTests

* Cleanup EmbeddingRequestChunker tests and disable late chunking for inputs exceeding max word count

* Fixing test sentence generation

* Adding test for generating multiple batches and clarification on late chunking word count limit
