
Add late chunking configuration for JinaAI embedding task settings #137263

Merged
dan-rubinstein merged 7 commits into elastic:main from dan-rubinstein:jina-ai-late-chunking
Nov 19, 2025
Conversation

@dan-rubinstein
Member

Description

This change adds the ability to pass the late_chunking flag as part of the task settings for a JinaAI embeddings endpoint, controlling whether JinaAI performs late chunking for us. As part of our existing chunking process, we batch chunks across inputs into a single request. When late chunking, we need to avoid doing this, because JinaAI assumes that all of the chunks in a single request belong to a single document. This change adds logic to avoid batching chunks across inputs when we are late chunking.

Testing

  • Unit tests
  • Created an embeddings endpoint for JinaAI and tested with late_chunking set to null, true, and false.
  • Tested that overriding the late_chunking flag during an inference call with true and false works.
  • Tested that using late chunking as part of a semantic_text field works.
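
The per-input batching behavior described above can be sketched roughly as follows. This is a minimal illustration only; `ChunkBatcher` and `batchChunks` are hypothetical names for this sketch, not the PR's actual `EmbeddingRequestChunker` API:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkBatcher {
    // Sketch: when lateChunking is true, each input's chunks form their own
    // batch, because JinaAI treats all chunks in one request as belonging to a
    // single document. Otherwise, chunks from all inputs are flattened into
    // shared batches of up to batchSize chunks.
    public static List<List<String>> batchChunks(List<List<String>> chunksPerInput, boolean lateChunking, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        if (lateChunking) {
            for (List<String> inputChunks : chunksPerInput) {
                batches.add(new ArrayList<>(inputChunks));
            }
        } else {
            List<String> current = new ArrayList<>();
            for (List<String> inputChunks : chunksPerInput) {
                for (String chunk : inputChunks) {
                    current.add(chunk);
                    if (current.size() == batchSize) {
                        batches.add(current);
                        current = new ArrayList<>();
                    }
                }
            }
            if (current.isEmpty() == false) {
                batches.add(current);
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = List.of(List.of("a", "b", "c"), List.of("d"), List.of("e", "f"));
        System.out.println(batchChunks(inputs, true, 10).size());  // prints 3: one batch per input
        System.out.println(batchChunks(inputs, false, 10).size()); // prints 1: one shared batch
    }
}
```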
@dan-rubinstein dan-rubinstein added >enhancement :ml Machine learning Team:ML Meta label for the ML team v9.3.0 labels Oct 28, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @dan-rubinstein, I've created a changelog YAML for you.

Contributor

@jonathan-buttner jonathan-buttner left a comment

Looks good, just a few comments.

chunkingSettings
).batchRequestsWithListeners(finalListener);

int expectedNumberOfBatches = batchChunksAcrossInputs ? 1 : 3;
Contributor

Could you leave a comment in the code as to why it will either be 1 or 3 here?

Member Author

Sure, I'll add the following comment:
"There are 3 inputs that generate 8 chunks. If we are allowing batching of chunks across inputs, they will be placed into 1 batch. Otherwise, they will be split into 3 batches (1 per input)."

Contributor

It might be even clearer to use inputs.size() instead of 3, so that it's obvious where the value is coming from.

Contributor

Not strictly related to your changes, but how about we fix this since we're touching the class? Basically, we need to do something like this: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/action/PutInferenceModelRequestTests.java#L44-L58

Member Author

Good catch, I'll update this while I'm making changes to this file anyways.

0.123,
-0.123
]
if (Boolean.TRUE.equals(model.getTaskSettings().getLateChunking())) {
Contributor

nit: Maybe move the functionality of queuing/choosing the response to a function to reduce the indentation.

Member Author

I'll move this logic to a helper function.

}

public void testBatchChunksAcrossInputsIsFalseAndBatchesLessThanMaxChunkLimit_ThrowsAssertionError() {
int batchSize = randomIntBetween(1, 511);
Contributor

Would it be worth defining this upper bound on batch size based on the batchSize value currently defined in testBatchChunksAcrossInputs()? If the batch size in that method is changed, the test might start failing without it being immediately obvious why.

Member Author

Sure, I'll move the batch size into a variable and reuse it across this and testBatchChunksAcrossInputs().

int expectedNumberOfBatches = batchChunksAcrossInputs ? 1 : 3;
assertThat(batches, hasSize(expectedNumberOfBatches));
if (batchChunksAcrossInputs) {
assertThat(batches.get(0).batch().inputs().get(), hasSize(8));
Contributor

It would be nice if we could tie this value of 8 directly to the input text somehow, since it's not immediately obvious where it's coming from and would become incorrect if the input was changed. Similarly, the "3, 1, 4" in the other branch of this if statement is a little disconnected from the input text. Maybe we could do something like this:

        int maxChunkSize = 10;
        var testSentence = IntStream.range(0, maxChunkSize).mapToObj(i -> "word" + i).collect(Collectors.joining(" ")) + ".";
        var chunkingSettings = new SentenceBoundaryChunkingSettings(maxChunkSize, 0);
        var batchSizes = List.of(3, 1, 4);
        var totalBatchSizes = batchSizes.stream().mapToInt(Integer::intValue).sum();
        List<ChunkInferenceInput> inputs = batchSizes.stream()
            .map(i -> new ChunkInferenceInput(String.join(" ", Collections.nCopies(i, testSentence))))
            .toList();

and use totalBatchSizes instead of 8.

Member Author

Sure, I'll update to using this proposed process.

Contributor

@jonathan-buttner jonathan-buttner left a comment

Thanks for the changes. Left a few suggestions.

public class ChunkerUtils {

public static int countWords(String text) {
BreakIterator wordIterator = BreakIterator.getWordInstance();
Contributor

A question came up previously about the languages this supports. Are we mainly targeting English? Or is there anything additional you are aware of that we can do to make countWords handle more languages?

I'm mainly just checking to see if there's a configuration we can pass to BreakIterator to make it support more languages.

Member Author

From the documentation, it seems that it uses the default locale's language. We usually pass Locale.ROOT into this function in other use cases, so I'll make an update to be consistent here, but I still think this will have the same behavior. If we need the user to be able to control which language it is using, we can consider that in a future change.
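
A locale-explicit version of the word counting discussed above might look like the following sketch. `WordCounter` is a hypothetical class name; the letter-or-digit check is one common way to skip whitespace and punctuation tokens that `BreakIterator` also reports as word boundaries:

```java
import java.text.BreakIterator;
import java.util.Locale;

public class WordCounter {
    // Counts word tokens using a locale-explicit BreakIterator, mirroring the
    // Locale.ROOT convention mentioned in the discussion above.
    public static int countWords(String text) {
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.ROOT);
        wordIterator.setText(text);
        int count = 0;
        int start = wordIterator.first();
        for (int end = wordIterator.next(); end != BreakIterator.DONE; start = end, end = wordIterator.next()) {
            // BreakIterator reports every boundary, including spaces and
            // punctuation, so only count tokens starting with a letter or digit.
            if (Character.isLetterOrDigit(text.charAt(start))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Late chunking batches per input.")); // prints 5
    }
}
```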

MatcherAssert.assertThat(
xContentResult,
is(
Strings.format(
Contributor

nit: If you want to make the expected value prettier (newline etc), I've used XContentHelper.stripWhitespace to clean it up before the comparison.

Contributor

@DonalEvans DonalEvans left a comment

Just a couple of small things, nothing mandatory.

Member

@davidkyle davidkyle left a comment

LGTM

Contributor

@jonathan-buttner jonathan-buttner left a comment

Thanks for the changes ✅

@dan-rubinstein dan-rubinstein merged commit 2743e53 into elastic:main Nov 19, 2025
34 checks passed
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
…lastic#137263)

* Add late chunking configuration for JinaAI embedding task settings

* Update docs/changelog/137263.yaml

* Clean up tests and fix mutateInstance for JinaAIEmbeddingsTaskSettingsTests

* Cleanup EmbeddingRequestChunker tests and disable late chunking for inputs exceeding max word count

* Fixing test sentence generation

* Adding test for generating multiple batches and clarification on late chunking word count limit
