[ESQL][Inference] Introduce usage limits for COMPLETION and RERANK#139074
afoucret merged 23 commits into elastic:main
Conversation
Hi @afoucret, I've created a changelog YAML for you.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
ℹ️ Important: Docs version tagging
👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes.
When to use applies_to tags:
✅ At the page level, to indicate which products/deployments the content applies to (mandatory)
What NOT to do:
❌ Don't remove or replace information that applies to an older version
Source.readFrom((PlanStreamInput) in),
in.readNamedWriteable(LogicalPlan.class),
in.readNamedWriteable(Expression.class),
in.getTransportVersion().supports(ESQL_INFERENCE_ROW_LIMIT_TRANSPORT_VERSION)
We don't need to introduce a new transport version; in fact, we don't need this method at all.
Now that RERANK and COMPLETION have an implicit limit, they will always be executed on the coordinator, meaning we never need to send them to the data nodes.
So we can simplify this further: remove the NamedWriteable and just throw an exception if we ever need to serialize them (which would be a bug and a code path that should never be reached).
ChangePoint, Fuse, Fork, etc. are also not serialized.
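The pattern suggested above can be sketched as follows. This is a minimal, self-contained illustration, not the actual Elasticsearch API: the class name and method signature are stand-ins for the real coordinator-only plan nodes.

```java
public class SerializationGuard {
    // Sketch of the review suggestion: a coordinator-only plan node
    // (e.g. Rerank, Completion) refuses serialization outright, since
    // reaching that code path would indicate a bug.
    static class Rerank {
        void writeTo(Object out) {
            throw new UnsupportedOperationException(
                "Rerank is executed on the coordinator and should never be serialized");
        }
    }

    public static void main(String[] args) {
        Rerank plan = new Rerank();
        boolean threw = false;
        try {
            plan.writeTo(null);
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected UnsupportedOperationException");
        System.out.println("ok");
    }
}
```

Failing fast here is preferable to silently supporting a serialization path that the planner is never supposed to take.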
Completely removed the serialization logic for Rerank and Completion logical and physical plans.
try (var resp = run(query)) {
    List<List<Object>> values = getValuesList(resp);
    // Should be limited by the default row limit (100)
    assertThat(values.size(), lessThanOrEqualTo(100));
The index we create always has 6 documents, so we are not really testing the limit enforcement.
Let's change createAndPopulateTestIndex so that we create an index with more than 100 documents when we test COMPLETION and more than 1000 when we test RERANK.
And here we should check that we get exactly 100 documents back.
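The suggested change could look roughly like this. It is a standalone sketch: `run`, `DEFAULT_ROW_LIMIT`, and the document counts are stand-ins for the real test harness, used only to show the exact-equality assertion over an index larger than the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitTestSketch {
    static final int DEFAULT_ROW_LIMIT = 100; // assumed default for COMPLETION

    // Stand-in for running an ES|QL query over an index of `docCount` rows,
    // with the implicit row limit applied on the coordinator.
    static List<Integer> run(int docCount) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < Math.min(docCount, DEFAULT_ROW_LIMIT); i++) {
            rows.add(i);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Index populated with more than 100 documents, per the review suggestion,
        // so the limit is actually exercised.
        List<Integer> values = run(250);
        if (values.size() != DEFAULT_ROW_LIMIT) throw new AssertionError();
        System.out.println(values.size());
    }
}
```

With only 6 documents, `lessThanOrEqualTo(100)` passes trivially; the exact-equality check only has teeth once the index exceeds the limit.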
Tests have been updated.
try (var resp = run(query)) {
    List<List<Object>> values = getValuesList(resp);
    assertThat(values.size(), lessThanOrEqualTo(customLimit));
Let's check that we get exactly customLimit docs back (take a look at the other comment that suggests changing createAndPopulateTestIndex).
Tests have been updated.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/InferenceService.java
Docs preview LGTM, nice!
…k (do not escape the coordinator node anymore).
public EsqlStatement parse(
    String query,
    QueryParams params,
    SettingsValidationContext settingsValidationCtx,
I wonder if we should consider embedding inferenceSettings in SettingsValidationContext?
I know that SettingsValidationContext serves a different purpose, so maybe we can just have a ValidationContext that is initialized in EsqlSession and can receive whatever context is necessary for validation during parsing?
Happy to do this as a separate follow-up and not in this PR, since it would increase the scope.
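The ValidationContext idea floated above might look something like this. Everything here is hypothetical, a sketch of the proposed follow-up rather than code in the PR; the record names and fields are illustrative.

```java
public class ValidationContextSketch {
    // Hypothetical inference settings carried into parse-time validation;
    // field names and defaults are made up for illustration.
    record InferenceSettings(int completionRowLimit, int rerankRowLimit) {}

    // A general-purpose validation context, initialized in EsqlSession,
    // that can aggregate whatever context parsing validation needs.
    record ValidationContext(InferenceSettings inferenceSettings) {}

    public static void main(String[] args) {
        ValidationContext ctx =
            new ValidationContext(new InferenceSettings(100, 1000));
        if (ctx.inferenceSettings().completionRowLimit() != 100) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

The appeal of a single aggregating context is that future validation inputs can be added without widening the `parse` signature again.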
Added an implicit, configurable limit for COMPLETION and RERANK
Updated documentation
Others:
Closes: #136861