[ESQL][Inference] Introduce usage limits for COMPLETION and RERANK#139074
afoucret merged 23 commits into elastic:main
Conversation
Hi @afoucret, I've created a changelog YAML for you.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
ℹ️ Important: Docs version tagging
👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes.
When to use applies_to tags:
✅ At the page level, to indicate which products/deployments the content applies to (mandatory)
What NOT to do:
❌ Don't remove or replace information that applies to an older version
Source.readFrom((PlanStreamInput) in),
in.readNamedWriteable(LogicalPlan.class),
in.readNamedWriteable(Expression.class),
in.getTransportVersion().supports(ESQL_INFERENCE_ROW_LIMIT_TRANSPORT_VERSION)
We don't need to introduce a new transport version; in fact, we don't need this method at all.
Now that RERANK and COMPLETION have an implicit limit, they will always be executed on the coordinator, meaning we never need to send them to the data nodes.
So we can simplify this further: remove the NamedWriteable and just throw an exception if we ever need to serialize them (which would be a bug and a code path that should never be reached).
ChangePoint, Fuse, Fork, etc. are also not serialized.
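The pattern suggested above can be sketched as follows. This is a minimal, self-contained illustration, not the actual Elasticsearch API: the class name and method signature are stand-ins for the real coordinator-only plan nodes.

```java
public class SerializationGuard {
    // Sketch of the review suggestion: a coordinator-only plan node
    // (e.g. Rerank, Completion) refuses serialization outright, since
    // reaching that code path would indicate a bug.
    static class Rerank {
        void writeTo(Object out) {
            throw new UnsupportedOperationException(
                "Rerank is executed on the coordinator and should never be serialized");
        }
    }

    public static void main(String[] args) {
        Rerank plan = new Rerank();
        boolean threw = false;
        try {
            plan.writeTo(null);
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected UnsupportedOperationException");
        System.out.println("ok");
    }
}
```

Failing fast here is preferable to silently supporting a serialization path that the planner is never supposed to take.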
Completely removed the serialization logic for Rerank and Completion logical and physical plans.
try (var resp = run(query)) {
    List<List<Object>> values = getValuesList(resp);
    // Should be limited by the default row limit (100)
    assertThat(values.size(), lessThanOrEqualTo(100));
The index we create always has 6 documents, so we are not really testing the limit enforcement.
Let's change createAndPopulateTestIndex so that we create an index with more than 100 documents when we test COMPLETION and more than 1000 when we test RERANK.
And here we should check that we get exactly 100 documents back.
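The suggested change could look roughly like this. It is a standalone sketch: `run`, `DEFAULT_ROW_LIMIT`, and the document counts are stand-ins for the real test harness, used only to show the exact-equality assertion over an index larger than the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitTestSketch {
    static final int DEFAULT_ROW_LIMIT = 100; // assumed default for COMPLETION

    // Stand-in for running an ES|QL query over an index of `docCount` rows,
    // with the implicit row limit applied on the coordinator.
    static List<Integer> run(int docCount) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < Math.min(docCount, DEFAULT_ROW_LIMIT); i++) {
            rows.add(i);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Index populated with more than 100 documents, per the review suggestion,
        // so the limit is actually exercised.
        List<Integer> values = run(250);
        if (values.size() != DEFAULT_ROW_LIMIT) throw new AssertionError();
        System.out.println(values.size());
    }
}
```

With only 6 documents, `lessThanOrEqualTo(100)` passes trivially; the exact-equality check only has teeth once the index exceeds the limit.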
Tests have been updated.
try (var resp = run(query)) {
    List<List<Object>> values = getValuesList(resp);
    assertThat(values.size(), lessThanOrEqualTo(customLimit));
Let's check that we get exactly customLimit docs back (take a look at the other comment that suggests changing createAndPopulateTestIndex).
Tests have been updated.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/InferenceService.java
Docs preview LGTM, nice!
…k (do not escape the coordinator node anymore).
public EsqlStatement parse(
    String query,
    QueryParams params,
    SettingsValidationContext settingsValidationCtx,
I wonder if we should consider embedding inferenceSettings in SettingsValidationContext?
I know that SettingsValidationContext serves a different purpose, so maybe we can just have a ValidationContext that is initialized in EsqlSession and can receive whatever context is necessary for validation during parsing?
Happy to do this as a separate follow-up and not in this PR, since it would increase the scope.
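The ValidationContext idea floated above might look something like this. Everything here is hypothetical, a sketch of the proposed follow-up rather than code in the PR; the record names and fields are illustrative.

```java
public class ValidationContextSketch {
    // Hypothetical inference settings carried into parse-time validation;
    // field names and defaults are made up for illustration.
    record InferenceSettings(int completionRowLimit, int rerankRowLimit) {}

    // A general-purpose validation context, initialized in EsqlSession,
    // that can aggregate whatever context parsing validation needs.
    record ValidationContext(InferenceSettings inferenceSettings) {}

    public static void main(String[] args) {
        ValidationContext ctx =
            new ValidationContext(new InferenceSettings(100, 1000));
        if (ctx.inferenceSettings().completionRowLimit() != 100) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

The appeal of a single aggregating context is that future validation inputs can be added without widening the `parse` signature again.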
Added an implicit, configurable limit for COMPLETION and RERANK
Updated documentation
Others:
Closes: #136861