Add usage stats for semantic_text fields by dimitris-athanasiou · Pull Request #135262 · elastic/elasticsearch

dimitris-athanasiou · 2025-09-23T10:40:26Z

This change enhances usage reporting for the inference plugin to account for usage of semantic_text fields.

Usage is bucketed by service and task_type. For each bucket this adds a semantic_text object that contains:

field_count: the number of semantic_text fields that use an inference endpoint of that service/task_type.
indices_count: the number of indices that contain at least one semantic_text field referencing an inference endpoint of that service/task_type.
inference_id_count: the number of distinct inference endpoints of that service/task_type used by semantic_text fields.

In addition, this change adds two new kinds of buckets that facilitate getting aggregate usage information:

_all buckets are added by task_type. These allow summation of usage info for all inference endpoints of a particular task_type.
default model buckets are added. Those contain usage info for models that are included by default.
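
To make the three counters concrete, here is a minimal sketch that computes them from a flat list of semantic_text field descriptors. It is an illustration under stated assumptions, not the PR's implementation: `SemanticTextUsageSketch`, `FieldRef`, `Counts`, and the `service/task_type` key format are all hypothetical, and the real code reads cluster metadata rather than an in-memory list. Default model buckets are left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class SemanticTextUsageSketch {
    /** Hypothetical descriptor of one semantic_text field and the endpoint it references. */
    record FieldRef(String index, String field, String inferenceId, String service, String taskType) {}

    /** The three counters reported per bucket. */
    record Counts(long fieldCount, long indicesCount, long inferenceIdCount) {}

    /** Mutable accumulator used while scanning fields. */
    private static final class Acc {
        long fields;
        final Set<String> indices = new HashSet<>();
        final Set<String> inferenceIds = new HashSet<>();

        void add(FieldRef f) {
            fields++;                          // every field counts once
            indices.add(f.index());            // distinct indices only
            inferenceIds.add(f.inferenceId()); // distinct endpoints only
        }
    }

    /** Buckets by "service/task_type" (assumed key format) and adds one "_all/task_type" bucket per task type. */
    static Map<String, Counts> compute(List<FieldRef> fields) {
        Map<String, Acc> accs = new TreeMap<>();
        for (FieldRef f : fields) {
            accs.computeIfAbsent(f.service() + "/" + f.taskType(), k -> new Acc()).add(f);
            accs.computeIfAbsent("_all/" + f.taskType(), k -> new Acc()).add(f);
        }
        Map<String, Counts> result = new TreeMap<>();
        accs.forEach((key, acc) -> result.put(key, new Counts(acc.fields, acc.indices.size(), acc.inferenceIds.size())));
        return result;
    }
}
```

Two fields in the same index that share one endpoint therefore yield `field_count` 2, `indices_count` 1, and `inference_id_count` 1 for their bucket.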

elasticsearchmachine · 2025-09-23T10:40:52Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2025-09-23T10:40:52Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine · 2025-09-23T10:40:52Z

Hi @dimitris-athanasiou, I've created a changelog YAML for you.

dimitris-athanasiou · 2025-09-23T14:50:55Z

docs/changelog/135262.yaml

@@ -0,0 +1,5 @@
+pr: 135262
+summary: Add usage stats for `semantic_text` fields
+area: "Search"


Not sure if the area should be Search or Machine Learning

"Vector Search" is my vote

elasticsearchmachine · 2025-09-23T16:29:04Z

Pinging @elastic/search-relevance (Team:Search - Relevance)

elasticsearchmachine · 2025-09-23T16:31:00Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

kderusso

Nice work! Are there any existing yaml tests that we could update here too?

.../core/src/test/java/org/elasticsearch/xpack/core/inference/usage/SemanticTextStatsTests.java

...ce/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageAction.java

dimitris-athanasiou · 2025-09-24T08:35:07Z

I have added YAML tests as per your suggestion @kderusso

Mikep86

Great work! I left some comments, but they're almost all nits. Per our offline discussion, I think we need to figure out if we want to filter out all hidden indices (i.e. those that start with .). Other than that, this PR is in good shape 🚀

Mikep86 · 2025-09-24T15:27:31Z

docs/changelog/135262.yaml

@@ -0,0 +1,5 @@
+pr: 135262
+summary: Add usage stats for `semantic_text` fields
+area: "Search"


"Vector Search" is my vote

...ce/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageAction.java

...c/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageActionTests.java

x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/inference/inference_usage.yml

Mikep86 · 2025-09-24T18:22:53Z

x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/inference/inference_usage.yml

+                type: semantic_text
+              field_2:
+                type: semantic_text
+                inference_id: .multilingual-e5-small-elasticsearch


A note on default endpoint usage in CI: We have noticed flakiness when we actually use them to generate embeddings due to deployment timeouts and failures. This reference should be fine because it's not actually triggering deployment of the E5 model, but something to be aware of in general.

dimitris-athanasiou · 2025-09-25T11:36:09Z

@Mikep86 Regarding hidden indices: I have excluded them too for now, since they are probably of little value telemetry-wise. However, if we want to collect them in the future, we can easily add a separate section for them by making a SemanticTextStats object a member of SemanticTextStats and gathering system/hidden field stats in there. I had an offline chat with @jimczi about this solution and we're on the same page.
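
A rough sketch of both options, assuming a simplified model: `HiddenIndexStatsSketch`, `IndexInfo`, and this cut-down `Stats` record are hypothetical stand-ins, not the PR's `SemanticTextStats`. `visibleOnly` mirrors the current behaviour of skipping hidden and system indices, while `withHiddenSection` shows how a nested stats object could carry them later.

```java
import java.util.List;

final class HiddenIndexStatsSketch {
    /** Hypothetical, simplified index view: name, hidden/system flag, number of semantic_text fields. */
    record IndexInfo(String name, boolean hiddenOrSystem, int semanticTextFields) {}

    /** Simplified stats with an optional nested section for hidden/system indices (null when not collected). */
    record Stats(long fieldCount, long indicesCount, Stats hidden) {}

    /** Counts fields and indices on one side of the hidden/system split. */
    private static Stats count(List<IndexInfo> indices, boolean hiddenOrSystem, Stats nested) {
        long fields = 0;
        long indicesWithFields = 0;
        for (IndexInfo index : indices) {
            if (index.hiddenOrSystem() == hiddenOrSystem && index.semanticTextFields() > 0) {
                fields += index.semanticTextFields();
                indicesWithFields++;
            }
        }
        return new Stats(fields, indicesWithFields, nested);
    }

    /** Behaviour described above: hidden and system indices are skipped entirely. */
    static Stats visibleOnly(List<IndexInfo> indices) {
        return count(indices, false, null);
    }

    /** Possible future variant: same top-level numbers, plus a nested section for hidden/system indices. */
    static Stats withHiddenSection(List<IndexInfo> indices) {
        return count(indices, false, count(indices, true, null));
    }
}
```

The top-level numbers are identical in both variants, so adding the nested section later would not change what existing consumers see.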

Mikep86

Looks really good, thanks for the changes! Functionality looks solid. I left a few comments for one more round of small changes, then I think we're good to merge!

...ugin/core/src/main/java/org/elasticsearch/xpack/core/inference/InferenceFeatureSetUsage.java

...ce/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageAction.java

...c/test/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageActionTests.java

Mikep86

LGTM! I have some minor comments in my previous review, but none of them are blocking.

dimitris-athanasiou · 2025-09-29T10:49:19Z

@Mikep86 Thank you for the additional review! I have addressed those points in 3d4f273

kderusso

Nice work!

...ce/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageAction.java

davidkyle · 2025-09-29T14:28:52Z

...ce/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageAction.java

+        Map<String, ModelStats> endpointStats
+    ) {
+        for (TaskType taskType : TaskType.values()) {
+            if (taskType == TaskType.ANY) {


The RERANK, COMPLETION and CHAT_COMPLETION task types will never appear in a semantic_text field, so they can also be skipped here.

Add a static function to TaskType, e.g. TaskType[] semanticTextTaskTypes(), that returns SPARSE_EMBEDDING and TEXT_EMBEDDING, and just iterate those.

At this point we are adding top-level _all stats buckets. Even though we won't have semantic_text stats for those task types, we can still have top-level buckets for count, for consistency/symmetry with what we do for the task types that support semantic_text. What do you reckon?

...ce/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageAction.java

x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/inference/inference_usage.yml

Mikep86

I have some suggestions about how to simplify the implementation

Mikep86 · 2025-09-29T17:52:50Z

server/src/main/java/org/elasticsearch/inference/TaskType.java

+    private final boolean isCompatibleWithSemanticText;
+
+    TaskType(boolean isCompatibleWithSemanticText) {
+        this.isCompatibleWithSemanticText = isCompatibleWithSemanticText;
+    }


I personally wouldn't modify TaskType directly to add this metadata. We only need it in TransportInferenceUsageAction, so I would build a set of task types that need semantic text stats in that class.

Done in 632b7cb
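
For illustration, the suggestion could look roughly like the sketch below. This is not the code from 632b7cb: `AllBucketsSketch`, `SEMANTIC_TEXT_TASK_TYPES`, and `allBuckets` are hypothetical names, and the enum is a local stand-in listing only the task types mentioned in this conversation.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class AllBucketsSketch {
    /** Local stand-in for the task types named in this thread; not the real TaskType enum. */
    enum TaskType { TEXT_EMBEDDING, SPARSE_EMBEDDING, RERANK, COMPLETION, ANY, CHAT_COMPLETION }

    /** Compatibility lives next to the usage code instead of on the enum itself. */
    static final Set<TaskType> SEMANTIC_TEXT_TASK_TYPES =
        EnumSet.of(TaskType.TEXT_EMBEDDING, TaskType.SPARSE_EMBEDDING);

    /** Maps each "_all" bucket's task type to whether it carries a semantic_text section. */
    static Map<TaskType, Boolean> allBuckets() {
        Map<TaskType, Boolean> buckets = new LinkedHashMap<>();
        for (TaskType taskType : TaskType.values()) {
            if (taskType == TaskType.ANY) {
                continue; // assumed: ANY gets no bucket, following the check in the diff under review
            }
            buckets.put(taskType, SEMANTIC_TEXT_TASK_TYPES.contains(taskType));
        }
        return buckets;
    }
}
```

Every concrete task type still gets an `_all` bucket for its count, and only the two embedding types are flagged for semantic_text stats, which matches the symmetry argument made earlier in the thread.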

Mikep86 · 2025-09-29T17:54:38Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/usage/ModelStats.java

    }

    public ModelStats(String service, TaskType taskType, long count) {
+        this(service, taskType, count, taskType.isCompatibleWithSemanticText() ? new SemanticTextStats() : null);


I think it would be simpler to always set semanticTextStats to null by default. Code paths in TransportInferenceUsageAction can pass in a non-null instance as necessary.

Done in 632b7cb
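
A minimal sketch of the constructor shape being suggested, assuming simplified stand-in classes: the nested `TaskType`, `SemanticTextStats`, and `ModelStats` below are hypothetical reductions, not the real x-pack classes. The three-argument constructor always defaults the semantic_text section to null, and callers that need one pass it explicitly.

```java
final class ModelStatsSketch {
    /** Local stand-in enum; not the real TaskType. */
    enum TaskType { TEXT_EMBEDDING, SPARSE_EMBEDDING, RERANK }

    /** Placeholder for the semantic_text counters. */
    static final class SemanticTextStats {
        long fieldCount;
    }

    static final class ModelStats {
        final String service;
        final TaskType taskType;
        final long count;
        final SemanticTextStats semanticTextStats; // null when the bucket has no semantic_text section

        /** Default: no semantic_text section, regardless of task type. */
        ModelStats(String service, TaskType taskType, long count) {
            this(service, taskType, count, null);
        }

        /** Callers that track semantic_text usage pass a non-null instance. */
        ModelStats(String service, TaskType taskType, long count, SemanticTextStats semanticTextStats) {
            this.service = service;
            this.taskType = taskType;
            this.count = count;
            this.semanticTextStats = semanticTextStats;
        }
    }
}
```

This keeps the stats holder free of task-type knowledge; the decision about which buckets get a semantic_text section stays with the caller.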

Mikep86

LGTM! Thanks for the continued iterations on this :)

davidkyle

LGTM

dimitris-athanasiou added >enhancement :ml Machine learning :Search Relevance/Search Catch all for Search Relevance v9.2.0 labels Sep 23, 2025

elasticsearchmachine added Team:ML Meta label for the ML team Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Sep 23, 2025

kderusso added the :SearchOrg/Relevance Label for the Search (solution/org) Relevance team label Sep 23, 2025

elasticsearchmachine added the Team:Search - Relevance The Search organization Search Relevance team label Sep 23, 2025

dimitris-athanasiou added :Search Foundations/Search Catch all for Search Foundations and removed :Search Relevance/Search Catch all for Search Relevance labels Sep 23, 2025

elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Sep 23, 2025

kderusso reviewed Sep 23, 2025

.../core/src/test/java/org/elasticsearch/xpack/core/inference/usage/SemanticTextStatsTests.java

...ce/src/main/java/org/elasticsearch/xpack/inference/action/TransportInferenceUsageAction.java

dimitris-athanasiou force-pushed the usage-for-semantic-text branch from c0144ff to 615b98d Compare September 24, 2025 08:33

dimitris-athanasiou force-pushed the usage-for-semantic-text branch from e9b092d to 40c5646 Compare September 24, 2025 12:42

Mikep86 reviewed Sep 24, 2025

dimitris-athanasiou force-pushed the usage-for-semantic-text branch from 76a002b to 86e022e Compare September 25, 2025 08:58

Mikep86 reviewed Sep 26, 2025

Mikep86 approved these changes Sep 26, 2025

dimitris-athanasiou added 3 commits September 29, 2025 13:39

Update docs/changelog/135262.yaml

7f4ca01

Fix changelog

abfcf80

dimitris-athanasiou and others added 12 commits September 29, 2025 13:39

Prepare SemanticTextStatsTests for BWC testing

00fb4b2

Add YAML test

2f05e03

Fix YAML test

fb71c57

Revert "Fix YAML test"

421e77e

This reverts commit d72707d.

Strip linux suffix from model_id for default stats

79e1d95

Correct linux suffix this time

1922534

Changelog area is Vector Search

e286108

Address some review points

70fe35b

Address evil edge case

6e76c14

Do not omit zero values

8d2796d

[CI] Auto commit changes from spotless

4d2c2fd

Also exclude hidden indices

3656419

dimitris-athanasiou force-pushed the usage-for-semantic-text branch from cb6e0c8 to 5a93315 Compare September 29, 2025 10:40

Address more review comments

3d4f273

dimitris-athanasiou force-pushed the usage-for-semantic-text branch from 5a93315 to 3d4f273 Compare September 29, 2025 10:48

dimitris-athanasiou added 2 commits September 29, 2025 14:48

Merge branch 'main' into usage-for-semantic-text

a9f7d79

Merge branch 'main' into usage-for-semantic-text

d9dc8b8

kderusso approved these changes Sep 29, 2025

davidkyle reviewed Sep 29, 2025

dimitris-athanasiou added 2 commits September 29, 2025 19:14

Merge branch 'main' into usage-for-semantic-text

6f3b519

Only add semantic_text stats if task_type is compatible

809631b

Mikep86 reviewed Sep 29, 2025

dimitris-athanasiou added 2 commits September 29, 2025 21:15

Contain task type compatibility in TransportInferenceUsageAction

632b7cb

Merge branch 'main' into usage-for-semantic-text

c2e6dd3

Mikep86 approved these changes Sep 29, 2025

Merge branch 'main' into usage-for-semantic-text

4408342

davidkyle approved these changes Sep 30, 2025

dimitris-athanasiou merged commit 2c20c49 into elastic:main Sep 30, 2025
34 checks passed

dimitris-athanasiou deleted the usage-for-semantic-text branch September 30, 2025 15:19
