Skip to content

Add new sampling method to the Downsample API#136813

Merged
gmarouli merged 12 commits intoelastic:mainfrom
gmarouli:downsampling/add-new-sampling-method-api-only
Oct 22, 2025
Merged

Add new sampling method to the Downsample API#136813
gmarouli merged 12 commits intoelastic:mainfrom
gmarouli:downsampling/add-new-sampling-method-api-only

Conversation

@gmarouli
Copy link
Contributor

When downsampling gauge metrics, we create an aggregate object that contains the min, max, sum and value_count values that allow us to respond to the min, max, sum, count and average aggregations without losing any accuracy.

However, we recognise that for certain cases keeping just the last value might be good enough. For this reason, we add one more configuration option in the downsample API that allows a user to choose the sampling method between:

  • aggregate: when supported, this will downsample a metric by summarising its values in a aggregate document. This remains the default
  • last_value: this will keep the last value and discard the rest.
POST /my-time-series-index/_downsample/my-downsampled-time-series-index
{
  "fixed_interval": "1d",
  "sampling_method": "last_value"
}

This PR introduces the sampling method only in the downsampling API. There will be follow ups that will introduce it in the lifecycle management features.

Relates: #128357.

@gmarouli gmarouli requested a review from a team as a code owner October 20, 2025 12:46
@gmarouli gmarouli added >enhancement :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data labels Oct 20, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

if (method != null) {
return method;
}
boolean isIndexDownsampled = indexMetadata.getSettings().get(IndexMetadata.INDEX_DOWNSAMPLE_INTERVAL_KEY) != null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is true, wouldn't fromString return a valid method above? Maybe I missed the logic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, the code as it is now will always have a sampling method. This is for backwards compatibility. Indices before this PR will not have sampling method defined in their metadata and we know that only the aggregate was available, so this is why we specify it here. Even if we ever change the default, this shouldn't change. I will add a comment so it is clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be just returning AGGREGATE then, without checking the setting again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider this as well to avoid this check. If we do that we move the responsibility to the caller, it's possible that the caller has already checked that so this is a redundant check. On the other hand, I do think that it is part of this methods responsibility to return null when the index is not downsampled.

If it was on the critical path, I would consider removing the check and adding a warning in the javadoc of this method. However, considering that this is called mainly by the downsample API and by the data stream lifecycle in the future, I think it's a performance penalty we can accept. What do you think?

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, just a few nits.

gmarouli and others added 2 commits October 21, 2025 10:42
…ticsearch/xpack/downsample/DownsampleIT.java

Co-authored-by: Kostas Krikellas <131142368+kkrik-es@users.noreply.github.com>
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @gmarouli!

There are also many downsample tests in DownsampleActionSingleNodeTests. Maybe also randomly use the new sample method there too?

@gmarouli
Copy link
Contributor Author

Thanks @martijnvg , apart from adjusting these tests, I ensured that all tests apart from (ILM, DLM & telemetry) are also using random sampling.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adjusting the tests! LGTM

@gmarouli gmarouli merged commit 6809a35 into elastic:main Oct 22, 2025
34 checks passed
@gmarouli gmarouli deleted the downsampling/add-new-sampling-method-api-only branch October 22, 2025 08:04
gmarouli added a commit that referenced this pull request Oct 31, 2025
Following #136813, we expose to ILM the new sampling method config in the downsampling API. This will allow users to configure the sampling method in their downsample action of their ILM policies. For example:
```
PUT _ilm/policy/datastream_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_docs": 1
          },
          "downsample": {
  	          "fixed_interval": "1h",
  	          "force_merge_index": false,
                  "sampling_method": "aggregate"
  	  }
        }
      }
    }
  }
}
```
fzowl pushed a commit to voyage-ai/elasticsearch that referenced this pull request Nov 3, 2025
When downsampling gauge metrics, we create an aggregate object that contains the min, max, sum and value_count values that allow us to respond to the min, max, sum, count and average aggregations without losing any accuracy.

However, we recognise that for certain cases keeping just the last value might be good enough. For this reason, we add one more configuration option in the downsample API that allows a user to choose the sampling method between:

aggregate: when supported, this will downsample a metric by summarising its values in a aggregate document. This remains the default
last_value: this will keep the last value and discard the rest.
```
POST /my-time-series-index/_downsample/my-downsampled-time-series-index
{
  "fixed_interval": "1d",
  "sampling_method": "last_value"
}
```
This PR introduces the sampling method only in the downsampling API. There will be follow ups that will introduce it in the lifecycle management features.

Relates: elastic#128357.
gmarouli added a commit that referenced this pull request Nov 3, 2025
…37023)

Following #136813, we expose to data stream lifecycle the new sampling method config in the downsampling API. This will allow users to configure the sampling method directly in the lifecycle configuration. For example:
```
PUT _data_stream/my-ds/_lifecycle
{
  "data_retention": "10d",
  "downsampling_method": "last_value",
  "downsampling": [
     {
       "after": "1d",
       "fixed_interval": "5m
     }
  ]
}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine v9.3.0

4 participants