
[Prometheus] Remote - Write: Enforce Maxdecoding Length check #48218

Merged
gizas merged 20 commits into main from remote_write_maxdecoding
Jan 9, 2026
Conversation

@gizas
Contributor

@gizas gizas commented Dec 23, 2025

  • Bug

Proposed commit message

  • WHAT: Enforces configurable size limits on incoming requests
  • WHY: Closes # (unbounded memory allocation in the Metricbeat Prometheus remote_write module via Snappy decompression)
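The idea behind the fix can be sketched in a few lines: a raw snappy block begins with a uvarint that states the decoded length, so the server can reject oversized claims before allocating any buffer. The sketch below is illustrative Python, not the actual Go code in remote_write.go; the function names are hypothetical, and the 10 MiB cap mirrors the 10485760 value visible in the logs further down.

```python
# Illustrative sketch of a DecodedLen guard (the real fix is Go code in
# remote_write.go; names here are hypothetical).

def claimed_decoded_len(compressed: bytes) -> int:
    """Read the uvarint length prefix of a raw snappy block."""
    result = 0
    shift = 0
    for i, b in enumerate(compressed):
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result
        shift += 7
        if shift > 63 or i >= 9:
            raise ValueError("malformed uvarint header")
    raise ValueError("truncated uvarint header")

# Mirrors the 10485760-byte limit shown in the warning logs.
MAX_DECODED_BODY_BYTES = 10 * 1024 * 1024

def check_decoded_len(body: bytes) -> None:
    """Reject the request before decompression if the claimed size is too big."""
    n = claimed_decoded_len(body)
    if n > MAX_DECODED_BODY_BYTES:
        raise ValueError(f"decoded length too large: {n} bytes")
```

Because the check reads only the header, a 9-byte probe claiming 1 GiB is rejected without the server ever allocating the claimed buffer.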

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

How to test this PR locally

  • Follow the steps of the doc to deploy the Elastic Stack and Metricbeat locally

    Specifically add the following configuration for remote-write prometheus module:

    Prometheus Module Config
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metricbeat-daemonset-modules
      namespace: kube-system
      labels:
        k8s-app: metricbeat
    data:
      prometheus.yml: |-
        - module: prometheus
          metricsets:
            - remote_write
          host: "0.0.0.0"
          port: 9201
          period: 10s
  • Install prometheus in your kind cluster

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-server prometheus-community/prometheus --namespace kube-system
  • Install the following service so that your Prometheus server can send remote-write metrics to Metricbeat

    Metricbeat Service
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: metricbeat-remote-write
      namespace: kube-system
      labels:
        k8s-app: metricbeat
    spec:
      ports:
        - port: 9201
          protocol: TCP
          targetPort: 9201
      selector:
        k8s-app: metricbeat
      sessionAffinity: None
      type: ClusterIP
  • Configure prometheus server with remote write

    Prometheus Server Values
    helm upgrade --install prometheus-server prometheus-community/prometheus -f values.yml

    $ cat values.yml
    server:
      remoteWrite:
        - url: http://metricbeat-remote-write.kube-system:9201/write
  • Save the following as trigger-pod.yaml. Apply it to get a pod from which you can craft Snappy requests and test the new configs:

    kubectl apply -f trigger-pod.yaml

    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: trigger-script
      namespace: kube-system
    data:
      trigger_metricbeat_remote_write.py: |
        #!/usr/bin/env python3
        """
        Send crafted Snappy payloads to test Metricbeat remote_write size limits.
        
        This script can send a tiny payload that claims a huge decoded size,
        useful for testing the DecodedLen security check.
        """
    
        from __future__ import annotations
    
        import argparse
        import http.client
        import socket
        import sys
    
        DEFAULT_DECODED_SIZE = 500 * 1024 * 1024  # 500 MB
    
    
        def encode_uvarint(value: int) -> bytes:
            """Encode an integer as a varint (used in snappy header)."""
            out = bytearray()
            while value >= 0x80:
                out.append((value & 0x7F) | 0x80)
                value >>= 7
            out.append(value)
            return bytes(out)
    
    
        def build_minimal_payload(claimed_decoded_size: int) -> bytes:
            """
            Build a minimal snappy payload that claims a large decoded size.
            
            The payload is just the varint header + a few bytes of padding.
            This is useful for testing the DecodedLen check without sending
            megabytes of data.
            """
            header = encode_uvarint(claimed_decoded_size)
            # Add minimal padding (this will cause snappy.Decode to fail, 
            # but we're testing the DecodedLen check which happens first)
            return header + b'\x00\x00\x00\x00'
    
    
        def build_valid_payload(decoded_size: int) -> bytes:
            """
            Build a valid snappy payload of the specified decoded size.
            
            This creates actual valid snappy-compressed data.
            """
            TAG_LITERAL = 0x00
            TAG_COPY2 = 0x02
            
            frame = bytearray()
            frame.extend(encode_uvarint(decoded_size))
    
            if decoded_size < 4:
                # Small literal
                n = decoded_size - 1
                frame.append((n << 2) | TAG_LITERAL)
                frame.extend(b"\x00" * decoded_size)
                return bytes(frame)
    
            # Emit a small literal followed by copy operations
            literal_len = ((decoded_size - 4) % 64) + 4
            n = literal_len - 1
            if n < 60:
                frame.append((n << 2) | TAG_LITERAL)
            else:
                frame.extend(((60 << 2) | TAG_LITERAL, n))
            frame.extend(b"\x00" * literal_len)
    
            remaining = decoded_size - literal_len
            if remaining > 0:
                copy_chunk = bytes([((64 - 1) << 2) | TAG_COPY2, 0x01, 0x00])
                repetitions = remaining // 64
                frame.extend(copy_chunk * repetitions)
            
            return bytes(frame)
    
    
        def main() -> int:
            parser = argparse.ArgumentParser(
                description=__doc__,
                formatter_class=argparse.RawDescriptionHelpFormatter
            )
            parser.add_argument("--host", default="metricbeat-remote-write.kube-system",
                                help="Target host (default: metricbeat-remote-write.kube-system)")
            parser.add_argument("--port", type=int, default=9201,
                                help="Target port (default: 9201)")
            parser.add_argument("--decoded-len", type=int, default=DEFAULT_DECODED_SIZE,
                                help="Decoded size to claim in bytes (default: 500MB)")
            parser.add_argument("--path", default="/",
                                help="HTTP path (default: /)")
            parser.add_argument("--minimal", action="store_true",
                                help="Send minimal payload (just header) to test DecodedLen check")
            parser.add_argument("--valid", action="store_true",
                                help="Send valid snappy payload (larger, tests actual decompression)")
            args = parser.parse_args()
    
            if args.minimal:
                payload = build_minimal_payload(args.decoded_len)
                mode = "minimal (header only)"
            elif args.valid:
                payload = build_valid_payload(args.decoded_len)
                mode = "valid snappy"
            else:
                # Default to minimal for testing
                payload = build_minimal_payload(args.decoded_len)
                mode = "minimal (header only)"
    
            headers = {
                "Content-Type": "application/x-protobuf",
            }
    
            print(f"[*] Mode: {mode}")
            print(f"[*] Sending {len(payload)} bytes, claiming {args.decoded_len} bytes decoded")
            print(f"[*] Target: {args.host}:{args.port}{args.path}")
            print()
    
            conn = http.client.HTTPConnection(args.host, args.port, timeout=10)
            try:
                conn.request("POST", args.path, body=payload, headers=headers)
                resp = conn.getresponse()
                data = resp.read().decode('utf-8', errors='replace')
                
                if resp.status == 413:
                    print(f"[+] SUCCESS: Got HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data.strip()}")
                elif resp.status == 400:
                    print(f"[*] Got HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data.strip()}")
                elif resp.status == 202:
                    print(f"[-] UNEXPECTED: Request was accepted (HTTP {resp.status})")
                    print(f"    The size limit may not be working!")
                else:
                    print(f"[?] HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data[:500]}")
                    
            except (ConnectionResetError, BrokenPipeError, http.client.RemoteDisconnected) as exc:
                print(f"[!] Connection dropped: {exc}")
            except socket.timeout:
                print(f"[!] Connection timed out")
            except Exception as exc:
                print(f"[!] Error: {exc}")
            finally:
                conn.close()
            
            return 0
    
    
        if __name__ == "__main__":
            sys.exit(main())
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: trigger-pod
      namespace: kube-system
      labels:
        app: trigger-pod
    spec:
      containers:
        - name: python
          image: python:3.11-slim
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: script
              mountPath: /scripts
      volumes:
        - name: script
          configMap:
            name: trigger-script
            defaultMode: 0755
      restartPolicy: Never

Screenshots

Metrics ingestion from Prometheus remote write continues to work correctly with the limits in place:

(screenshot: successful metrics ingestion, captured 2025-12-23)

No restarts of Metricbeat during our tests:

kubectl get pods -n kube-system
NAME                                                        READY   STATUS    RESTARTS   AGE
...
metricbeat-5xm98                                            1/1     Running   0          60m

Logs

kubectl exec -it trigger-pod -n kube-system -- python /scripts/trigger_metricbeat_remote_write.py --decoded-len 1073741824

[*] Mode: minimal (header only)
[*] Sending 9 bytes, claiming 1073741824 bytes decoded
[*] Target: metricbeat-remote-write.kube-system:9201/

[+] SUCCESS: Got HTTP 413 Request Entity Too Large
    Response: decoded length too large: 1073741824 bytes exceeds 10485760 max decoded bytes limit (maxDecodedBodyBytes)

{"log.level":"warn","@timestamp":"2025-12-23T12:03:36.390Z","log.logger":"prometheus.remote_write","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/prometheus/remote_write.(*MetricSet).handleFunc","file.name":"remote_write/remote_write.go","file.line":191},"message":"Decoded lenght too large: 1073741824 bytes exceeds 10485760 max decoded bytes limit (maxDecodedBodyBytes)","service.name":"metricbeat","ecs.version":"1.6.0"}

{"log.level":"warn","@timestamp":"2025-12-23T11:44:12.876Z","log.logger":"prometheus.remote_write","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/prometheus/remote_write.(*MetricSet).handleFunc","file.name":"remote_write/remote_write.go","file.line":174},"message":"Request body too large: exceeds 2097152 bytes limit","service.name":"metricbeat","ecs.version":"1.6.0"}
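Why the minimal probe in the log above is only 9 bytes: the raw snappy format prefixes the body with a uvarint of the decoded length, and encoding 1073741824 (1 GiB) takes 5 varint bytes; the trigger script then appends 4 bytes of padding. A quick check, reusing the script's own encode_uvarint helper:

```python
def encode_uvarint(value: int) -> bytes:
    """Varint encoding, identical to the helper in the trigger script."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

header = encode_uvarint(1073741824)  # claimed decoded length: 1 GiB
payload = header + b"\x00" * 4       # same 4-byte padding the script adds
print(len(header), len(payload))     # -> 5 9
```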
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
@gizas gizas requested review from a team as code owners December 23, 2025 12:39
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 23, 2025
@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
@gizas gizas added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Dec 23, 2025
@elasticmachine
Contributor

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 23, 2025
@mergify mergify bot assigned gizas Dec 23, 2025
@mergify
Contributor

mergify bot commented Dec 23, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @gizas? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8.\d is the label to automatically backport to the 8.\d branch, where \d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.
@github-actions
Contributor

github-actions bot commented Dec 23, 2025

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Contributor

@MichaelKatsoulis MichaelKatsoulis left a comment


LGTM

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
@gizas gizas added backport-active-all Automated backport with mergify to all the active branches backport-active-9 Automated backport with mergify to all the active 9.[0-9]+ branches and removed backport-active-all Automated backport with mergify to all the active branches labels Dec 24, 2025
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
@gizas gizas enabled auto-merge (squash) December 30, 2025 09:05
@gizas gizas disabled auto-merge January 7, 2026 07:25
gizas added 2 commits January 7, 2026 10:56
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
@gizas gizas requested a review from a team as a code owner January 7, 2026 09:32
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
@gizas
Contributor Author

gizas commented Jan 7, 2026

@elastic/beats-tech-leads can I have a review please?

Member

@cmacknz cmacknz left a comment


Approving use of the year 2026 :)

@gizas gizas enabled auto-merge (squash) January 9, 2026 07:20
@gizas gizas merged commit 38cc702 into main Jan 9, 2026
45 checks passed
@gizas gizas deleted the remote_write_maxdecoding branch January 9, 2026 08:34
@github-actions
Contributor

github-actions bot commented Jan 9, 2026

@Mergifyio backport 9.1 9.2 9.3

@mergify
Contributor

mergify bot commented Jan 9, 2026

mergify bot pushed a commit that referenced this pull request Jan 9, 2026
* fix the snappy decompression

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

* merging new config vars

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

---------

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Co-authored-by: Brandon Morelli <brandon.morelli@elastic.co>
(cherry picked from commit 38cc702)
gizas added a commit that referenced this pull request Jan 12, 2026
#48363)

* fix the snappy decompression



* merging new config vars



---------



(cherry picked from commit 38cc702)

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
Co-authored-by: Brandon Morelli <brandon.morelli@elastic.co>
Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
gizas added a commit that referenced this pull request Jan 12, 2026
#48362)

* fix the snappy decompression



* merging new config vars



---------



(cherry picked from commit 38cc702)

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
Co-authored-by: Brandon Morelli <brandon.morelli@elastic.co>
@gizas gizas added the backport-8.19 Automated backport to the 8.19 branch label Feb 18, 2026
mergify bot pushed a commit that referenced this pull request Feb 18, 2026
* fix the snappy decompression

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

* merging new config vars

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

---------

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Co-authored-by: Brandon Morelli <brandon.morelli@elastic.co>
(cherry picked from commit 38cc702)

# Conflicts:
#	docs/reference/metricbeat/metricbeat-metricset-prometheus-remote_write.md
#	metricbeat/module/prometheus/remote_write/_meta/docs.md
#	x-pack/metricbeat/module/prometheus/remote_write/_meta/docs.md
gizas added a commit that referenced this pull request Feb 18, 2026
…ing Length check (#48918)

* [Prometheus] Remote - Write:  Enforce Maxdecoding Length check (#48218)

* fix the snappy decompression

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

---------

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Co-authored-by: Brandon Morelli <brandon.morelli@elastic.co>
(cherry picked from commit 38cc702)

# Conflicts:
#	docs/reference/metricbeat/metricbeat-metricset-prometheus-remote_write.md
#	metricbeat/module/prometheus/remote_write/_meta/docs.md
#	x-pack/metricbeat/module/prometheus/remote_write/_meta/docs.md

* update docs for ga 8.19.3

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

* update docs for ga 8.19.13

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

---------

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>

Labels

backport-8.19 Automated backport to the 8.19 branch backport-active-9 Automated backport with mergify to all the active 9.[0-9]+ branches prometheus Team:obs-ds-hosted-services Label for the Observability Hosted Services team

5 participants