Discrepancy in kubelet_volume_stats_* Metrics Between EKS 1.31 (AL2) and EKS 1.33 (AL2023)

Hello AWS and EKS Experts,

We are experiencing a critical issue with metric availability after upgrading one of our EKS clusters and are seeking clarification on whether this is expected behavior.

Problem Summary

Our primary goal is to monitor PersistentVolumeClaim (PVC) usage per namespace using a standard Prometheus query that relies on kubelet_volume_stats_* metrics. This monitoring has been working perfectly in our production environment, but these metrics are completely missing after upgrading to EKS 1.33.

Environment Comparison

✅ Working Environment (Production)

EKS Version: 1.31

Node Groups: EKS Managed Node Group

AMI Type: AL2_x86_64

Observed Behavior: kubelet_volume_stats_* metrics are scraped successfully

CSI Drivers: EBS CSI, EFS CSI (both working)

❌ Non-Working Environment (After Upgrade)

EKS Version: 1.33

Node Groups: EKS Managed Node Group

AMI Type: AL2023_ARM_64_STANDARD

Observed Behavior: kubelet_volume_stats_* metrics completely absent from kubelet /metrics endpoint

CSI Drivers: Same configuration as working environment
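
To make the "completely absent" observation reproducible, here is a minimal Python sketch that lists which kubelet_volume_stats_* metric names appear in a kubelet /metrics dump. It assumes kubectl is installed and that your user may read the node proxy path; the helper names and the sample text are hypothetical and for illustration only.

```python
import subprocess

PREFIX = "kubelet_volume_stats_"


def volume_stat_metric_names(exposition_text, prefix=PREFIX):
    """Return the sorted metric names in a Prometheus text dump that start with prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or at whitespace.
        name = line.split("{", 1)[0].split()[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


def fetch_kubelet_metrics(node_name):
    """Fetch the kubelet /metrics output through the API server node proxy.

    Assumption: kubectl is on PATH and the caller has access to nodes/proxy.
    """
    result = subprocess.run(
        ["kubectl", "get", "--raw", f"/api/v1/nodes/{node_name}/proxy/metrics"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Hypothetical sample of what a node that does expose the metrics might return.
SAMPLE = """\
# HELP kubelet_volume_stats_capacity_bytes Capacity in bytes of the volume
# TYPE kubelet_volume_stats_capacity_bytes gauge
kubelet_volume_stats_capacity_bytes{namespace="demo",persistentvolumeclaim="data-0"} 1.0e+10
kubelet_volume_stats_available_bytes{namespace="demo",persistentvolumeclaim="data-0"} 4.0e+09
kubelet_running_pods 12
"""

if __name__ == "__main__":
    print(volume_stat_metric_names(SAMPLE))
```

Running volume_stat_metric_names(fetch_kubelet_metrics("<node-name>")) against one node from each cluster would show whether the metric family is missing at the kubelet itself or only further along the scrape pipeline.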


This issue prevents us from using our standard PromQL query for PVC monitoring:

max by(namespace, persistentvolumeclaim) (
  (
    kubelet_volume_stats_capacity_bytes
    - kubelet_volume_stats_available_bytes
  ) / kubelet_volume_stats_capacity_bytes
) * 100
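
For readers less familiar with PromQL, the arithmetic of this query can be sketched in Python. This is only an illustration under simplifying assumptions: each series is keyed by a hypothetical (namespace, persistentvolumeclaim, node) tuple, and the sample values are invented.

```python
def pvc_usage_percent(capacity, available):
    """Mirror the query: (capacity - available) / capacity * 100, max per (namespace, pvc).

    capacity and available map (namespace, pvc, node) tuples to bytes.
    """
    usage = {}
    for key, cap in capacity.items():
        # PromQL only combines series whose labels match on both sides.
        # Zero capacity is skipped here; PromQL would yield NaN or Inf instead.
        if key not in available or cap == 0:
            continue
        namespace, pvc, _node = key
        pct = (cap - available[key]) / cap * 100
        group = (namespace, pvc)
        # "max by(namespace, persistentvolumeclaim)" keeps the highest value per group.
        usage[group] = max(usage.get(group, pct), pct)
    return usage


# Invented sample data: one PVC reported by two nodes, one reported by a single node.
capacity = {
    ("demo", "data-0", "node-a"): 100.0,
    ("demo", "data-0", "node-b"): 100.0,
    ("other", "logs", "node-a"): 200.0,
}
available = {
    ("demo", "data-0", "node-a"): 60.0,
    ("demo", "data-0", "node-b"): 25.0,
    ("other", "logs", "node-a"): 50.0,
}

if __name__ == "__main__":
    print(pvc_usage_percent(capacity, available))
```

If either kubelet_volume_stats_* series is missing for a PVC, that PVC simply drops out of the result, which is why the query returns nothing at all on the upgraded cluster.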

We have checked the official Kubernetes documentation (v1.33), which still lists kubelet_volume_stats_capacity_bytes, kubelet_volume_stats_available_bytes, etc., as available kubelet metrics. This creates confusion, as we would expect a standard Kubernetes metric to be available in a standard EKS environment.

Our assumption is that this change is related to the architectural differences between the Amazon Linux 2 (AL2) AMI and the newer Amazon Linux 2023 (AL2023) AMI, specifically regarding how cAdvisor is integrated in a pure CRI/CSI environment without dockershim.


Our Questions Are:

Is the absence of kubelet_volume_stats_* metrics on EKS nodes using the AL2023 AMI an expected and intentional change?

If this is intentional, could you confirm the underlying architectural reason? Is it due to the full adoption of the CSI storage model, which renders the old kubelet volume metric collection path obsolete?

What is the official, AWS-recommended method or PromQL query for monitoring PVC disk usage on modern EKS clusters (1.33+ with AL2023 AMIs)? We would appreciate official guidance.

asked a month ago, 100 views
1 Answer

When experiencing the absence of kubelet_volume_stats_* metrics after upgrading from EKS 1.31 with AL2 to EKS 1.33 with AL2023_ARM_64_STANDARD, there are several important factors to consider.

The change in metric availability is likely related to the architectural differences between Amazon Linux 2 and Amazon Linux 2023. AL2023 introduces a new node initialization process called nodeadm that uses a YAML configuration schema, which differs significantly from the process used in AL2. This change in initialization could affect how the kubelet interacts with the underlying system and how metrics are exposed.

Additionally, AL2023 uses the unified control group hierarchy (cgroup v2), which could change how container and volume metrics are collected and exposed. The move from x86_64 to ARM64 might also contribute to differences in metric collection.

For monitoring PVC disk usage on EKS 1.33 with AL2023, you may need to:

  1. Check if you need to update your Prometheus configuration to access the new metrics endpoints. Amazon EKS has enhanced Kubernetes control plane observability, and you might need to update your Prometheus cluster role permissions to get the new metrics.

  2. Verify that your CloudWatch agent configuration (if used) has the correct metric_declaration section in the prometheus-cwagentconfig ConfigMap, so that the metrics are captured and formatted properly.

  3. Consider using Amazon Managed Service for Prometheus or CloudWatch to monitor the metrics, as they may have been updated to work with the new architecture.
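
As a starting point for item 1, the following Python sketch checks whether the rules of a Prometheus ClusterRole grant "get" on a given node subresource. It rests on two assumptions that you should verify for your setup: that direct kubelet scrapes are authorized against nodes/metrics, and that scrapes proxied through the API server need nodes/proxy. The matching logic is deliberately simplified (exact names and the "*" wildcard only), and the example role content is hypothetical.

```python
def allows_get(rules, resource):
    """Return True if any rule grants 'get' on the core-group resource.

    rules is the "rules" list of a ClusterRole, e.g. parsed from
    `kubectl get clusterrole <name> -o json`.
    Simplification: only exact resource names and the "*" wildcard are matched.
    """
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        # Nodes live in the core API group, written as "".
        if not any(g in ("", "*") for g in groups):
            continue
        if not any(r in (resource, "*") for r in resources):
            continue
        if any(v in ("get", "*") for v in verbs):
            return True
    return False


# Hypothetical rules for illustration; substitute the rules of your own ClusterRole.
example_rules = [
    {"apiGroups": [""], "resources": ["nodes", "pods"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["nodes/metrics"], "verbs": ["get"]},
]

if __name__ == "__main__":
    for res in ("nodes/metrics", "nodes/proxy"):
        print(res, allows_get(example_rules, res))
```

A missing permission would typically show up as failed scrapes for the whole kubelet target rather than as a single missing metric family, so this check mainly helps rule out the permissions explanation.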

It's worth noting that Amazon EKS will no longer publish EKS-optimized Amazon Linux 2 (AL2) AMIs after November 26th, 2025, and Kubernetes version 1.32 is the last version for which Amazon EKS will release AL2 AMIs. From version 1.33 onwards, Amazon EKS will continue to release AL2023 and Bottlerocket based AMIs, so adapting to the new metric collection methods will be necessary for future-proofing your monitoring.
Sources
Amazon Linux 2023: task blocked for more than 122 seconds | AWS re:Post
Upgrade from Amazon Linux 2 to Amazon Linux 2023 - Amazon EKS
Prometheus metrics troubleshooting on Amazon EKS and Kubernetes clusters - Amazon CloudWatch
Amazon EKS enhances Kubernetes control plane observability | Containers
Create nodes with optimized Amazon Linux AMIs - Amazon EKS

answered a month ago