Kubernetes Monitoring: One view for observing all your storage volumes

2025-04-01 6 min

If you want to observe your entire Kubernetes environment, you need visibility into all of your resources, including storage volumes. But monitoring Kubernetes storage hasn’t always been easy, especially if you wanted to see how it related to other parts of your infrastructure. 

That’s why we’re excited to share the latest update to our Kubernetes Monitoring solution in Grafana Cloud: a new Storage tab that gives you a single view to help you track volume usage over time, conduct data forensics, and troubleshoot volume provisioning. It also includes a new alerts overlay that reduces context switching and makes it easier to spot issues, so let’s dive into how you can start putting this to use today.

A closer look at the Storage tab

In Kubernetes Monitoring, storage is represented by several concepts, including (but not limited to) PersistentVolumes (PVs), PersistentVolumeClaims (PVCs), and StorageClasses. Understanding how these objects relate and interact over time can help you answer key questions about the state of your storage in a variety of scenarios, from preventative maintenance to root cause analysis. The responsibility for emitting this information as metrics is spread across various Prometheus exporters (from kube-state-metrics to the kubelet and cAdvisor), but it can be a little tricky to see how they all fit together.
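To make these relationships concrete, here's a minimal, hypothetical PVC manifest (the names and sizes are illustrative): the claim requests storage from a named StorageClass, and Kubernetes binds it to a matching PV.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: registry-data          # hypothetical claim name
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # hypothetical StorageClass name
  resources:
    requests:
      storage: 10Gi            # the *request*; the bound PV's actual capacity may differ
```

kube-state-metrics turns objects like this into metrics (the claim, its phase, its storage class, and its requested size), while the kubelet reports what's actually happening on the bound volume.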

To address this challenge, the new Storage tab runs PromQL queries behind the scenes to join these metrics together and help you visualize the relationships between pods and their volumes.
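As a rough sketch of the kind of join involved (not the exact queries the tab uses), you can enrich the kubelet's per-volume usage series with PVC metadata from kube-state-metrics by matching on the claim:

```promql
# Per-volume usage, enriched with the backing PV name and storage class.
# kubelet_volume_stats_used_bytes comes from the kubelet;
# kube_persistentvolumeclaim_info comes from kube-state-metrics.
kubelet_volume_stats_used_bytes
  * on (namespace, persistentvolumeclaim)
    group_left (volumename, storageclass)
kube_persistentvolumeclaim_info
```

The `group_left` modifier copies the `volumename` and `storageclass` labels onto the usage series, which is what lets a single panel relate a pod's volume back to its claim and class.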

The Storage tab includes five new prebuilt panels (with more on the way soon!):

  • PVC storage class, for tracking the storage classes requested by your PVCs
  • Volume bytes, for comparing claim requests with actual volume capacity and usage
  • Volume inodes, for comparing volume inode capacity with usage
  • PVC status, for monitoring the binding status between the claim and the volume from the claim side
  • PV status, for monitoring the binding status between the claim and the volume from the volume side
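For reference, the binding status shown in the last two panels corresponds to phase metrics exposed by kube-state-metrics. A hedged sketch of querying them directly:

```promql
# PVC phase from the claim side (Pending, Bound, Lost);
# the series with value 1 is the current phase.
kube_persistentvolumeclaim_status_phase == 1

# PV phase from the volume side (Pending, Available, Bound, Released, Failed).
kube_persistentvolume_status_phase == 1
```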

We designed the new tab so that you can see both high- and low-level storage metrics over time, which is particularly useful for revealing correlations and for ad hoc cause-and-effect analysis. Higher-level PVC information sits toward the top of the page; as you move down, you move into lower-level data, ending with the PV information itself.

The tab view at different levels

The tab view is available on the cluster, namespace, workload, node, and pod detail pages.

When viewing at the pod level, you'll see one set of panels for each volume, which makes it easier to observe and understand individual volumes. The PV status panel displays the PV name itself, so you can be sure you're looking at the correct dynamically provisioned block device, even when the Container Storage Interface (CSI) driver has given it an elaborate name.

It's worth noting that StatefulSet replicas may have different provisioned volumes over time, especially if the replica count is controlled dynamically by a Horizontal Pod Autoscaler. For example, say you're tracking down a pod by the name of an associated provisioned volume, taken from your cloud storage provider. The PV status panel shows exactly when, and for how long, that particular block device was bound, along with any other block devices that may have been bound to the same pod and claim.

Moving up through the levels above pod, the volume bytes and inodes panels show aggregated data across the whole resource on the left half of the view, as well as a breakdown by sub-resource on the right half of the view.

PVC volume panels
Part of the Storage tab as seen from the namespace-level view, showing the entire namespace on the left and a breakdown by workload on the right.

This means that, depending on the level at which you view your storage, you can see everything from which pod in a multi-replica workload has volumes close to capacity, to namespace-averaged volume usage, to the total provisioned volume capacity across an entire cluster.

The cluster-level view is particularly useful for getting an overview of the distribution of StorageClasses over time. This information can be used to track a migration to a more cost-effective or performant volume type, for example.

Kubelet and cAdvisor

The kubelet, the primary Kubernetes node agent, also exposes volume metrics. These are great for seeing the number of bytes and inodes used, as well as the actual capacity of bound volumes.

For volume discovery regardless of bind status, the metrics from kube-state-metrics are useful because requested volumes remain visible even if they were never successfully bound or have already been released. There is a nuance with volume capacity, however: the storage request on the PVC may not exactly match the actual capacity of the volume. That's why the volume bytes panel shows both requests and capacity, combining the metrics from both exporters.
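A hedged sketch of that combination, pairing the kube-state-metrics request with the kubelet's view of actual capacity and usage:

```promql
# What the PVC asked for (kube-state-metrics; visible even if the claim
# never binds or has been released).
kube_persistentvolumeclaim_resource_requests_storage_bytes

# What the bound volume actually provides and uses (kubelet; only
# reported for bound, mounted volumes).
kubelet_volume_stats_capacity_bytes
kubelet_volume_stats_used_bytes

# Example: usage as a fraction of actual capacity, per claim.
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes
```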

In future versions of Kubernetes Monitoring in Grafana Cloud, we’ll integrate the file system metrics from cAdvisor so that you can see IOPS and throughput alongside all of the new panels above. From there you’ll be able to make informed decisions about which Storage Classes might be most cost effective for your workloads, based on real world data.

Alerts and alert overlays

The Storage feature comes with three new alerts, adopted from the kubernetes-mixin project, which the Kubernetes Monitoring team helps to maintain (I'm also one of its maintainers). The new storage alerts (available since Kubernetes Monitoring backend version 2.1.0) are:

  • KubePersistentVolumeFillingUp
  • KubePersistentVolumeInodesFillingUp
  • KubePersistentVolumeErrors

To help you understand how these alerts work, we've also added a new alert overlay to the panels. Let's look at an example to illustrate how you can use this feature to identify issues more quickly.

In the screenshot above, we're showing the "PVC Volume bytes by pod (avg)" panel for a StatefulSet with two replicas. It looks like both pods in this workload have volumes with intermittent warning alerts. If I look at the first pod, registry-0, I can see it's the KubePersistentVolumeFillingUp alert. This alert has two severities: warning and critical. The critical threshold is 97%, which is plotted on the chart as the red dashed line, and we're not there yet.

The warning behavior is a little different: it uses the predict_linear PromQL function to analyze the historical series and determine whether the volume will fill up within four days. The warning alert fires when the increase in volume usage is steep enough that, if usage continued at the same rate, the volume would likely fill to capacity within four days; it resolves when the rate of increase levels out, meaning the volume would take longer to fill. Because this depends on the rate of increase of used bytes on the volume, the output of the function can lead to intermittent firing, behavior that can be particularly difficult to grasp unless it's visualized in an overlay like this.
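For intuition, here's a simplified sketch of the warning expression in the spirit of the kubernetes-mixin alert (the real rule adds job selectors and exclusions, e.g. for read-only claims):

```promql
# Fires when less than 15% of the volume is free AND a linear fit over
# the last 6 hours predicts available bytes will reach zero within 4 days.
(
  kubelet_volume_stats_available_bytes
    / kubelet_volume_stats_capacity_bytes
) < 0.15
and
predict_linear(kubelet_volume_stats_available_bytes[6h], 4 * 24 * 3600) < 0
```

The critical severity, roughly speaking, drops the prediction and simply checks that less than 3% of the volume is free, which is the 97% threshold drawn as the dashed line in the overlay.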

We think this new overlay will save you time and effort when you’re troubleshooting issues, as it can reduce your context switching and help you gain a deeper understanding of your alerts with the Kubernetes Monitoring app. Lots of relevant information is pulled directly into these panels so you have one view to make informed decisions on your storage infrastructure.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!