feat(server): support ROCM for /api/usage endpoint by Set27 · Pull Request #9773 · marimo-team/marimo

Set27 · 2026-06-03T09:47:40Z

I have read the CLA Document and I hereby sign the CLA

📝 Summary

Add support for AMD GPU stats.
#9237

📋 Pre-Review Checklist

~~- [ ] For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).~~

Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

I have read the contributor guidelines.
Tests have been added for the changes made.
[not sure if any] Documentation has been updated where applicable, including docstrings for API changes.

vercel · 2026-06-03T09:47:46Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
marimo-docs	Ready	Preview, Comment	Jun 11, 2026 3:21pm

github-actions · 2026-06-03T09:47:54Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Set27 · 2026-06-03T09:50:07Z

I have read the CLA Document and I hereby sign the CLA

Copilot

Pull request overview

This PR extends the server /api/usage endpoint GPU reporting to support AMD GPUs via ROCm (rocm-smi), alongside the existing NVIDIA (nvidia-smi) implementation, so the frontend can show GPU memory usage on ROCm systems.

Changes:

Detects whether NVIDIA or ROCm tooling is available and selects the appropriate stats collector.
Refactors GPU collection into dedicated parsing helpers for nvidia-smi and rocm-smi.
Adds a ROCm command definition for fetching VRAM stats via CSV output.
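
As a rough illustration of the detection step summarized above, here is a minimal sketch that picks a vendor by checking which CLI tool is on `PATH`. This is an assumption, not the PR's implementation: the name `detect_gpu_vendor` and the use of `shutil.which` are hypothetical, and the merged code may probe the tools differently (for example by running them).

```python
import shutil
from typing import Callable, Optional


def detect_gpu_vendor(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return "nvidia", "rocm", or None depending on which CLI tool is found.

    `which` is injectable so the lookup can be replaced in tests.
    """
    # Prefer NVIDIA when both tools are installed (an arbitrary choice here).
    if which("nvidia-smi") is not None:
        return "nvidia"
    if which("rocm-smi") is not None:
        return "rocm"
    return None
```

Injecting the lookup function keeps the sketch testable on machines that have neither tool installed.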

+            used_str = "0"
+        total = int(total_str)
+        used = int(used_str)
+        free = total - used


+    gpu_available = _is_gpu_available()
+    if gpu_available == "nvidia":
+        gpu_stats = _parse_nvidia_smi_stats()
+    elif gpu_available == "rocm":
+        gpu_stats = _parse_rocm_smi_stats()


Set27 · 2026-06-03T13:47:07Z

Rewrite using json output instead of csv
Fix current test
Add new test

Set27 · 2026-06-03T14:58:28Z

+    assert response.json()["gpu"] == []
+
+
+def test_usage_rocm_gpu(client: TestClient) -> None:


I would highly appreciate example output from nvidia-smi so I can add a test for it.

mscolnick · 2026-06-10T21:24:46Z

@cubic-dev-ai

cubic-dev-ai · 2026-06-10T21:24:55Z

@cubic-dev-ai

@mscolnick I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

1 issue found across 2 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="marimo/_server/api/endpoints/health.py">

<violation number="1" location="marimo/_server/api/endpoints/health.py:452">
P2: ROCm GPU stats silently report all-zero memory when expected JSON keys are missing</violation>
</file>
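
One possible way to address the issue flagged above is to skip a card whose expected keys are missing instead of reporting zeros. The sketch below is not the code in `health.py`: the function name `parse_rocm_json` is hypothetical, and the JSON key names are assumptions about `rocm-smi` output that may differ between ROCm versions.

```python
import json
from typing import Any, Dict, List

# Assumed key names; verify against real rocm-smi output before relying on them.
TOTAL_KEY = "VRAM Total Memory (B)"
USED_KEY = "VRAM Total Used Memory (B)"
NAME_KEY = "Card Series"


def parse_rocm_json(stdout: str) -> List[Dict[str, Any]]:
    """Parse rocm-smi style JSON, skipping cards with missing or bad fields."""
    # Drop anything before the first "{" (e.g. a WARNING line).
    # Simplification: assumes the prefix itself contains no "{".
    start = stdout.find("{")
    if start == -1:
        return []
    try:
        data = json.loads(stdout[start:])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []

    gpus: List[Dict[str, Any]] = []
    for card, info in data.items():
        suffix = card[len("card"):]
        if not card.startswith("card") or not suffix.isdigit():
            continue
        if not isinstance(info, dict):
            continue
        try:
            total = int(info[TOTAL_KEY])
            used = int(info[USED_KEY])
        except (KeyError, TypeError, ValueError):
            # Skip rather than silently report 0 bytes for this card.
            continue
        gpus.append(
            {
                "index": int(suffix),
                "name": str(info.get(NAME_KEY, "unknown")),
                "total": total,
                "used": used,
                "free": total - used,
            }
        )
    return gpus
```

Whether to skip such cards, log a warning, or surface an error is a design choice; skipping simply avoids showing misleading all-zero memory figures.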

Architecture diagram

sequenceDiagram
    participant Client as Client
    participant HealthAPI as /api/usage endpoint
    participant GpuDetect as _is_gpu_available()  
    participant NvidiaSMI as nvidia-smi process
    participant RocmSMI as rocm-smi process
    participant GpuParser as GPU stats parser
    
    Note over Client,GpuParser: GET /api/usage with ROCM GPU support
    
    Client->>HealthAPI: GET /api/usage
    HealthAPI->>HealthAPI: Collect CPU, memory, network stats
    
    HealthAPI->>GpuDetect: Check GPU availability
    alt No GPU tools found
        GpuDetect-->>HealthAPI: return False
        HealthAPI-->>Client: GPU stats = []
    else NVIDIA GPU detected
        GpuDetect->>NvidiaSMI: subprocess.run(_NVIDIA_GPU_STATS_CMD)
        NvidiaSMI-->>GpuDetect: CSV stdout
        GpuDetect-->>HealthAPI: return "nvidia"
        HealthAPI->>GpuParser: _parse_nvidia_smi_stats()
        GpuParser->>GpuParser: Parse CSV lines, handle [N/A]
        GpuParser-->>HealthAPI: list of GPU dicts
    else AMD ROCM GPU detected
        GpuDetect->>RocmSMI: subprocess.run(_AMD_GPU_STATS_CMD)
        RocmSMI-->>GpuDetect: JSON stdout (with possible WARNING prefix)
        GpuDetect-->>HealthAPI: return "rocm"
        HealthAPI->>GpuParser: _parse_rocm_smi_stats()
        GpuParser->>GpuParser: Strip warning lines, parse JSON
        Note over GpuParser: Extract card#, Card Series, VRAM bytes
        alt JSON parse error
            GpuParser-->>HealthAPI: return []
        else Success
            GpuParser-->>HealthAPI: list of GPU dicts
        end
    end
    
    alt GPU process failure
        HealthAPI->>HealthAPI: Log warning, continue
    end
    
    HealthAPI-->>Client: JSON response with GPU stats
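
The NVIDIA branch of the diagram ("Parse CSV lines, handle [N/A]") could look roughly like the sketch below. It assumes a column order of index, total memory, used memory, name, as produced by a query such as `nvidia-smi --query-gpu=index,memory.total,memory.used,name --format=csv,noheader,nounits`; the actual columns and the helper names `parse_nvidia_csv` and `_to_int` are assumptions, not taken from the PR.

```python
from typing import Any, Dict, List, Optional


def _to_int(value: str, default: Optional[int] = None) -> Optional[int]:
    """Convert a CSV field to int, returning `default` for values like "[N/A]"."""
    try:
        return int(value)
    except ValueError:
        return default


def parse_nvidia_csv(stdout: str) -> List[Dict[str, Any]]:
    """Parse header-less CSV rows of the form: index, total, used, name."""
    gpus: List[Dict[str, Any]] = []
    for line in stdout.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 4:
            continue
        index = _to_int(fields[0])
        total = _to_int(fields[1])
        if index is None or total is None:
            # Without an index or a total the row is not usable.
            continue
        # Treat an unavailable "used" value as 0, similar to the
        # `used_str = "0"` fallback visible in the reviewed diff.
        used = _to_int(fields[2], default=0)
        gpus.append(
            {
                "index": index,
                # Re-join in case the device name itself contains a comma.
                "name": ", ".join(fields[3:]),
                "total": total,
                "used": used,
                "free": total - used,
            }
        )
    return gpus
```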


mscolnick · 2026-06-10T21:30:10Z

@Set27 looks good. some failing CI tests

Set27 · 2026-06-11T15:15:18Z

@Set27 looks good. some failing CI tests

I can't reproduce the CLI failure when running the same command locally; I guess I ran into a transient failure, so I rebased on the latest main.
Could you start the workflows one more time?

mscolnick · 2026-06-11T17:41:34Z

thank you for this feature @Set27 !

vercel Bot deployed to Preview June 3, 2026 09:48 View deployment

Set27 marked this pull request as draft June 3, 2026 09:52

mscolnick requested a review from Copilot June 3, 2026 12:59

Copilot started reviewing on behalf of mscolnick June 3, 2026 13:00 View session

Copilot AI reviewed Jun 3, 2026

vercel Bot deployed to Preview June 3, 2026 14:16 View deployment

Set27 force-pushed the add-support-to-show-rocm-stats branch from 4547db8 to 702d180 Compare June 3, 2026 14:17

vercel Bot deployed to Preview June 3, 2026 14:18 View deployment

vercel Bot deployed to Preview June 3, 2026 14:34 View deployment

Set27 marked this pull request as ready for review June 3, 2026 14:57

vercel Bot deployed to Preview June 3, 2026 14:58 View deployment

Set27 commented Jun 3, 2026

cubic-dev-ai Bot reviewed Jun 10, 2026

Comment thread marimo/_server/api/endpoints/health.py

mscolnick added the enhancement New feature or request label Jun 10, 2026

Set27 added 5 commits June 11, 2026 18:10

feat(server): support ROCM for /api/usage endpoint

fbea8a1

fix(server): rocm json output instead of csv

7e4b7b5

fix(test): remove gpu from memory test

ca73e26

fix(server): add type for rocm data

f504e6e

feat(tests): add no gpu case along with rocm gpu output

3c94bcc

Set27 force-pushed the add-support-to-show-rocm-stats branch from 1eba709 to 3c94bcc Compare June 11, 2026 15:12

vercel Bot deployed to Preview June 11, 2026 15:21 View deployment

mscolnick approved these changes Jun 11, 2026

mscolnick merged commit ad5cd89 into marimo-team:main Jun 11, 2026
36 of 39 checks passed

Set27 deleted the add-support-to-show-rocm-stats branch June 11, 2026 15:44

mscolnick merged 5 commits into marimo-team:main from Set27:add-support-to-show-rocm-stats

3 participants
