LLM-D, Supercharged HPA and GKE AI Labs
The News
GKE
- Introducing llm-d: llm-d is a Kubernetes-native distributed inference serving stack - a well-lit path for anyone to serve large language models at scale. It's a collaboration between Google, NVIDIA, IBM, and Red Hat that aims to simplify LLM serving on Kubernetes. The project has multiple workstreams you can read about in the blog, or check out the GitHub repo.
- Performance HPA profile GA and default in Autopilot: We introduced the new HPA Performance profile back in November 2024, and we have made many improvements since. The new stack delivers 3x faster autoscaling and improved reliability at scale, supporting 1,000 HPA objects within SLO (up from 300). It also opens the door to expanded HPA capabilities such as native custom metrics integration, parallel processing, and tolerance handling, as well as future multidimensional autoscaling. Because the profile is enabled at the cluster level, existing HPA objects benefit without any spec changes (see the sketch after this list).
- GKE AI Labs: GKE AI Labs is a new one-stop shop for everything AI on GKE. We have moved all tutorials and guides to this website, where you can find code and step-by-step instructions for deploying LLMs as well as OSS solutions on GKE.
- Confidential Nodes for GPU workloads: GKE now supports running GPU workloads on Confidential Nodes. Supported VM families vary by GKE version; check the release notes for details.
- Container-Optimized Compute is the default: From GKE 1.32.3 onward, Container-Optimized Compute (CoC) is the default autoscaler stack. CoC is our revamped cluster autoscaling stack with improved Pod scheduling latency.
- GKE threat detection in SCC: Container Threat Detection triggers findings based on signals extracted from containers running on GKE. There are multiple types of signals, such as CLI execution and malicious code execution (more details). These findings now surface in Security Command Center.
- vLLM TPU support is GA: vLLM now supports TPU chips. This guide shows how to deploy Llama 3.1 70B on TPU v6e (Trillium) on GKE Autopilot; a rough Deployment sketch follows this list.
- [Live] Five Key Google Kubernetes Engine Features You Must Know: Tune in on June 5th to hear Gari Singh cover the five key GKE features you should know.
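To make the HPA item above concrete, here is a minimal sketch of a standard autoscaling/v2 HorizontalPodAutoscaler. The Performance profile is a cluster-level capability (GA, and default in Autopilot), so ordinary HPA objects like this one pick up the faster stack automatically; the Deployment name, replica bounds, and CPU target below are placeholder assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                    # placeholder name
spec:
  scaleTargetRef:                  # the workload being autoscaled
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # placeholder Deployment
  minReplicas: 2                   # assumed bounds for illustration
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale to keep average CPU near 60%
```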
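And for the vLLM-on-TPU item, a rough sketch of what a single-host vLLM Deployment on GKE TPU v6e nodes can look like. The image, model ID, topology, and chip count are assumptions for illustration; the linked guide has the exact values for Llama 3.1 70B on Autopilot.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-tpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-tpu
  template:
    metadata:
      labels:
        app: vllm-tpu
    spec:
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v6e-slice  # Trillium
        cloud.google.com/gke-tpu-topology: 2x4               # assumed 8-chip topology
      containers:
        - name: vllm
          image: vllm/vllm-tpu:latest                        # placeholder image; see the guide
          args:
            - --model=meta-llama/Llama-3.1-70B-Instruct      # assumed model ID
            - --tensor-parallel-size=8                       # shard across the 8 TPU chips
          ports:
            - containerPort: 8000                            # vLLM's OpenAI-compatible API
          resources:
            limits:
              google.com/tpu: "8"                            # request all chips on the slice
```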
The recordings from Google Cloud Next 2025 are available on demand: https://cloud.withgoogle.com/next/25/session-library?filters=vod-recorded-session#all
AI/ML
- Deploy to Cloud Run from AI Studio: You can start building an app in AI Studio and deploy it directly to Cloud Run. Even if your app needs a local LLM, Cloud Run supports running LLMs with a GPU attached (a rough YAML sketch follows this list). Check it out, it's cool.
- Deploy to Cloud Run from Vertex AI: Like AI Studio, Vertex AI also supports deploying GenAI apps to Cloud Run straight from the console.
- Gemini Cloud Assist launched new cool stuff: Among other things, you can now ask Cloud Monitoring about incidents, ask Artifact Analysis about detected vulnerabilities, and test org policies. Cloud Assist is launching a lot of cool stuff; check out the release notes.
- Transforming Kubernetes and GKE into the leading platform for AI/ML: This is not news per se but rather a summary of all the work we are doing in Kubernetes and GKE to make them the prime platform for running your AI/ML workloads.
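Related to the Cloud Run items above: if your app needs a local LLM, a Cloud Run service can request a GPU declaratively. The sketch below follows Cloud Run's Knative-style YAML; the service name, image, and resource sizes are assumptions, and the L4 accelerator selector reflects Cloud Run's documented GPU support at the time of writing.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: local-llm                                # placeholder service name
spec:
  template:
    spec:
      nodeSelector:
        run.googleapis.com/accelerator: nvidia-l4  # request an NVIDIA L4 GPU
      containers:
        - image: ollama/ollama                   # placeholder image serving a local LLM
          resources:
            limits:
              cpu: "8"                           # assumed sizing; GPU services need generous CPU/memory
              memory: 32Gi
              nvidia.com/gpu: "1"
```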
The Community
- Zero-Downtime Pod Migration in Kubernetes: Learn how to achieve near-zero-downtime migrations in Kubernetes using readinessProbe and preStop lifecycle hooks (a minimal example follows this list).
- GKE Cost Analysis with BigQuery and Kubecost: Learn how to combine BigQuery and Kubecost to analyze and manage GKE costs.
- Cloud Service Mesh global control, zero-pain upgrades: We are working to make service mesh easy on Google Cloud, and this article highlights how Cloud Service Mesh makes that possible.
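For the zero-downtime migration article above, the core pattern looks roughly like this: a readinessProbe keeps a Pod out of Service endpoints until it can actually serve, and a short preStop sleep lets endpoint removal propagate through load balancers before the container receives SIGTERM. The image, health path, and timings are placeholder assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      terminationGracePeriodSeconds: 45  # must exceed the preStop sleep
      containers:
        - name: web
          image: nginx:1.27              # placeholder image
          readinessProbe:                # traffic flows only after this passes
            httpGet:
              path: /                    # assumed health endpoint
              port: 80
            periodSeconds: 5
          lifecycle:
            preStop:                     # delay SIGTERM so endpoints drain first
              exec:
                command: ["sleep", "10"]
```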