LLM-D, Supercharged HPA and GKE AI Labs

The News

GKE

  • Introducing llm-d: llm-d is a Kubernetes-native distributed inference serving stack - a well-lit path for anyone to serve large language models at scale. It’s a collaboration between Google, NVIDIA, IBM and Red Hat that aims to simplify LLM serving on Kubernetes. The project spans several efforts; you can read about them in the blog or check the GitHub repo.
  • Performance HPA profile GA and default in Autopilot: We introduced the new HPA Performance profile back in November 2024 and have made many improvements since. This new stack delivers 3x faster autoscaling and improved reliability at scale, supporting 1,000 objects within SLO (up from 300). It also opens the door to expanding HPA capabilities: native custom metrics integration, parallel processing, and tolerance handling, as well as future multidimensional capabilities.
  • GKE AI Labs: A new one-stop shop for everything AI on GKE. We moved all tutorials and guides to this website, where you can find code and steps to deploy LLMs as well as OSS solutions on GKE.
  • Confidential Nodes for GPU workloads: GKE now supports using Confidential Nodes for GPU workloads. The supported VM families depend on the GKE version (check the release notes).
  • Container-Optimized Compute is default: From GKE 1.32.3 onward, Container-Optimized Compute (CoC) is the default autoscaler stack. CoC is our revamped cluster autoscaler stack with improved Pod scheduling latency.
  • GKE Threat Detection in SCC: Container Threat Detection triggers findings based on signals extracted from running containers on GKE. There are multiple types of signals, such as CLI execution and malicious code execution (more details). You can now find these findings in Security Command Center.
  • vLLM TPU support is GA: vLLM now supports TPU chips. This guide shows how to deploy Llama 3.1 70B on TPU v6 (Trillium) on GKE Autopilot.
  • [Live] Five Key Google Kubernetes Engine Features You Must Know: Tune in on June 5th to hear Gari Singh cover the five key GKE features you should know.
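
For readers wondering what "tolerance handling" refers to in the HPA item above, here is a minimal sketch of the replica calculation described in the upstream Kubernetes HPA documentation. It is an illustration only, not the GKE Performance profile's implementation: the function name and parameters are hypothetical, and the 0.1 default tolerance is the upstream Kubernetes default, assumed here rather than taken from the GKE announcement.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,  # assumption: upstream Kubernetes default
) -> int:
    """Sketch of the documented HPA formula (hypothetical helper).

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the workload alone,
    # which avoids flapping on small metric fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 replicas at 90% average CPU against a 60% target.
    print(desired_replicas(4, 90.0, 60.0))
    # 4 replicas at 63% against a 60% target (inside the tolerance band).
    print(desired_replicas(4, 63.0, 60.0))
```

A wider tolerance trades responsiveness for stability, which is why making it configurable per workload is useful.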

The recordings for Google Cloud Next 2025 are available on demand: https://cloud.withgoogle.com/next/25/session-library?filters=vod-recorded-session#all

AI/ML

The Community


Great to see Flyte in the GKE AI Labs!
