A high-throughput and memory-efficient inference and serving engine for LLMs
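For orientation, a minimal offline-inference sketch against vLLM's Python API (the model id here is just a placeholder):

```python
# Minimal offline-inference sketch; the model id is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")          # any HF-compatible model id
params = SamplingParams(temperature=0.8, max_tokens=64)

for out in llm.generate(["What does paged attention do?"], params):
    print(out.outputs[0].text)
```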
SGLang is a fast serving framework for large language models and vision language models.
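A minimal sketch of SGLang's frontend DSL, assuming a server has already been launched locally on the default port:

```python
# Minimal frontend-DSL sketch; assumes a server is already running locally,
# e.g. launched with: python -m sglang.launch_server --model-path <model>
import sglang as sgl

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is RadixAttention?")
print(state["answer"])
```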
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also contains components to create Python and C++ runtimes that orchestrate inference execution in a performant way.
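A minimal sketch of the high-level Python `LLM` API (the model id is a placeholder; the TensorRT engine is compiled on first use):

```python
# Minimal sketch of the high-level Python LLM API; the model id is a
# placeholder, and the TensorRT engine is compiled on first use.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=64)

for out in llm.generate(["Explain in-flight batching."], params):
    print(out.outputs[0].text)
```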
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
Prebuilt DeepSpeed wheels for Windows with NVIDIA GPU support. Supports GTX 10 through RTX 50 series. Compiled with PyTorch 2.7/2.8 and CUDA 12.8.
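After installing one of the wheels, a quick sanity check that the advertised PyTorch/CUDA combination is actually in place:

```python
# Post-install sanity check: confirm the advertised PyTorch/CUDA combination.
import torch
import deepspeed

print("torch", torch.__version__, "built for CUDA", torch.version.cuda)  # expect 2.7/2.8 + 12.8
print("gpu  ", torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))
print("deepspeed", deepspeed.__version__)
```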
RTX 5090 & RTX 5060 Docker container with PyTorch + TensorFlow. First fully-tested Blackwell GPU support for ML/AI. CUDA 12.8, Python 3.11, Ubuntu 24.04. Works with RTX 50-series (5090/5080/5070/5060) and RTX 40-series.
One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs (sm_121 architecture)
📦 A fully automated method for installing NVIDIA drivers on Arch Linux
Repository for the Campbells-Luggs-Blackwells family history website
GEN3C: Generative Novel 3D Captions - Adapted for NVIDIA Blackwell GPU architecture (sm_120). Includes automatic GPU detection, CPU-based T5 text encoding for Blackwell compatibility, and full backward compatibility with older GPUs.
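An illustrative sketch of the detection-and-routing idea described above (the function name is hypothetical, not the repository's actual code):

```python
# Illustrative sketch of the routing idea, not the repository's actual code:
# keep the T5 text encoder on CPU when a Blackwell (sm_120+) GPU is detected.
import torch

def text_encoder_device() -> torch.device:
    if not torch.cuda.is_available():
        return torch.device("cpu")
    major, _ = torch.cuda.get_device_capability(0)
    # Blackwell consumer parts report compute capability 12.x; older GPUs
    # keep the encoder on-device, preserving backward compatibility.
    return torch.device("cpu") if major >= 12 else torch.device("cuda")
```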