IBM Research uses vLLM at the heart of its Research Inference & Tuning Service (RITS) platform, providing shared access to model inference and tuning across its research community. Our recent case study outlines how RITS supports more than 1,300 active users and hosts over 100 models at any given time, with vLLM as the core serving runtime. https://lnkd.in/gVjCnBDm The Futurum Group covered the case study, examining what this approach could mean for enterprise AI infrastructure, including centralized access, governance, and cost control. 🔗 Read here: https://lnkd.in/gknJ3YZM #PyTorch #vLLM #AIInfrastructure #OpenSourceAI
PyTorch
Research Services
San Francisco, California · 318,111 followers
About us
An open source machine learning framework that accelerates the path from research prototyping to production deployment. PyTorch is an open source project at the Linux Foundation.
- Website: http://www.pytorch.org
- Industry: Research Services
- Company size: 501-1,000 employees
- Headquarters: San Francisco, California
- Type: Public Company
- Specialties: Artificial Intelligence, Deep Learning, Machine Learning, and AI
Locations
- Primary: 548 Market St, San Francisco, California, US
Updates
LightSeek Foundation recently released Shepherd Model Gateway (SMG). It grew out of a production bottleneck in LLM serving: CPU-bound work sitting on the critical path. LightSeek moved all non-GPU work into a Rust-based gateway, leaving a minimal gRPC boundary around inference. The result: up to 3.5× higher throughput in long-context scenarios. 🔗 More details in our latest blog: https://lnkd.in/gARQCjqn #PyTorch #LightSeek #OpenSourceAI #ShepherdModelGateway
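To make the architecture concrete, here is a minimal, illustrative Python sketch of the split the post describes: CPU-bound work (tokenization, detokenization) lives in the gateway, and only token IDs cross a narrow boundary to the GPU worker. The class names and toy tokenizer are hypothetical; in SMG itself the gateway is Rust and the boundary is gRPC, not an in-process call.

```python
# Illustrative sketch only: in SMG the gateway is Rust and this boundary is gRPC.
from typing import List

class GpuWorker:
    """Stand-in for the inference server; a real worker runs the model on the GPU."""
    def generate(self, token_ids: List[int], max_new_tokens: int = 4) -> List[int]:
        # Placeholder "model": appends shifted IDs instead of a real forward pass.
        return token_ids + [t + 1 for t in token_ids[:max_new_tokens]]

class Gateway:
    """All CPU-bound work lives here, off the GPU worker's critical path."""
    def __init__(self, worker: GpuWorker):
        self.worker = worker
        self.vocab = {}  # toy whitespace tokenizer state (hypothetical)

    def _tokenize(self, text: str) -> List[int]:
        return [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]

    def _detokenize(self, ids: List[int]) -> str:
        inv = {v: k for k, v in self.vocab.items()}
        return " ".join(inv.get(i, f"<{i}>") for i in ids)

    def handle(self, text: str) -> str:
        ids = self.worker.generate(self._tokenize(text))  # only token IDs cross the boundary
        return self._detokenize(ids)                      # CPU work stays in the gateway

print(Gateway(GpuWorker()).handle("hello shepherd model gateway"))
```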
Help us shape the most technical program in the PyTorch ecosystem! 🎤 We are seeking proposals for PyTorch Conference North America, taking place October 20-21 in San Jose, CA. Whether you are working on massive-scale training or kernel-level optimizations, we want to hear from you. 🗓️ Deadline to submit: June 7 🔗 Submit your proposal: https://bit.ly/4bIgqbs #PyTorchCon #PyTorch #PyTorchFoundation #FutureOfAI #AI #GenAI #MachineLearning #ML #DeepLearning #OpenSource #OpenSourceSoftware #OpenSourceDevelopment #OpenSourceCommunity #OSS #LinuxFoundation #events #linux #CallForProposals #CallForPapers #CFP #CallForSpeakers
Only 3 days remain to join our speaker lineup in Shanghai! 🎤 We are looking for session presentations, panels, and lightning talks for KubeCon + CloudNativeCon + OpenInfra Summit + PyTorch Conference China. If you have been working on technical breakthroughs or real-world use cases, the community wants to hear from you. The portal closes in 72 hours. Get your proposal in today: https://bit.ly/40QIjb2 And don't forget early bird rates end in 1 week! Save now: https://bit.ly/3NG8hv1 #KubeCon #CloudNativeCon #OpenInfraSummit #PyTorchCon #Kubernetes #K8s #CloudNative #OpenInfra #PyTorch #PyTorchFoundation #FutureOfAI #AI #GenAI #MachineLearning #ML #DeepLearning #OpenSource #OpenSourceSoftware #OpenSourceDevelopment #OpenSourceCommunity #OSS #CNCF #CloudNativeComputingFoundation #events #linux #software #CallForProposals #CallForPapers #CFP #CallForSpeakers
PyTorch reposted this
🎉 vLLM v0.20.0 is out — 752 commits from 320 contributors, 123 of them new to the project. A few things worth calling out beyond the changelog: 🚀 A platform baseline shift, not just another release. v0.20.0 moves to CUDA 13.0.2 / PyTorch 2.11 / Python 3.14 / Transformers v5 — a deliberately breaking change. The reason: PyTorch 2.11 is the first release to ship a default aarch64 CUDA wheel on PyPI. That means GB200 / Grace-Blackwell users now get working CUDA from a plain `pip install torch` — no `--extra-index-url`, no nightly, no custom build. If you're standing up a GB200 or GB300 box for inference, this is the version cliff to be aware of. 🙏 Thanks to the PyTorch team for shipping it. ⚙️ Engine & performance work: ⚡ FA4 re-enabled as the default MLA prefill backend (head-dim 512 + paged-KV on SM90+) 🗜️ TurboQuant 2-bit KV cache — 4× capacity, with FA3/FA4 prefill support 🧩 Model Runner V2: full CUDA graph for Eagle prefill, multiple prompt logprobs, stale-token accuracy fix 🛠️ Initial vLLM IR skeleton + rms_norm op — foundation for future kernel work 🧪 Online quantization frontend (FP8 / MXFP8 consolidated) 📈 Batch-invariant fused RMS norm: 2.1% E2E latency improvement 🖥️ Hardware breadth this release: 🟢 NVIDIA: Jetson Thor (sm_110), MXFP4 W4A4 CUTLASS MoE on SM100, TRTLLM-GEN NVFP4 MoE 🔴 AMD: Zen CPU via zentorch, MORI EP, AITER MLA + Eagle3, RDNA 3.5/4 device IDs 🔵 Intel: XPU with MXFP8/MXFP4 quant and FP8 KV cache 🆕 New & updated models: DeepSeek V4, Hunyuan v3 preview, Granite 4.1 Vision, EXAONE-4.5, Phi-4-reasoning-vision. 🐋 DeepSeek V4 specifically — v0.20.0 rolls in everything that's landed since the Day-0 docker image: 🔥 MegaMoE — DeepGEMM MegaMoE integration, a mega kernel from DeepSeek. Opt in with `--moe-backend deep_gemm_mega_moe` (Blackwell only). 🔧 MTP > 1 crash on Hopper — fixed 🔧 Indexer top_k correctness fix 🔧 Shared-experts SwiGLU clipping — fixes an accuracy regression on reasoning requests 🔧 Engine hang on long-context inputs — fixed 🔧 Chat template now renders tool arguments correctly If you've been running V4 on the Day-0 image, this is the upgrade to take. 📖 DeepSeek V4 recipe: https://lnkd.in/gR_jJesr 📝 Full release notes: https://lnkd.in/gKMqNs9d Huge thanks to the 320 contributors who shipped this release. 🙌
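One practical note on the packaging change above: on an aarch64 Grace-Blackwell host you can verify that a plain pip install picked up a working CUDA build using nothing but standard torch APIs, as in this small check:

```python
# Sanity check after `pip install torch` on an aarch64 / GB200 host.
# With PyTorch 2.11's default aarch64 CUDA wheel, no --extra-index-url is needed.
import platform
import torch

print("arch:", platform.machine())            # expect 'aarch64' on Grace-Blackwell
print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("cuda build:", torch.version.cuda)
```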
Need to cut LLM training checkpoint costs? Training LLMs requires periodic checkpoints: full snapshots of model weights, optimizer states, and gradients saved to storage so training can resume after interruptions. At scale, these checkpoints become massive. NVIDIA nvCOMP is a GPU-accelerated lossless compression library that compresses the checkpoint before it leaves GPU memory, so there is no CPU round trip and no extra data movement. Developers can integrate high-throughput compression directly into Python workflows built on frameworks such as PyTorch or TensorFlow. 🔗 Read the full post: https://lnkd.in/gsgFqH_x #PyTorch #OpenSourceAI #AI #Inference #Innovation
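For a feel of what on-GPU checkpoint compression looks like from Python, here is a rough sketch using nvCOMP's Python bindings (the nvidia.nvcomp module). The Codec / encode / as_array names follow nvCOMP's documented Python API, but treat the exact signatures as assumptions and check the docs for your installed version:

```python
# Sketch: compress one checkpoint shard on-GPU with nvCOMP before writing it out.
# API names (nvcomp.Codec, encode, as_array) follow nvCOMP's Python docs;
# verify exact signatures against your installed nvCOMP version.
import torch
from nvidia import nvcomp

weights = torch.randn(1024, 1024, device="cuda")  # stand-in for a checkpoint shard

# View the tensor's raw bytes without copying (zero-copy via __cuda_array_interface__).
raw = nvcomp.as_array(weights.view(torch.uint8).flatten())

codec = nvcomp.Codec(algorithm="LZ4")  # lossless, GPU-accelerated
compressed = codec.encode(raw)         # runs on-device: no CPU round trip

print("original bytes:", weights.numel() * weights.element_size())
# `compressed` stays on the GPU; copy it out once to write to storage, and
# restore later with codec.decode(...) before reassembling the state dict.
```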
Want to train LLMs on longer contexts without re-engineering your entire systems stack? Introducing AutoSP — the first compiler-based solution that automatically optimizes LLM training for long contexts. Under the hood, AutoSP applies a series of compiler passes that trigger sequence parallelism, paired with a curated activation-checkpointing scheme tailored for long-context training. It's integrated directly into DeepSpeed, so enabling long-context training is just a config change away. No more rewiring your stack to push context lengths. Read the blog to learn more 🖇️ https://lnkd.in/gu46-dey ✍ Ahan Gupta, Zhihao W., Neel Dani, Masahiro Tanaka, Tunji Ruwase, Minjia Zhang #PyTorch #DeepSpeed #AutoSP #OpenSourceAI
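Since AutoSP rides on DeepSpeed's config, enabling it should look roughly like the sketch below. The deepspeed.initialize call and config-dict pattern are standard DeepSpeed; the "autosp" section and its field names are placeholders I'm assuming for illustration, so take the real schema from the blog and the DeepSpeed docs:

```python
# Hypothetical sketch: deepspeed.initialize() and the config-dict pattern are
# standard DeepSpeed; the "autosp" section below is a PLACEHOLDER illustrating
# "long-context training as a config change" -- real key names may differ.
import torch
import deepspeed

model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8)  # stand-in model

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "autosp": {                        # hypothetical section, not the documented schema
        "enabled": True,               # turn on compiler-driven sequence parallelism
        "sequence_parallel_size": 4,   # split each long sequence across 4 ranks
    },
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```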
PyTorch reposted this
💡 CUDA-enabled PyTorch helps millions of developers each month GPU-accelerate their deep learning and simulation workloads. NVIDIA CUDA-X brings cutting-edge accelerated computing to your training, inference, and RL flows, with libraries such as cuBLAS, cuDNN, and CUTLASS as the backbone of GPU-enabled PyTorch. Learn more ➡️ https://nvda.ws/42bZowK
PyTorch reposted this
My latest podcast is an interview from the recent PyTorch event in Paris. In it I speak with Mark Collier, executive director of the PyTorch Foundation, about the announcements from the event. If you use LLMs and AI tools, chances are the models they use were trained with PyTorch, so this is foundational stuff! Enjoy below or search for "Tech Lounge" wherever you get podcasts. https://lnkd.in/d-upHkSp
The countdown has begun: One week left to submit your talk! ⏳ The Call for Proposals for KubeCon + CloudNativeCon + OpenInfra Summit + PyTorch Conference China, 8-9 September in Shanghai, closes on 3 May. This is your opportunity to share your expertise with a global audience across tracks spanning developer experience, ML systems, infrastructure innovation, and open source leadership. Don't miss the chance to help shape the program for this massive cross-community event. Submit by 3 May: https://bit.ly/40QIjb2 Not planning to speak? Register by 6 May + save! https://bit.ly/3NG8hv1 #KubeCon #CloudNativeCon #OpenInfraSummit #PyTorchCon #Kubernetes #K8s #CloudNative #OpenInfra #PyTorch #PyTorchFoundation #FutureOfAI #AI #GenAI #MachineLearning #ML #DeepLearning #OpenSource #OpenSourceSoftware #OpenSourceDevelopment #OpenSourceCommunity #OSS #CNCF #CloudNativeComputingFoundation #events #linux #software #CallForProposals #CallForPapers #CFP #CallForSpeakers