Pull requests: vllm-project/vllm
Pull requests list
[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8
bug
Something isn't working
ready
ONLY add when PR is ready to merge/full CI is needed
#20324
opened Jul 1, 2025 by
tlrmchlsmth
[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models
#20322
opened Jul 1, 2025 by
huaqiangwang
3 of 4 tasks
[USAGE] Improve error handling for weight initialization in Unquantized…
documentation
Improvements or additions to documentation
v1
#20321
opened Jul 1, 2025 by
koiker
3 of 4 tasks
HF Hub LoRA Resolver
ci/build
documentation
Improvements or additions to documentation
#20320
opened Jul 1, 2025 by
alex-jw-brooks
fix(cpu): Handle None ModelConfig during default initialization
#20319
opened Jul 1, 2025 by
Flink-ddd
fix[Docs]: link anchor is incorrect #20309
documentation
Improvements or additions to documentation
structured-output
#20315
opened Jul 1, 2025 by
yyzxw
4 tasks
[Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel
#20308
opened Jul 1, 2025 by
jvlunteren
Add support for Prithvi geospatial model in serving mode
documentation
Improvements or additions to documentation
frontend
multi-modality
Related to multi-modality (#4194)
needs-rebase
structured-output
v1
[doc] quark_mxfp4_introduction
documentation
Improvements or additions to documentation
#20306
opened Jul 1, 2025 by
lihaoyang-amd
Draft
[Feature] Support Minimax-M1 function calls features
documentation
Improvements or additions to documentation
frontend
tool-calling
#20297
opened Jul 1, 2025 by
qscqesze
[Hardware][RISC-V] Add RISC-V architecture cpu inference support
ci/build
#20292
opened Jul 1, 2025 by
huangzhengx
[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct
#20286
opened Jun 30, 2025 by
zichongli5
3 of 4 tasks
[Misc][Doc] Add missing comment for LLM
frontend
#20285
opened Jun 30, 2025 by
draftbk
1 of 4 tasks
Support DeepSeekV3-style block FP8 quantization with CT
quantization
#20279
opened Jun 30, 2025 by
mgoin
[TPU] Temporary fix vmem oom for long model len by reducing page size
ready
ONLY add when PR is ready to merge/full CI is needed
tpu
Related to Google TPUs
v1
#20278
opened Jun 30, 2025 by
Chenyaaang
[Docs] use uv in GPU installation docs
documentation
Improvements or additions to documentation
#20277
opened Jun 30, 2025 by
davidxia
[Bugfix][Frontend]: Fix API server connection refused on wsl2
frontend
#20275
opened Jun 30, 2025 by
Chen-zexi
3 of 4 tasks
[Bugfix] Fix None value handling in trace span creation for cancelled requests
#20272
opened Jun 30, 2025 by
br4mm
3 of 4 tasks
[Refactor] Refactor import utils
frontend
multi-modality
Related to multi-modality (#4194)
performance
Performance-related issues
speculative-decoding
structured-output
tool-calling
v1
#20269
opened Jun 30, 2025 by
yewentao256
[Benchmark] Add benchmark tool for multi turn conversations
performance
Performance-related issues
#20267
opened Jun 30, 2025 by
pliops-daniels