⚖️ Add vLLM server mode and VLM support to OnlineDPOTrainer #3783

vaelev · 2025-07-27T21:55:28Z

Summary

This PR adds vLLM server mode to OnlineDPOTrainer, matching the functionality already available in GRPO. Previously, OnlineDPOTrainer supported only "colocate" mode, in which vLLM runs in the same process as training. Users can now point the trainer at an external vLLM server (launched with trl vllm-serve) for generation while training runs in a separate process.

Key Changes

Configuration: Added vLLM server mode parameters to OnlineDPOConfig (vllm_mode, vllm_server_base_url, vllm_server_host, vllm_server_port, etc.)
VLLMClient Integration: Added server communication support using the existing VLLMClient class
Weight Synchronization: Implemented model weight sync between training process and vLLM server via _move_model_to_vllm
Dual Mode Support: Refactored generation logic to support both "server" and "colocate" modes cleanly
Tests: Added comprehensive test coverage for server mode functionality
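
To make the configuration item above concrete, here is a minimal sketch of how the server-mode parameters might fit together. VLLMSettings is a hypothetical stand-in, not the real OnlineDPOConfig; only the field names vllm_mode, vllm_server_base_url, vllm_server_host and vllm_server_port come from this PR. The default values and the rule that an explicit base URL takes precedence over host and port are assumptions modelled on how GRPO's configuration is documented.

```python
from dataclasses import dataclass
from typing import Optional

VALID_VLLM_MODES = ("server", "colocate")


@dataclass
class VLLMSettings:
    """Hypothetical stand-in for the vLLM fields on OnlineDPOConfig."""

    vllm_mode: str = "server"
    vllm_server_base_url: Optional[str] = None
    vllm_server_host: str = "0.0.0.0"  # assumed default
    vllm_server_port: int = 8000  # assumed default

    def __post_init__(self) -> None:
        # Reject unknown modes early so the error surfaces at config time.
        if self.vllm_mode not in VALID_VLLM_MODES:
            raise ValueError(
                f"vllm_mode must be one of {VALID_VLLM_MODES}, got {self.vllm_mode!r}"
            )

    def server_url(self) -> str:
        """Return the URL the training process would contact in server mode."""
        if self.vllm_mode != "server":
            raise ValueError("server_url() only applies when vllm_mode='server'")
        # Assumption: an explicit base URL wins over host and port.
        if self.vllm_server_base_url is not None:
            return self.vllm_server_base_url.rstrip("/")
        return f"http://{self.vllm_server_host}:{self.vllm_server_port}"


if __name__ == "__main__":
    print(VLLMSettings().server_url())
    print(VLLMSettings(vllm_server_base_url="http://gen-node:9000/").server_url())
```

Validating the mode at construction time means a misspelled value fails before any model or server connection is set up.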

Implementation Details

The implementation follows the same patterns as GRPO's vLLM integration to ensure consistency across the TRL library:

Server Mode: Uses VLLMClient to communicate with an external vLLM server and synchronizes model weights to it before each generation
Colocate Mode: Keeps the existing behavior, with an in-process vLLM LLM instance
Error Handling: Falls back gracefully and raises descriptive error messages for configuration issues
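
The two modes described above can be sketched as a small dispatch function. This is an illustration only, not the trainer's actual code: FakeServerClient, FakeColocatedEngine, update_weights and generate_completions are hypothetical names, and no real VLLMClient or vLLM API is used. The one behavior taken from this PR is that server mode pushes the current weights to the server before generating.

```python
from typing import List, Optional


class FakeServerClient:
    """Hypothetical stand-in for a client of an external vLLM server."""

    def __init__(self) -> None:
        self.weights_version = -1
        self.log: List[str] = []

    def update_weights(self, version: int) -> None:
        # Record which policy version the server now holds.
        self.weights_version = version
        self.log.append(f"sync:{version}")

    def generate(self, prompts: List[str]) -> List[str]:
        self.log.append("generate")
        return [f"{p} [v{self.weights_version}]" for p in prompts]


class FakeColocatedEngine:
    """Hypothetical stand-in for an in-process generation engine."""

    def generate(self, prompts: List[str]) -> List[str]:
        return [f"{p} [local]" for p in prompts]


def generate_completions(
    mode: str,
    prompts: List[str],
    policy_version: int,
    server_client: Optional[FakeServerClient] = None,
    local_engine: Optional[FakeColocatedEngine] = None,
) -> List[str]:
    """Dispatch generation to the backend selected by `mode`."""
    if mode == "server":
        if server_client is None:
            raise ValueError("server mode requires a server client")
        # Sync first so completions come from the current policy.
        server_client.update_weights(policy_version)
        return server_client.generate(prompts)
    if mode == "colocate":
        if local_engine is None:
            raise ValueError("colocate mode requires an in-process engine")
        return local_engine.generate(prompts)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    client = FakeServerClient()
    print(generate_completions("server", ["hello"], 3, server_client=client))
    print(client.log)
```

Keeping the sync inside the server branch makes the ordering explicit: a generation request never reaches the server with stale weights.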

Test Plan

Basic import and configuration tests pass
Server mode initialization works correctly
Weight synchronization functionality implemented
Generation logic handles both modes properly
Existing colocate mode behavior preserved
Code formatting and linting pass

This PR adds support for vLLM server mode to OnlineDPOTrainer, matching the functionality available in GRPO. Previously, OnlineDPO only supported "colocate" mode where vLLM runs in the same process. Now users can use an external vLLM server (launched with `trl vllm-serve`) for generation while keeping the training process separate.

Key changes:
- Add vLLM server mode configuration parameters to OnlineDPOConfig
- Implement VLLMClient integration for server communication
- Add weight synchronization between training process and vLLM server
- Support both server and colocate modes with proper error handling
- Add comprehensive tests for server mode functionality
- Handle gracefully the current vLLM colocate mode API limitations

The implementation follows the same patterns as GRPO's vLLM integration to ensure consistency across the TRL library.

trl/trainer/online_dpo_config.py

trl/trainer/online_dpo_trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

examples/scripts/online_dpo_vlm.py

qgallouedec

lgtm!

…ace#3783)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

vaelev and others added 12 commits July 27, 2025 23:47

Merge branch 'main' into online-dpo-vllm-server-mode

11a08fe

Merge branch 'main' into online-dpo-vllm-server-mode

14880e0

update vllm sampling and use_transformers_paged

5ed6eb5

add require_torch_accelerator decorator

0fc45f9

remove vllm_tensor_parallel_size

0f9d6c4

Merge branch 'main' into online-dpo-vllm-server-mode

bcc59c1

add support for vlms

3e795cc

add test

a454992

fix test

4764161

fix processing

76c0119

add vllm colocate environ

72f07ac

sergiopaniego reviewed Aug 7, 2025

trl/trainer/online_dpo_config.py (outdated, resolved)

Update trl/trainer/online_dpo_config.py

e3deba7

sergiopaniego mentioned this pull request Aug 7, 2025

🦦 Validate vllm_mode param in GRPO #3866

Merged

5 tasks

kashif and others added 11 commits August 8, 2025 10:35

Merge branch 'main' into online-dpo-vllm-server-mode

5896894

Code quality check removing not needed change

8e21ff6

add fixes for different vlms

75cc071

add online_dpo_vlm.py script

b6d9db4

add rewardFunc

f86b797

Handling data parallel models

2894e61

distributed_executor_backend=external_launcher always for vllm

04ba071

model to self.model

29bfc40

fix test

dae384b

use key "prompt"

c36e66d

Working vllm server

206a1ad

qgallouedec reviewed Aug 9, 2025

trl/trainer/online_dpo_trainer.py (resolved)

qgallouedec reviewed Aug 9, 2025

trl/trainer/online_dpo_trainer.py (outdated, resolved)

qgallouedec reviewed Aug 9, 2025

trl/trainer/online_dpo_trainer.py (outdated, resolved)

Update trl/trainer/online_dpo_trainer.py

5b7b279

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

kashif and others added 20 commits August 18, 2025 14:37

add deprecation warning

23cfbfc

Merge branch 'main' into online-dpo-vllm-server-mode

3fe9e64

Merge branch 'main' into online-dpo-vllm-server-mode

9a762ea

add dep to the XPO and nash trainer

d969f0e

Merge branch 'main' into online-dpo-vllm-server-mode

7a98d3e

Merge branch 'main' into online-dpo-vllm-server-mode

7eb7046

Merge remote-tracking branch 'origin/main' into online-dpo-vllm-serve…

766707e

…r-mode

Adapted script for HF jobs support

fdee1ea

Merge branch 'main' into online-dpo-vllm-server-mode

f9b816a

Updated script with kernels

8b4a7ac

Merge remote-tracking branch 'origin/main' into online-dpo-vllm-serve…

2471179

…r-mode

fix test argument

405ee6d

Merge branch 'main' into online-dpo-vllm-server-mode

e599953

align grpo rloo

cc97d39

style

7702091

deprecated dataset_num_proc

5f546f4

fixes

55832d1

fix typo

a9d6b96

Merge branch 'main' of github.com:huggingface/trl into online-dpo-vll…

2d06791

…m-server-mode

torch_dtype to dtype

6c0b2fb

kashif reviewed Sep 4, 2025

examples/scripts/online_dpo_vlm.py (outdated, resolved)

kashif and others added 3 commits September 4, 2025 12:43

Update examples/scripts/online_dpo_vlm.py

11a0f1b

Merge remote-tracking branch 'origin/main' into online-dpo-vllm-serve…

cc46791

…r-mode

Merge branch 'main' into online-dpo-vllm-server-mode

bc9b1c5

qgallouedec approved these changes Sep 5, 2025

qgallouedec changed the title ~~Add vLLM server mode and VLM support to OnlineDPOTrainer~~ Sep 5, 2025

qgallouedec merged commit d1bf560 into huggingface:main Sep 5, 2025
2 of 10 checks passed

qgallouedec mentioned this pull request Sep 22, 2025

Add Custom Reward Functions To Online DPO (and other methods) #2767

Closed

albertvillanova mentioned this pull request Sep 30, 2025

Fix docstrings with 'deprecated' Sphinx directive #4174

Merged
