Contributor

@vaelev vaelev commented Jul 27, 2025

Summary

Fixes #3782

This PR adds support for vLLM server mode to OnlineDPOTrainer, matching the functionality available in GRPO. Previously, OnlineDPOTrainer supported only "colocate" mode, in which vLLM runs in the same process as training. Users can now point the trainer at an external vLLM server (launched with trl vllm-serve) for generation while keeping the training process separate.

Key Changes

  • Configuration: Added vLLM server mode parameters to OnlineDPOConfig (vllm_mode, vllm_server_base_url, vllm_server_host, vllm_server_port, etc.)
  • VLLMClient Integration: Added server communication support using the existing VLLMClient class
  • Weight Synchronization: Implemented model weight sync between training process and vLLM server via _move_model_to_vllm
  • Dual Mode Support: Refactored generation logic to support both "server" and "colocate" modes cleanly
  • Tests: Added comprehensive test coverage for server mode functionality
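As a rough illustration of how the new configuration fields might fit together, the sketch below resolves a server URL from them. It does not import TRL: `VLLMServerSettings` and `resolve_server_url` are hypothetical names, the default host and port are assumptions, and the rule that an explicit base URL takes precedence over host and port is assumed by analogy with the GRPO configuration rather than confirmed for this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VLLMServerSettings:
    """Hypothetical stand-in for the vLLM fields this PR adds to OnlineDPOConfig."""

    vllm_mode: str = "server"  # "server" or "colocate"
    vllm_server_base_url: Optional[str] = None
    vllm_server_host: str = "0.0.0.0"  # assumed default
    vllm_server_port: int = 8000  # assumed default


def resolve_server_url(settings: VLLMServerSettings) -> Optional[str]:
    """Return the URL to contact in server mode, or None in colocate mode."""
    if settings.vllm_mode == "colocate":
        # vLLM runs in-process, so there is no server to contact.
        return None
    if settings.vllm_mode != "server":
        raise ValueError(f"unknown vllm_mode: {settings.vllm_mode!r}")
    # Assumption: an explicit base URL wins over host/port.
    if settings.vllm_server_base_url is not None:
        return settings.vllm_server_base_url.rstrip("/")
    return f"http://{settings.vllm_server_host}:{settings.vllm_server_port}"
```

With these assumed defaults, `resolve_server_url(VLLMServerSettings())` yields a host/port URL, while passing `vllm_server_base_url` overrides both fields.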

Implementation Details

The implementation follows the same patterns as GRPO's vLLM integration to ensure consistency across the TRL library:

  1. Server Mode: Uses VLLMClient to communicate with an external vLLM server and synchronizes weights before generation
  2. Colocate Mode: Maintains the existing behavior with an in-process vLLM LLM instance
  3. Error Handling: Graceful fallback and proper error messages for configuration issues
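The three points above can be sketched as a small dispatch function. This is a minimal, library-free illustration rather than the trainer's actual code: `FakeBackend` and `generate_completions` are hypothetical names, the backends only echo their prompts, and `sync_weights` stands in for the weight synchronization step. The sketch does not model how colocate mode keeps its in-process engine up to date.

```python
from typing import Callable, List


class FakeBackend:
    """Illustrative stand-in for either a server client or an in-process engine."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompts: List[str]) -> List[str]:
        # Echo each prompt, tagged with the backend that handled it.
        return [f"{self.name}:{p}" for p in prompts]


def generate_completions(
    prompts: List[str],
    mode: str,
    server_client: FakeBackend,
    local_engine: FakeBackend,
    sync_weights: Callable[[], None],
) -> List[str]:
    """Route generation to the backend selected by `mode`."""
    if mode == "server":
        # Push the current training weights before asking the server to generate.
        sync_weights()
        return server_client.generate(prompts)
    if mode == "colocate":
        # Existing behavior: generate with the in-process engine.
        return local_engine.generate(prompts)
    # Fail early with a clear message on a misconfigured mode.
    raise ValueError(f"vllm_mode must be 'server' or 'colocate', got {mode!r}")
```

Keeping the mode check in one place means an invalid setting surfaces as a single clear error instead of failing somewhere inside generation.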

Test Plan

  • Basic import and configuration tests pass
  • Server mode initialization works correctly
  • Weight synchronization functionality implemented
  • Generation logic handles both modes properly
  • Existing colocate mode behavior preserved
  • Code formatting and linting checks pass
vaelev and others added 12 commits July 27, 2025 23:47
This PR adds support for vLLM server mode to OnlineDPOTrainer, matching the functionality available in GRPO. Previously, OnlineDPO only supported "colocate" mode where vLLM runs in the same process. Now users can use an external vLLM server (launched with `trl vllm-serve`) for generation while keeping the training process separate.

Key changes:
- Add vLLM server mode configuration parameters to OnlineDPOConfig
- Implement VLLMClient integration for server communication
- Add weight synchronization between training process and vLLM server
- Support both server and colocate modes with proper error handling
- Add comprehensive tests for server mode functionality
- Gracefully handle the current vLLM colocate mode API limitations

The implementation follows the same patterns as GRPO's vLLM integration to ensure consistency across the TRL library.
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Member

@qgallouedec qgallouedec left a comment

lgtm!

@qgallouedec qgallouedec changed the title Add vLLM server mode and VLM support to OnlineDPOTrainer Sep 5, 2025
@qgallouedec qgallouedec merged commit d1bf560 into huggingface:main Sep 5, 2025
2 of 10 checks passed
SamY724 pushed a commit to SamY724/trl that referenced this pull request Sep 6, 2025
…ace#3783)

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

5 participants