@qgallouedec
Member

@qgallouedec qgallouedec commented Aug 20, 2025

Our new, simple Docker image, mostly intended to be used with trl jobs.

Note that the image is already pushed to https://hub.docker.com/r/huggingface/trl-source-gpu (soon to be renamed to just "huggingface/trl").

Comment on lines +4 to +6
push:
  branches:
    - main
Member Author

@qgallouedec qgallouedec Sep 9, 2025

In theory, we would need to build:

  • dev:
    • every commit on main
    • every new release of a dependency
  • stable:
    • every patch release
    • every new release of a dependency

However, since we typically make a few commits per day on main, let's simplify things and just build both images whenever a new commit lands on main.
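To make the trade-off concrete, here is a minimal Python sketch of the two policies described above. It is only an illustration, not the actual workflow: the `Event` names and both helper functions are hypothetical, and only the image names (`dev`, `stable`) and the triggers come from the comment.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical event names, used only for this illustration."""

    COMMIT_ON_MAIN = auto()
    PATCH_RELEASE = auto()
    DEPENDENCY_RELEASE = auto()


# The "in theory" policy: which events should rebuild which image.
IDEAL_TRIGGERS = {
    "dev": {Event.COMMIT_ON_MAIN, Event.DEPENDENCY_RELEASE},
    "stable": {Event.PATCH_RELEASE, Event.DEPENDENCY_RELEASE},
}


def images_to_build_ideal(event: Event) -> set:
    """Return the images that the ideal policy would rebuild for `event`."""
    return {image for image, triggers in IDEAL_TRIGGERS.items() if event in triggers}


def images_to_build_simplified(event: Event) -> set:
    """Simplified policy: rebuild both images on every commit to main, nothing else."""
    if event is Event.COMMIT_ON_MAIN:
        return set(IDEAL_TRIGGERS)
    return set()


if __name__ == "__main__":
    for event in Event:
        ideal = sorted(images_to_build_ideal(event))
        simplified = sorted(images_to_build_simplified(event))
        print(f"{event.name}: ideal={ideal} simplified={simplified}")
```

With the simplified policy, a patch or dependency release does not trigger a build by itself. It is picked up by the next commit on main, which is expected to arrive within a day or so given the commit frequency mentioned above.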

Member

@sergiopaniego sergiopaniego left a comment

Regarding the Training using Jobs guide, I can see that we'd need to update it by adding:

Since it's just a small update, I can include it in this PR shortly. If you’d rather merge sooner, feel free, and I’ll open a new PR 😄

@qgallouedec qgallouedec changed the title 🐳 Docker update Sep 14, 2025
@qgallouedec
Member Author

I took the opportunity to simplify the jobs documentation:

  • if something was not specific to training/trl, I redirected to the HF Jobs doc (env vars, hardware, timeout, etc.)
  • added some warnings about important things like pushing the model and setting a proper timeout
  • removed the section on trackio, as we now have a dedicated section in the doc for it
  • made the training script part a bit more general (scripts don't necessarily come from the repo)
  • moved up trl-jobs, as it's the simpler solution, so I expect users to read further in the doc only for further customization (you don't want the user to discover at the end that a simpler solution existed)
  • moved the docker part down, as it's probably more for power users who want to choose their own docker image

I'll merge this PR, as it's been open for a while now and I want the docker image to be operational. We can still refine it in the future if needed.


from .. import __version__
from ..import_utils import (
from trl import __version__
Member Author

for scripts, we don't want relative imports
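A small, self-contained sketch of why this matters: a file run directly (`python path/to/script.py`) has no parent package, so a relative import fails, while an absolute import works as long as the package is importable. The package name `pkg`, the file layout, and the `run_as_script` helper below are made up for the demo. Setting `PYTHONPATH` stands in for the package being installed.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_as_script(import_line: str) -> subprocess.CompletedProcess:
    """Build a throwaway package `pkg` and run pkg/scripts/demo.py directly."""
    with tempfile.TemporaryDirectory() as tmp:
        scripts = Path(tmp) / "pkg" / "scripts"
        scripts.mkdir(parents=True)
        (Path(tmp) / "pkg" / "__init__.py").write_text('__version__ = "0.0.0"\n')
        (scripts / "__init__.py").write_text("")
        script = scripts / "demo.py"
        script.write_text(f"{import_line}\nprint(__version__)\n")
        # Stand-in for "the package is installed": make `pkg` importable.
        env = dict(os.environ, PYTHONPATH=tmp)
        return subprocess.run(
            [sys.executable, str(script)], capture_output=True, text=True, env=env
        )


# Relative import: raises ImportError, because a directly executed file
# is loaded as `__main__` and has no known parent package.
relative = run_as_script("from .. import __version__")

# Absolute import: resolves through the normal import path.
absolute = run_as_script("from pkg import __version__")

if __name__ == "__main__":
    print("relative import exit code:", relative.returncode)
    print("absolute import exit code:", absolute.returncode)
```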

@qgallouedec qgallouedec merged commit 9955ee7 into main Sep 14, 2025
13 checks passed
@qgallouedec qgallouedec deleted the docker-update branch September 14, 2025 00:35
