huggingface/trl (Public; generated from fastai/nbdev_template)
Forks: 2.4k · Stars: 16.8k
Commit History (branch: main)
Commits on Dec 26, 2025
Small fix on contributing docs (#4753) · murilo-cunha authored 2337cc9
Commits on Dec 24, 2025
docs: fix "Good Second Issue" redirection link (#4749) · casinca authored c04fdc0
Change tiny model dtype from float16 to bfloat16 to fix CUDA error (#4745) · albertvillanova authored 51113d1
Commits on Dec 23, 2025
Avoid docstyle formatting for `TestParseResponse` (#4736) · qgallouedec authored 19021f8
Hotfix for browsergym openenv notebook (#4740) · sergiopaniego authored 5e6ceed
RLOO supports async rewards. (#4718) · pramodith and qgallouedec authored 044058e
Add GRPO QLoRA free notebook (#4660) · sergiopaniego authored c9a9e25
Add uv/hf jobs support to OpenEnv scripts (#4720) · sergiopaniego authored 11c9867
Include data type for tiny models and update tests (#4728) · qgallouedec authored 32bf84d
Commits on Dec 22, 2025
Fix: handle multiple tool calls in `qwen3_schema` (#4709) · mattbui authored 22b386a
Commits on Dec 21, 2025
Upgrade GitHub Actions for Node 24 compatibility (#4733) · salmanmkc authored 490f4c7
Upgrade GitHub Actions to latest versions (#4734) · salmanmkc authored f0bce66
Commits on Dec 20, 2025
Remove experimental imports from testing_utils (#4727) · albertvillanova authored 38e1a35
Commits on Dec 19, 2025
Apply docstyle · qgallouedec committed a515085
Refactor vLLM generation [3/N]: Decouple profiling from trainer (#4717) · albertvillanova and qgallouedec authored 29a39ab
Fix deprecation version for RLOO max_prompt_length (#4726) · albertvillanova authored 8611eba
Disallow `PeftModel` + `peft_config` in trainers (#4713) · qgallouedec authored bb5cd8a
Update agents notebook dependencies (#4724) · sergiopaniego authored 20691d0
Commits on Dec 18, 2025
Upload FunctionGemma notebook (#4721) · sergiopaniego authored 1dc8bbc
Overwrite model default generation config used by model.generate (#4647) · albertvillanova authored 8918c98
Support async reward functions and parallelize call to reward functions. (#4567) · pramodith and qgallouedec authored 50bc248
Add inference example to GRPO agent training notebook (#4710) · sergiopaniego authored 13ad537
Fix test assertion for `top_k` parameter in `OnlineDPOTrainer` (#4714) · qgallouedec authored 0ead75b
[docs] Fix RapidFire AI position in documentation (#4715) · qgallouedec authored 23a941a
Fix KeyError with transformers 5.0.0+ where push_to_hub_token is removed (#4691) · Manodeepray and qgallouedec authored 976a2c0
docs: Add RapidFire AI cross-references to DPO and GRPO trainer docs (#4705) · 3 people authored 30e9894
Commits on Dec 17, 2025
Align RLOO with GRPO (#4706) · qgallouedec and albertvillanova authored 157cd63
BrowserGym example for LLMs (no vision) (#4696) · sergiopaniego and qgallouedec authored 06a897d
Deprecate max_prompt_length in RLOOTrainer (#4703) · albertvillanova authored ca54e24
Commits on Dec 16, 2025
Include `generation_config` for tiny model uploads (#4643) · qgallouedec authored 997368b
Preserve truncated tokens in BFD packing (#4632) · qgallouedec and albertvillanova authored ec70ef2
Move `get_reward` function to `experimental.utils` (#4683) · qgallouedec authored a12faa5
Align use of vllm_max_model_length in RLOOTrainer (#4702) · albertvillanova authored 61c9921
Align GRPO and RLOO initialization (#4685) · qgallouedec authored 00da046
Move `prepare_model_for_kbit_training`, `enable_gradient_checkpointing`, `prepare_peft_model` to `experimental.utils` (#4704) · qgallouedec authored 20cc2e1