REDUCE ONLINE DPO MEMORY CONSUMPTION WITH UNSLOTH QLORA (OPEN SOURCE) | Keith Truongcao

11mo

My open-source contributions, along with details on how you can reduce VRAM usage for Online DPO fine-tuning, are in my Substack blog linked below. (All the code is in the blog!) I believe more official Online RLHF support for Unsloth will be coming out soon. Special thanks to the Unsloth AI team (Daniel Han and Michael Han (Unsloth)), Edward Kim, and Costa Huang for helping me make this project come together!

Blog: https://lnkd.in/eCej2J5q