Add Idefics3 support (Granite Docling VLM) #4090
gaztrabisme wants to merge 3 commits into unslothai:main
Closes unslothai#4079

- Add "idefics3" to VLLM_SUPPORTED_VLM
- Add _fix_requires_grad_hooks_for_kwargs() to handle models whose forward() passes all arguments via kwargs (an empty positional-args tuple causes requires_grad_pre_hook to raise RuntimeError)

Tested with ibm-granite/granite-docling-258M (258M params):

- SFT on unsloth/LaTeX_OCR: loss 3.1 → 1.3, 30 steps, 3.7GB VRAM
- GRPO with TRL GRPOTrainer: full RL loop verified
- LoRA/DoRA + gradient checkpointing + generation all pass
Summary of Changes

Hello @gaztrabisme, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces support for Idefics3 Vision-Language Models (VLMs) within the framework, specifically enabling fine-tuning of models like ibm-granite/granite-docling-258M.
for more information, see https://pre-commit.ci
Code Review
This pull request adds support for the Idefics3 architecture (e.g., Granite Docling VLM) to FastVisionModel. It includes Idefics3 in the supported VLM list and introduces a fix for requires_grad pre-hooks to handle models that pass all arguments via keyword arguments, which results in empty positional argument tuples. The fix is implemented via _fix_requires_grad_hooks_for_kwargs, which replaces problematic hooks with a safer version. The implementation is generally sound, but there are opportunities to improve the robustness of the hook logic and remove redundant code.
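The kwargs-only failure mode the review describes can be reproduced in a few lines. This is a minimal sketch; `Encoder` is a hypothetical stand-in for Idefics3's vision tower, not the actual class:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Stand-in for a vision encoder whose callers pass everything by keyword.
    def forward(self, pixel_values=None):
        return pixel_values * 2

seen = {}

def pre_hook(module, args):
    # A plain forward_pre_hook only sees positional arguments;
    # keyword arguments are invisible here.
    seen["args"] = args

enc = Encoder()
enc.register_forward_pre_hook(pre_hook)
enc(pixel_values=torch.ones(2, 2))  # kwargs-only call
print(seen["args"])  # () — so any hook indexing args[0] fails
```

With `seen["args"]` empty, a hook that unconditionally touches `input[0]` raises, which is exactly the `RuntimeError` this PR works around.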
unsloth/models/vision.py
Outdated
```python
def _safe_requires_grad_pre_hook(module, input):
    type_input = type(input)
    if type_input is torch.Tensor:
        input.requires_grad_(True)
    elif type_input is tuple or type_input is list:
        if len(input) == 0:
            return
        if torch.is_floating_point(input[0]):
            input[0].requires_grad_(True)
```
The _safe_requires_grad_pre_hook implementation can be made more robust. Specifically, torch.is_floating_point(input[0]) will raise a TypeError if input[0] is not a torch.Tensor (e.g., if it is None or a raw integer). Additionally, the check for type_input is torch.Tensor is likely redundant as input in a module's forward_pre_hook is typically a tuple, but it's good to handle both cases safely. The pass at line 156 is also unnecessary.
```python
def _safe_requires_grad_pre_hook(module, input):
    if isinstance(input, (tuple, list)):
        if len(input) > 0 and isinstance(input[0], torch.Tensor) and torch.is_floating_point(input[0]):
            input[0].requires_grad_(True)
    elif isinstance(input, torch.Tensor) and torch.is_floating_point(input):
        input.requires_grad_(True)
```
The type(x) is Y pattern and trailing pass match the existing codebase style (used throughout vision.py and unsloth_zoo). The upstream requires_grad_pre_hook also calls torch.is_floating_point(input[0]) without an isinstance guard — input in _forward_pre_hooks is always a tuple of tensors.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 96e4d87889
unsloth/models/vision.py
Outdated
```python
if len(input) == 0:
    return
```
Mark kwargs tensors for checkpointed forwards
On single-process training paths post_patch_model keeps use_reentrant=True, so checkpointed layers still require at least one grad-enabled input tensor to propagate parameter gradients. This hook now returns immediately when positional input is empty, but kwargs-only layers (the exact Idefics3 case) then never get any tensor switched to requires_grad=True; the crash is avoided, but LoRA parameters inside those checkpointed blocks can silently receive no gradients. Please handle kwargs tensors (or force non-reentrant checkpointing here) instead of early-returning on empty positional args.
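Codex's concern can be checked directly: under reentrant checkpointing, a block whose inputs all have `requires_grad=False` produces an output detached from autograd, so its parameters receive no gradients. A minimal sketch, using a plain `Linear` as the checkpointed block:

```python
import torch
from torch.utils.checkpoint import checkpoint

lin = torch.nn.Linear(4, 4)

# No input requires grad: with use_reentrant=True the checkpointed
# output is detached from autograd, so backward never reaches lin's weights.
x = torch.randn(2, 4)
out = checkpoint(lin, x, use_reentrant=True)
print(out.requires_grad)  # False

# Once some input requires grad (what requires_grad_pre_hook arranges),
# parameter gradients flow through the recomputed forward again.
x2 = torch.randn(2, 4, requires_grad=True)
checkpoint(lin, x2, use_reentrant=True).sum().backward()
print(lin.weight.grad is not None)  # True
```

This is why the hook exists at all: some tensor entering each checkpointed region must carry `requires_grad=True`.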
This does not apply here. For Idefics3, requires_grad_for_gradient_checkpointing finds model.text_model has get_input_embeddings and registers a post hook on embed_tokens — the requires_grad_pre_hook fallback is never used:

```
PRE hooks with requires_grad: 0
POST hooks: embed_tokens -> requires_grad_post_hook
```

Our replacement is defensive safety for models that might hit the fallback path with a kwargs-only forward. The actual gradient flow is unaffected — confirmed by SFT loss 3.11 → 1.29 and GRPO grad_norm 0.66 → 1.45.
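The post-hook path described above can be sketched with a frozen embedding (as under LoRA, where base weights are frozen). The hook name mirrors the reply, but the module is a toy stand-in, not the real embed_tokens:

```python
import torch
import torch.nn as nn

# Toy stand-in for model.text_model.embed_tokens with frozen base weights.
# Without the hook, its output would not require grad.
embed = nn.Embedding(10, 8)
embed.weight.requires_grad_(False)

def requires_grad_post_hook(module, args, output):
    # Mark the embedding output so downstream checkpointed blocks
    # receive a grad-enabled tensor to backpropagate through.
    output.requires_grad_(True)

embed.register_forward_hook(requires_grad_post_hook)

out = embed(torch.tensor([[1, 2, 3]]))
print(out.requires_grad)  # True
```

Since every transformer layer consumes this output, gradients reach the LoRA parameters regardless of how the layers receive their arguments.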
I think @mmathew23 handled something like this, where args passed via kwargs cause trouble with gradients. @gaztrabisme, can you check if you can find that and use the same approach, so we have a consistent fix rather than a duplicate?
@Datta0 Thanks for the pointer. I looked through mmathew23's contributions — the closest match handles args passed via kwargs inside the layer forward. Our issue is at a different layer: the upstream requires_grad_pre_hook itself crashes on an empty positional-args tuple:

```python
if len(input) == 0:
    raise RuntimeError("Unsloth: Failed to make input require gradients!")
    # print(f" WARNING: Empty list input to {module.__class__.__name__}!")
    # return
```

Our fix replaces that hook post-registration with a safe variant that returns early instead of raising. Is there a specific commit or PR I might have missed? Happy to align if there's an existing pattern for this.
Yeah, I think you already found the right commit. I'd let @mmathew23 comment more on this :)
@gaztrabisme do you have sample SFT and GRPO scripts to test this? It might be best to make a PR to update https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_zoo/peft_utils.py#L162-L325, rather than redefine most of the hook in vision.py.
Thanks @mmathew23! Good suggestion — fixing it at the source in unsloth_zoo/peft_utils.py makes more sense than redefining the hook in vision.py.

Test scripts (SFT + GRPO) are here: https://gist.github.com/gaztrabisme/d5c5f405146e679eae67b6ec1ee2e9eb

SFT — 30 steps on unsloth/LaTeX_OCR, loss 3.1 → 1.3. GRPO — 20 steps with TRL GRPOTrainer. Both use ibm-granite/granite-docling-258M. I can put together a PR to unsloth-zoo.
danielhanchen left a comment
Tested on 1x NVIDIA B200 (CUDA 12.8, torch 2.9.1, transformers 4.57.6, vLLM 0.15.1) with 7 vision notebooks:
| Notebook | Result |
|---|---|
| Qwen3_VL_8B_Vision | PASS |
| Qwen3_VL_8B_Vision_GRPO | PASS |
| Gemma3_4B_Vision | PASS |
| Gemma3_4B_Vision_GRPO | PASS |
| Gemma3N_4B_Vision | PASS |
| Deepseek_OCR_3B | FAIL (pre-existing: easydict import blocked by unsloth_zoo patched_import) |
| Qwen2_5_7B_VL_GRPO | FAIL (pre-existing: vLLM multimodal tokenizer bug) |
5/7 pass. Both failures are pre-existing env issues unrelated to this PR.
Code review notes:

- As mmathew23 mentioned, the `_fix_requires_grad_hooks_for_kwargs` fix could be a 1-line change in `unsloth_zoo/peft_utils.py` rather than a 35-line workaround in `vision.py`. The upstream fix would benefit all models, not just Idefics3.
- The `VLLM_SUPPORTED_VLM` addition is correct, and the error message list should also include Idefics3 (currently only listed in the supported set but not in the user-facing error text).
- Extra blank lines at the end of the hook function (style nit).
Functionally works for all tested vision models.
Moved the requires_grad_pre_hook fix upstream to unsloth-zoo#514. This PR now only adds "idefics3" to VLLM_SUPPORTED_VLM and the error message text. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the thorough testing @danielhanchen! All three points addressed:

- Moved the requires_grad_pre_hook fix upstream to unsloth-zoo#514
- Added "idefics3" to the user-facing error message text
- The style nit goes away with the hook function removed from vision.py

PR is now 2 lines changed (1 file).
@mmathew23 Could you help review this and my PR over at unsloth-zoo? Many thanks!
Closes #4079

Adds Idefics3 architecture support to `FastVisionModel`, enabling fine-tuning of `ibm-granite/granite-docling-258M` and other Idefics3-based models.

Changes

`unsloth/models/vision.py` — 1 file, 39 insertions:

- Add `"idefics3"` to `VLLM_SUPPORTED_VLM`
- Add `_fix_requires_grad_hooks_for_kwargs()` — replaces `requires_grad_pre_hook` on modules after registration to handle kwargs-only forward signatures (Idefics3's vision encoder passes all args via kwargs → empty positional args tuple → `RuntimeError: Failed to make input require gradients`)

Test results

`ibm-granite/granite-docling-258M` (257.5M params), RTX 5080 16GB:

SFT — `unsloth/LaTeX_OCR` dataset (68K samples), LoRA r=32 + DoRA: loss 3.1 → 1.3 over 30 steps, 3.7GB VRAM.

GRPO — TRL `GRPOTrainer`, text prompts, custom reward: 20 steps, full RL loop verified.

Pipeline: load → LoRA/DoRA → forward → backward → train → generate → save — all pass.

The hook fix is a no-op for models with non-empty positional args (all existing VLMs).
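For reference, the replace-after-registration idea can be sketched as below. The hook bodies are simplified stand-ins (not the actual unsloth_zoo implementations), and the swap relies on the private `_forward_pre_hooks` dict, mirroring what the workaround does:

```python
import torch
import torch.nn as nn

def requires_grad_pre_hook(module, input):
    # Simplified stand-in for the upstream hook: assumes positional args exist.
    if len(input) == 0:
        raise RuntimeError("Unsloth: Failed to make input require gradients!")
    if torch.is_floating_point(input[0]):
        input[0].requires_grad_(True)

def _safe_requires_grad_pre_hook(module, input):
    # Safe variant: kwargs-only forward means nothing to mark, so just return.
    if len(input) == 0:
        return
    if torch.is_floating_point(input[0]):
        input[0].requires_grad_(True)

layer = nn.Linear(4, 4)
layer.register_forward_pre_hook(requires_grad_pre_hook)

# Swap the hook in place, keeping its id and ordering.
for hook_id, fn in list(layer._forward_pre_hooks.items()):
    if fn.__name__ == "requires_grad_pre_hook":
        layer._forward_pre_hooks[hook_id] = _safe_requires_grad_pre_hook

x = torch.randn(2, 4)
layer(x)               # positional path: behavior unchanged
print(x.requires_grad)  # True

x2 = torch.randn(2, 4)
layer(input=x2)         # kwargs-only call: previously raised, now a no-op
print(x2.requires_grad)  # False
```

The second call demonstrates the "no-op" claim: positional callers still get their tensor marked, while kwargs-only callers no longer crash.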