Conversation

@zucchini-nlp
Member

What does this PR do?

Fixes #40964

In most vision models, `output.hidden_states` are the hidden states taken right after the encoder blocks, i.e. before the final layernorm. Therefore, for these models, `output.hidden_states[-1] != output.last_hidden_state`.

Currently, `check_model_inputs` assumes that the last hidden state is the correct one to return, which is true only for language models. This PR adds a kwarg to `check_model_inputs` that decides whether to replace the last hidden state or not.

TBH, I think the way it is done in LMs is the ultimately correct version, and we probably need to "break" vision models. But I can't think of another way to obtain the pre-norm last hidden states, which are needed for some VLMs.
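To illustrate the mismatch described above, here is a minimal, framework-free sketch (hypothetical names, not the actual transformers implementation) of a vision encoder where the final layernorm is applied only to the returned `last_hidden_state`, so the last collected hidden state stays pre-norm:

```python
def layernorm(x, eps=1e-5):
    # Layer normalization over a plain list of floats.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def encoder_forward(x, blocks):
    """Collect hidden states after each block; norm only the final output."""
    hidden_states = [x]
    for block in blocks:
        x = block(x)
        hidden_states.append(x)
    # Post-norm output: what the vision model returns as last_hidden_state.
    last_hidden_state = layernorm(x)
    return hidden_states, last_hidden_state

# Two toy "encoder blocks": double every value, then add 1.
blocks = [lambda x: [v * 2 for v in x], lambda x: [v + 1 for v in x]]
hiddens, last = encoder_forward([1.0, 2.0, 3.0], blocks)
print(hiddens[-1])  # pre-norm: [3.0, 5.0, 7.0]
print(last)         # post-norm: normalized, so last != hiddens[-1]
```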

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp removed the request for review from ArthurZucker September 19, 2025 10:54
@jiqing-feng
Contributor

Hi @zucchini-nlp. Are you still working on this PR? Please let me know when it's ready; I'd like to verify it. Thanks!

@zucchini-nlp
Member Author

It should be working with VLMs; I just need to fix CI and code style. I'll get back to it after finishing one huge PR I'm currently working on.

Comment on lines 185 to 202
@unittest.skip(
reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
)
def test_training_gradient_checkpointing(self):
pass

@unittest.skip(
reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
)
def test_training_gradient_checkpointing_use_reentrant(self):
pass

@unittest.skip(
reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
)
def test_training_gradient_checkpointing_use_reentrant_false(self):
pass

Member Author

Verified against both the previous version and the model as originally released: it never used vision features after the norm. So the test was probably re-activated at some point while we were returning incorrect hidden_states from the vision tower.

@zucchini-nlp
Member Author

I hate rebasing; it looks like new models were added to check_model_inputs. I will fix them soon.

Member

@Cyrilvallez Cyrilvallez left a comment


Nice! I would just like to change the name of the arg, i.e. I don't really understand what post_ln_hiddens means. Let's try to find something clearer, and let's also document what it is in the decorator definition!
Feel free to merge afterwards! 🤗
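For reference, a hypothetical sketch of how such a kwarg could be documented in the decorator definition. The decorator body, the kwarg name `replace_last_hidden_state`, and the dict-based outputs are illustrative placeholders, not the real transformers implementation:

```python
import functools

def check_model_inputs(replace_last_hidden_state=True):
    """Decorator for model forward methods (illustrative sketch only).

    Args:
        replace_last_hidden_state (bool):
            If True (language models), overwrite `last_hidden_state` with the
            final entry of `hidden_states`. If False (vision models), keep the
            post-layernorm `last_hidden_state` untouched, since vision
            `hidden_states` are collected pre-norm.
    """
    def decorator(forward):
        @functools.wraps(forward)
        def wrapper(*args, **kwargs):
            output = forward(*args, **kwargs)
            if replace_last_hidden_state and output.get("hidden_states"):
                output["last_hidden_state"] = output["hidden_states"][-1]
            return output
        return wrapper
    return decorator

# Usage: the vision path keeps its post-norm output, the LM path replaces it.
@check_model_inputs(replace_last_hidden_state=False)
def vision_forward():
    return {"hidden_states": [[1.0], [2.0]], "last_hidden_state": [9.0]}

@check_model_inputs(replace_last_hidden_state=True)
def lm_forward():
    return {"hidden_states": [[1.0], [2.0]], "last_hidden_state": [9.0]}
```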

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM, but yeah, let's rename the arg; it's not explicit!

@github-actions
Contributor

github-actions bot commented Oct 6, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, albert, apertus, arcee, aria, audio_spectrogram_transformer, aya_vision, bert, bert_generation, bitnet, blip, blip_2, blt, camembert, cohere, cohere2

@zucchini-nlp zucchini-nlp merged commit 9db58ab into huggingface:main Oct 6, 2025
25 checks passed
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
* update all models

* fix copies

* skip aria tests

* update other models

* skip should be in test, not tester

* i think this is more descriptive as a name

* find and replace for new models
@hmellor hmellor added the for patch Tag issues / labels that should be included in the next patch label Nov 12, 2025
Cyrilvallez pushed a commit that referenced this pull request Nov 24, 2025
* update all models

* fix copies

* skip aria tests

* update other models

* skip should be in test, not tester

* i think this is more descriptive as a name

* find and replace for new models