Check model inputs - hidden states #40994
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hi @zucchini-nlp. Are you still working on this PR? Please let me know when it's ready, I'd like to verify it. Thanks!

It should be working with VLMs, I just need to fix CI and code style. I'll take it up after finishing one huge PR I'm working on currently.
    @unittest.skip(
        reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
    )
    def test_training_gradient_checkpointing(self):
        pass

    @unittest.skip(
        reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
    )
    def test_training_gradient_checkpointing_use_reentrant(self):
        pass

    @unittest.skip(
        reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
    )
    def test_training_gradient_checkpointing_use_reentrant_false(self):
        pass
Verified that previously, and when the model was released, it never used the vision features after the norm. So the test was probably re-activated in the meanwhile, when we had incorrect hidden_states from the vision tower.
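To make the skip reason concrete, here is a minimal sketch (a toy setup assumed for illustration, not code from this PR): if downstream code only ever consumes the features taken before the final vision layernorm, that layernorm's parameters never receive a gradient, so tests asserting that every parameter gets a grad have to be skipped.

```python
import torch
import torch.nn as nn

# Toy stand-in for the final vision layernorm.
post_layernorm = nn.LayerNorm(8)

pre_norm_hidden = torch.randn(2, 8, requires_grad=True)
post_norm_hidden = post_layernorm(pre_norm_hidden)  # computed but never consumed downstream

# The downstream loss uses only the pre-norm features, as in this architecture.
loss = pre_norm_hidden.sum()
loss.backward()

print(pre_norm_hidden.grad is not None)    # True
print(post_layernorm.weight.grad is None)  # True -> the gradient-checkpointing tests would fail
```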
I hate rebasing, looks like new models were added to
Cyrilvallez
left a comment
Nice! I would just like to change the name of the arg, i.e. I don't really understand what post_ln_hiddens is. Let's try to find something clearer, and let's also document what it is in the decorator definition!
Feel free to merge afterwards! 🤗
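For context, a hedged sketch of the kind of decorator kwarg being discussed; the function name, the kwarg name `replace_last_hidden_state`, and the implementation below are illustrative assumptions, not the actual `check_model_inputs` API or the name finally chosen in this PR.

```python
import functools

def check_model_inputs_sketch(func=None, *, replace_last_hidden_state=True):
    """Illustrative wrapper: optionally overwrite `last_hidden_state` with the
    last entry of `hidden_states` (fine for LMs, wrong for pre-norm vision towers)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            output = fn(self, *args, **kwargs)
            if replace_last_hidden_state and getattr(output, "hidden_states", None):
                output.last_hidden_state = output.hidden_states[-1]
            return output
        return wrapper
    return decorator if func is None else decorator(func)
```

With such a kwarg, a vision tower's forward could be decorated with `replace_last_hidden_state=False` so its post-norm `last_hidden_state` is left untouched, while language models keep the default behaviour.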
ArthurZucker
left a comment
LGTM, but yeah, let's rename the arg, it's not explicit!
[For maintainers] Suggested jobs to run (before merge) run-slow: aimv2, albert, apertus, arcee, aria, audio_spectrogram_transformer, aya_vision, bert, bert_generation, bitnet, blip, blip_2, blt, camembert, cohere, cohere2
* update all models
* fix copies
* skip aria tests
* update other models
* skip should be in test, not tester
* i think this is more descriptive as a name
* find and replace for new models
What does this PR do?
Fixes #40964
In most vision models the `output.hidden_states` are the hiddens taken right after the encoder blocks, i.e. before the last layernorm. Therefore, for these models `output.hidden_states != output.last_hidden_state`.

Currently `check_model_inputs` assumes that the last hidden state is the correct one to return, which is true for language models only. This PR adds a kwarg for `check_model_inputs` which decides whether to replace the last hidden state or not.

TBH, I think the way it is done in LMs is the ultimate correct version and we probably need to "break" vision models. But I can't think of a way to obtain pre-norm last hidden states, which are needed for some VLMs.
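As an illustration of the description above, a hedged toy sketch (the `ToyVisionEncoder` and its exact structure are assumptions, not the actual modeling code): many vision encoders record hidden states inside the layer loop and only apply the final layernorm when producing `last_hidden_state`, so the last entry of `hidden_states` is the pre-norm tensor.

```python
import torch
import torch.nn as nn

class ToyVisionEncoder(nn.Module):
    def __init__(self, dim=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        self.post_layernorm = nn.LayerNorm(dim)

    def forward(self, x):
        all_hidden_states = []
        for layer in self.layers:
            x = layer(x)
            all_hidden_states.append(x)              # recorded *before* the final norm
        last_hidden_state = self.post_layernorm(x)   # final norm applied only here
        return last_hidden_state, tuple(all_hidden_states)

encoder = ToyVisionEncoder()
last_hidden_state, hidden_states = encoder(torch.randn(1, 4, 16))

# In this vision-style encoder the two differ; an LM-style model that norms
# before collecting the last entry would have hidden_states[-1] == last_hidden_state.
print(torch.allclose(hidden_states[-1], last_hidden_state))  # False
```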