Description
Following the tutorial on how to use the Llama model, I ran the commands below and hit an error where pip could not collect and install deepspeed correctly; force-upgrading it then produced other dependency-conflict errors.
git clone https://github.com/LlamaFamily/Llama-Chinese.git
cd Llama-Chinese
pip install -r requirements.txt
I first hit the deepspeed error. I tried several different versions without success; after upgrading it to the latest version with pip, the install finally went through without errors, but then the newer deepspeed conflicted with the pinned pytorch 2.1.2, and upgrading pytorch in turn produced yet more conflict errors...
After upgrading one of the dependencies, running the script warned that my PyTorch version was incompatible with CUDA and that the flash-attn dependency was missing... plus a libiomp5md.dll error.
So I started working through the problems one by one. For the "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized." error, there is a fix described here:
https://zhuanlan.zhihu.com/p/371649016
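For reference, the usual workarounds for this error are either removing the duplicate libiomp5md.dll from the environment or telling the Intel OpenMP runtime to tolerate the duplicate. A minimal sketch of the latter (my summary, not necessarily exactly what the linked post recommends):

```python
# Workaround sketch for "OMP: Error #15": allow the duplicate libiomp5md.dll.
# Must run before torch/numpy are imported. Intel documents this flag as unsafe
# (it can silently degrade performance or correctness), so treat it as a stopgap.
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import torch  # import heavy libraries only after the environment variable is set
```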
The rest of it took a whole afternoon and left me fried; here is what I went through.
1. I deleted the virtual environment and created a fresh Python 3.10 environment, then installed the latest PyTorch 2.6. My GPU's CUDA is 12.6, so I chose the CUDA 12.6 build of PyTorch. Last time I had installed the CUDA 11.8 build, and PyTorch warned that it was incompatible with my machine's CUDA. My original conda environment was Python 3.9, which I had force-upgraded to 3.10; I don't know whether that affects PyTorch, but I later deleted the whole environment and reinstalled the CUDA 11.8 PyTorch anyway, and it made no difference.
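For anyone hitting the same mismatch, a quick sanity check with standard torch calls shows which CUDA build PyTorch was compiled against and whether the GPU is actually visible:

```python
# Sanity check: PyTorch build, the CUDA version it was compiled against, and GPU visibility.
# All of these are standard torch attributes/functions.
import torch

print("torch version:    ", torch.__version__)          # e.g. 2.6.0+cu126
print("compiled for CUDA:", torch.version.cuda)         # e.g. 12.6
print("cuda available:   ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:           ", torch.cuda.get_device_name(0))
```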
2. I installed each dependency manually, one by one, to pin down which package was actually causing the conflict. In the end I installed all of them without version pins, i.e. the latest versions (there is a pitfall here, see step 4 below). A quick way to list what actually ended up installed is sketched after these notes.
*The transformers version is tied to the Python version; see: https://pypi.org/project/transformers/4.49.0/#history
*The bitsandbytes version is tied to the transformers version.
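While untangling version conflicts, it helps to see which versions actually ended up installed. A small sketch using only the standard library (the package list is just illustrative):

```python
# Print the installed version of each package involved in the conflicts (or note its absence).
from importlib.metadata import version, PackageNotFoundError

for pkg in ["torch", "transformers", "bitsandbytes", "accelerate", "deepspeed", "flash-attn"]:
    try:
        print(f"{pkg:14s} {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg:14s} (not installed)")
```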
3. For installing flash-attn, see: https://blog.csdn.net/MurphyStar/article/details/138523803
*flash-attn is tied to the Python version, the CUDA version, and the PyTorch version all at once, so make sure to pick the matching wheel, e.g.:
flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
For example, I picked flash_attn-2.7.4, the cu124 CUDA build (there is no 12.6 build, so take the next one down), cp310 for Python 3.10, and the win_amd64 wheel for Windows. The snippet below prints the values those filename tags have to match.
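To read off the values that have to line up with the wheel's filename tags (cu…, torch…, cp…, platform) and to confirm the wheel imports afterwards, something like this works; it only uses standard sys/platform/torch calls, and the final import simply verifies the install:

```python
# Values that must match the flash-attn wheel filename tags.
import sys
import platform
import torch

print("python :", f"cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp310
print("torch  :", torch.__version__)                                      # e.g. 2.6.0
print("cuda   :", torch.version.cuda)                                     # e.g. 12.6 -> nearest wheel is cu124
print("system :", platform.system(), platform.machine())                  # e.g. Windows AMD64

# After installing the wheel, confirm it actually loads:
import flash_attn
print("flash_attn:", flash_attn.__version__)
```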
4. Once the dependencies were sorted out and I ran the script, this appeared:
The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 3/3 [02:14<00:00, 44.91s/it]
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
Traceback (most recent call last):
  File "P:\Docker\text-generation-webui\models\quick_startAtom.py", line 23, in <module>
    generate_ids = model.generate(**generate_input)
  File "C:\Users\User\.conda\envs\py310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\User\.conda\envs\py310\lib\site-packages\transformers\generation\utils.py", line 2223, in generate
    result = self._sample(
  File "C:\Users\User\.conda\envs\py310\lib\site-packages\transformers\generation\utils.py", line 3204, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "C:\Users\User\.cache\huggingface\modules\transformers_modules\Atom-7B-Chat\model_atom.py", line 1380, in prepare_inputs_for_generation
    max_cache_length = past_key_values.get_max_length()
  File "C:\Users\User\.conda\envs\py310\lib\site-packages\torch\nn\modules\module.py", line 1928, in __getattr__
    raise AttributeError(
AttributeError: 'DynamicCache' object has no attribute 'get_max_length'. Did you mean: 'get_seq_length'?
This is caused by the latest transformers, 4.49, that I installed in step 2. At first I went back to the 4.23 pinned in the requirements, then hit a bitsandbytes error, so I also went back to the bitsandbytes 0.42 pinned there. That resolved the conflict between transformers and bitsandbytes, but bitsandbytes then conflicted with my PyTorch and CUDA versions: it could not recognize my CUDA 12.6.
Since I couldn't make the downgrades work, I went back to the latest versions of both.
The final fix was to edit the FlagAlpha\Atom-7B-Chat\model_atom.py file as the error message suggests: on line 1380, in max_cache_length = past_key_values.get_max_length(),
I replaced get_max_length with get_seq_length. After saving, the script ran successfully. The change is spelled out below.
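For clarity, this is the entire change inside model_atom.py; only the single line from the traceback is touched, the rest of the file stays as shipped:

```python
# FlagAlpha\Atom-7B-Chat\model_atom.py, inside prepare_inputs_for_generation(), line 1380.
# Old line, which fails because transformers 4.49's DynamicCache no longer has get_max_length():
#     max_cache_length = past_key_values.get_max_length()
# New line, using the method the AttributeError itself suggests:
max_cache_length = past_key_values.get_seq_length()
```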
5. At the end there are still a few warnings, and I'm not sure whether they need to be dealt with. If you have any suggestions, please reply. A sketch of what I believe the non-deprecated calls look like follows the log below.
The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 3/3 [02:01<00:00, 40.43s/it]
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
Human: Introduce China
Assistant: The People's Republic of China is a country in East Asia, on the west coast of the Pacific Ocean. It is one of the most populous large developing countries in the world, the world's second-largest economy, and a permanent member of the council of heads of state. China has a long history and a rich and colorful culture, and is the birthplace of one of the world's oldest civilizations. It is also a multi-ethnic country with many different languages and cultural traditions.
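For the record, my understanding is that the first two warnings go away if the script passes a BitsAndBytesConfig and attn_implementation="flash_attention_2" to from_pretrained, and the attention-mask warning goes away when the tokenizer's attention_mask is forwarded to generate(). A minimal sketch of what I mean, assuming the model is loaded in 4-bit from FlagAlpha/Atom-7B-Chat (this is not the repo's exact quick_start script):

```python
# Minimal sketch, not the repo's quick_start script: load Atom-7B-Chat without the
# deprecated load_in_4bit / use_flash_attention_2 arguments and pass an attention mask.
# The 4-bit settings and the model path are assumptions based on the warnings above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "FlagAlpha/Atom-7B-Chat"

# Replaces the deprecated load_in_4bit=True argument.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,           # instead of load_in_4bit=True
    attn_implementation="flash_attention_2",  # instead of use_flash_attention_2=True
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Tokenize with return_tensors so the attention_mask is built explicitly and forwarded
# to generate(), which avoids the "attention mask is not set" warning.
prompt = "<s>Human: 介绍一下中国\n</s><s>Assistant: "  # prompt format assumed from the repo's examples
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generate_ids = model.generate(
    **inputs,  # includes both input_ids and attention_mask
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(generate_ids[0], skip_special_tokens=True))
```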