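With so many attention implementations now in use, running transformer-based models can fail when the installed flash-attention binary does not match the local CUDA/PyTorch versions. Importing the model then raises an error like this: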

RuntimeError: Failed to import transformers.models.bert.modeling_bert because of the following error (look up to see its traceback):
/data/xxxx/miniconda3/envs/xxx/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
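To confirm that this is a binary mismatch rather than a missing package, you can try loading the compiled extension directly. The Python sketch below assumes flash-attn 2.x, whose compiled module is named flash_attn_2_cuda (as in the path above). The function name check_flash_attn_binary is hypothetical.

```python
import importlib


def check_flash_attn_binary() -> str:
    """Try to load the compiled flash-attention extension and return a short diagnosis.

    Returns "ok" if the extension loads; otherwise a string describing why it did not.
    """
    try:
        # Load torch first so its shared libraries (libc10 etc.) are already in memory,
        # the same order in which flash_attn imports them.
        importlib.import_module("torch")
    except (ImportError, OSError):
        return "torch is not installed"
    try:
        importlib.import_module("flash_attn_2_cuda")
    except ImportError as exc:
        msg = str(exc)
        if "undefined symbol" in msg:
            # The .so was built against a different PyTorch version or C++ ABI.
            return "binary mismatch: " + msg
        return "flash-attn extension not importable: " + msg
    return "ok"
```

An "undefined symbol" result points to the version mismatch described here. A "not importable" result only means flash-attn is not installed in this environment.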

Solution: download the prebuilt flash-attention wheel that matches your environment from the official releases page (https://github.com/Dao-AILab/flash-attention/releases) and install it.

Steps:

1. Uninstall the existing flash-attention package (its pip name is flash-attn): pip uninstall flash-attn

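2. Check the Python, PyTorch, CUDA and nvcc versions: python -V, pip show torch, nvidia-smi, nvcc -V. Note that nvidia-smi reports the highest CUDA version the driver supports, which is not necessarily the CUDA version PyTorch was built with.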

3. Download the wheel that matches your Python, PyTorch and CUDA versions from the releases page, then install it. For example:

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.1.post1/flash_attn-2.7.1.post1+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl


pip install flash_attn-2.7.1.post1+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
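The wheel filename encodes every version that has to match. In flash_attn-2.7.1.post1+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl:

- cu12 is the CUDA major version.
- torch2.4 is the PyTorch major.minor version.
- cxx11abiFALSE indicates whether PyTorch was built with the C++11 ABI. The std::__cxx11 in the undefined symbol above belongs to this ABI.
- cp310 is the CPython 3.10 tag.

The Python sketch below assembles that name from your environment. It assumes the naming pattern of this example holds for the release you pick. It also reads torch._C._GLIBCXX_USE_CXX11_ABI, which is a private PyTorch attribute. The helper names wheel_name, release_url and detect_local_wheel are hypothetical.

```python
import sys


def wheel_name(fa_version: str, cuda_version: str, torch_version: str,
               cxx11_abi: bool, py_major: int, py_minor: int) -> str:
    """Build a flash-attn wheel filename following the pattern of the example above.

    Only the CUDA major version and the PyTorch major.minor version appear in the name.
    """
    cuda_major = cuda_version.split(".")[0]          # "12.1" -> "12"
    base = torch_version.split("+")[0]               # "2.4.0+cu121" -> "2.4.0"
    torch_mm = ".".join(base.split(".")[:2])         # "2.4.0" -> "2.4"
    abi = "TRUE" if cxx11_abi else "FALSE"
    py_tag = f"cp{py_major}{py_minor}"               # (3, 10) -> "cp310"
    return (f"flash_attn-{fa_version}+cu{cuda_major}torch{torch_mm}"
            f"cxx11abi{abi}-{py_tag}-{py_tag}-linux_x86_64.whl")


def release_url(fa_version: str, filename: str) -> str:
    """Download URL on the GitHub releases page for a given release tag and filename."""
    return ("https://github.com/Dao-AILab/flash-attention/releases/download/"
            f"v{fa_version}/{filename}")


def detect_local_wheel(fa_version: str) -> str:
    """Fill in wheel_name() from the running interpreter and the installed PyTorch."""
    import torch  # requires PyTorch to be installed
    if torch.version.cuda is None:
        raise RuntimeError("CPU-only PyTorch build: no CUDA wheel applies")
    return wheel_name(fa_version, torch.version.cuda, torch.__version__,
                      bool(torch._C._GLIBCXX_USE_CXX11_ABI),
                      sys.version_info.major, sys.version_info.minor)
```

For example, print(release_url("2.7.1.post1", detect_local_wheel("2.7.1.post1"))) gives a URL to pass to wget. Not every combination is published, so check that the file is listed on the releases page before downloading.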

Done!
