apex errors

When using pytorch_transformers, I ran into the following error:
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

Solution

Compile and install apex manually:

git clone https://github.com/NVIDIA/apex
cd apex
CUDA_HOME=/usr/local/cuda-11.2 pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
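
After the build finishes, one quick way to confirm that the extension named in the error is now visible to Python is to look it up without importing it. This is only a minimal sketch: `is_importable` is a hypothetical helper name, not part of apex, and finding the module does not guarantee that it loads correctly at runtime.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


def report(module_name: str = "fused_layer_norm_cuda") -> str:
    """Build a one-line status message for the given module name."""
    status = "found" if is_importable(module_name) else "missing"
    return f"{module_name}: {status}"
```

Calling `print(report())` inside the same environment where apex was installed should show whether `fused_layer_norm_cuda` can be found.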

If the local CUDA version differs from the one PyTorch was built with, the apex installation fails with an error like this:

    Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/media/backup/john/project/apex/setup.py", line 177, in <module>
check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)
File "/media/backup/john/project/apex/setup.py", line 34, in check_cuda_torch_binary_vs_bare_metal
raise RuntimeError(
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 11.3.
In some cases, a minor-version mismatch will not cause later errors: https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. You can try commenting out this check (at your own risk).
ERROR: Command errored out with exit status 1: /home/anaconda3/envs/py38/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/media//john/project/apex/setup.py'"'"'; __file__='"'"'/media//john/project/apex/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-rw1s_hs_/install-record.txt --single-version-externally-managed --compile --install-headers /home//anaconda3/envs/py38/include/python3.8/apex Check the logs for full command output.
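
The error comes from comparing the CUDA toolkit found under CUDA_HOME with the CUDA version PyTorch was built against (11.2 and 11.3 in this case). The sketch below shows one way to compare the two before starting a build. The helper names are hypothetical and this is not apex's own code; it assumes that `nvcc -V` prints a `release X.Y` token and that `torch.version.cuda` is a string such as `11.3`.

```python
import os
import re
import subprocess
from typing import Tuple


def parse_nvcc_release(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '11.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def describe_mismatch(local: Tuple[int, int], torch_built: Tuple[int, int]) -> str:
    """Classify the difference between the local toolkit and PyTorch's CUDA."""
    if local == torch_built:
        return "match"
    if local[0] == torch_built[0]:
        return "minor mismatch"
    return "major mismatch"


def check_environment() -> str:
    """Compare the nvcc under CUDA_HOME with torch.version.cuda (needs torch)."""
    import torch  # assumed to be installed in the target environment

    if torch.version.cuda is None:
        return "PyTorch was built without CUDA"
    # Assumption: fall back to /usr/local/cuda when CUDA_HOME is not set.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    output = subprocess.check_output([nvcc, "-V"], text=True)
    return describe_mismatch(
        parse_nvcc_release(output), parse_torch_cuda(torch.version.cuda)
    )
```

Running `print(check_environment())` with the same CUDA_HOME as the pip command shows whether the two versions match. With the versions from this post, the result would be a minor mismatch.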

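Solution: comment out the statement that checks the CUDA version. For example, following the hint in the error message, comment out line 177 of setup.py, which performs the CUDA version check, then recompile and reinstall.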
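
The line number can differ between apex versions, so it may be safer to locate the call by name than to rely on line 177. The following is a rough sketch of that edit done programmatically. `disable_cuda_version_check` and `patch_file` are hypothetical helpers, and the function name they look for is taken from the traceback above. The call is replaced with `pass` instead of a bare comment so that the enclosing block stays syntactically valid. As the error message says, skipping this check is at your own risk.

```python
from pathlib import Path

# Name of the check as it appears in the traceback above.
CHECK_CALL = "check_cuda_torch_binary_vs_bare_metal("


def disable_cuda_version_check(source: str) -> str:
    """Replace calls to the CUDA version check with `pass`, keeping indentation.

    The `def` line of the function is left alone because it starts with `def`.
    """
    patched = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(CHECK_CALL):
            indent = line[: len(line) - len(stripped)]
            newline = "\n" if line.endswith("\n") else ""
            patched.append(f"{indent}pass  # disabled: {stripped.rstrip()}{newline}")
        else:
            patched.append(line)
    return "".join(patched)


def patch_file(path: str = "setup.py") -> None:
    """Rewrite the given setup.py in place with the check disabled."""
    target = Path(path)
    target.write_text(disable_cuda_version_check(target.read_text()))
```

Run `patch_file()` from inside the cloned apex directory, then repeat the pip install command.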


https://johnson7788.github.io/2022/04/12/apex%E6%8A%A5%E9%94%99/
Author: Johnson
Published: April 12, 2022