Operating system: Ubuntu 22.04 (Linux), x86_64
Simulation environment: Webots R2025a
GPU model: NVIDIA GeForce RTX 5060 Ti
libtorch CPU version: 2.0.1
Download the libtorch 2.0.1 CPU build and unzip it in the target directory:
wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Bcpu.zip
unzip libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu.zip
libtorch GPU version: 2.9.1 (download index: https://download.pytorch.org/libtorch/cu130)
Download the libtorch 2.9.1 GPU build (CUDA 13.0) and unzip it in the target directory:
wget https://download.pytorch.org/libtorch/cu130/libtorch-shared-with-deps-2.9.1%2Bcu130.zip
unzip libtorch-shared-with-deps-2.9.1+cu130.zip
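The two download commands differ only in the variant directory (cpu or cu130) and the archive name, and the `+` in the archive name has to be written as `%2B` in the URL. A minimal sketch of a helper that builds such a URL follows; the function name `libtorch_url` is hypothetical and not part of the original notes.

```shell
#!/bin/sh
# Hypothetical helper: build a libtorch download URL from a variant
# directory (e.g. cpu, cu130) and the archive file name.
libtorch_url() {
    variant=$1
    archive=$2
    # Percent-encode '+' as %2B, matching the download URLs used above.
    encoded=$(printf '%s\n' "$archive" | sed 's/+/%2B/g')
    printf 'https://download.pytorch.org/libtorch/%s/%s\n' "$variant" "$encoded"
}

# Print the URL of the GPU archive used in these notes.
libtorch_url cu130 'libtorch-shared-with-deps-2.9.1+cu130.zip'
```

The result can be passed straight to wget, for example `wget "$(libtorch_url cpu 'libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu.zip')"`.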
CUDA version: 13.0
wget https://developer.download.nvidia.com/compute/cuda/13.0.1/local_installers/cuda-repo-ubuntu2204-13-0-local_13.0.1-580.82.07-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-13-0-local_13.0.1-580.82.07-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-13-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install -y cuda-toolkit-13-0
Configure the environment variables:
echo 'export PATH=/usr/local/cuda-13.0/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
echo 'export CUDACXX=/usr/local/cuda-13.0/bin/nvcc' >> ~/.bashrc
source ~/.bashrc
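Running the `echo ... >> ~/.bashrc` lines more than once appends duplicate entries. One way to avoid that is to append a line only when it is not already in the file. The sketch below assumes a hypothetical helper named `append_once`; the example calls are commented out so that running the sketch as-is changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeated runs do not duplicate it.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the line as a fixed string
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (commented out so that running this sketch changes nothing):
# append_once 'export PATH=/usr/local/cuda-13.0/bin:$PATH' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH' ~/.bashrc
# append_once 'export CUDACXX=/usr/local/cuda-13.0/bin/nvcc' ~/.bashrc
```

After the lines are in place, `source ~/.bashrc` is still needed to load them into the current shell.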
Verify the installation:
nvcc --version
# must show: release 13.0
which nvcc
# must print: /usr/local/cuda-13.0/bin/nvcc
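The two checks above can be combined into a small script. This is a sketch only: `cuda_release` is a hypothetical helper, and it assumes that the `nvcc --version` output contains the text `release <major>.<minor>`, which is what the check above looks for.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print
# the release number (for example 13.0).
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    found=$(nvcc --version | cuda_release)
    if [ "$found" = "13.0" ]; then
        echo "OK: nvcc release $found at $(command -v nvcc)"
    else
        echo "Unexpected nvcc release: $found"
    fi
else
    echo "nvcc not found on PATH"
fi
```

If the reported path is not /usr/local/cuda-13.0/bin/nvcc, another CUDA installation is probably ahead of it on PATH.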
Troubleshooting:
If the following error keeps appearing:
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: unknown error
Search for `cudaErrorUnknown' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9c (0x791d6cd8fbcc in /home/ubuntu/kai/robot-sim/output/lib/libc10.so)
frame #1: <unknown function> + 0x18ef9 (0x791d6cc30ef9 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDAKernelLaunchRegistry::CUDAKernelLaunchRegistry() + 0xce (0x791d6cc7e20e in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref() + 0x4c (0x791d6cc7e38c in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
frame #4: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x65 (0x791d6cc7f165 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
frame #5: c10::cuda::SetDevice(signed char, bool) + 0xba (0x791d6cc802aa in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
frame #6: c10::cuda::set_device(signed char, bool) + 0x15 (0x791d6cc805d5 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
try reloading the nvidia_uvm kernel module:
sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm
or
sudo modprobe -r nvidia_uvm && sudo modprobe nvidia_uvm
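Before and after reloading, it can help to confirm whether nvidia_uvm is actually loaded. The sketch below reads /proc/modules, where the first field of each line is the module name; `module_loaded` is a hypothetical helper, and the optional second argument exists only so the function can be pointed at another file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named kernel module is listed in a
# /proc/modules-style file (the module name is the first field of each line).
module_loaded() {
    name=$1
    file=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file" 2>/dev/null
}

if module_loaded nvidia_uvm; then
    echo "nvidia_uvm is loaded"
else
    echo "nvidia_uvm is not loaded"
fi
```

Note that unloading a module fails while it is in use, so any running CUDA processes have to be stopped before the reload commands above will succeed.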
If the following error appears:
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: no CUDA-capable device is detected
Search for `cudaErrorNoDevice' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9c (0x70c592d8fbcc in /home/ubuntu/kai/robot-sim/output/lib/libc10.so)
frame #1: <unknown function> + 0x18ef9 (0x70c594676ef9 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDAKernelLaunchRegistry::CUDAKernelLaunchRegistry() + 0xce (0x70c5946c420e in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref() + 0x4c (0x70c5946c438c in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
frame #4: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x65 (0x70c5946c5165 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
the cause is an incorrect CUDA_VISIBLE_DEVICES environment variable. Either set it again or remove it:
export CUDA_VISIBLE_DEVICES=0
or
unset CUDA_VISIBLE_DEVICES
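To see which case applies before changing anything, the current state of the variable can be printed. The sketch below distinguishes an unset variable from one that is set but empty; to my understanding an empty value hides every GPU from the CUDA runtime, as does an index that matches no device. The function name `describe_cuda_visible_devices` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether CUDA_VISIBLE_DEVICES is unset,
# set to an empty string, or set to a value.
describe_cuda_visible_devices() {
    if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
        # ${VAR+x} expands to nothing only when VAR is unset.
        echo "unset"
    elif [ -z "$CUDA_VISIBLE_DEVICES" ]; then
        echo "empty"
    else
        echo "set: $CUDA_VISIBLE_DEVICES"
    fi
}

describe_cuda_visible_devices
```

If the variable is set, check where it comes from (for example ~/.bashrc or a launch script) so that the fix survives a new shell session.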