Training a Robot to Walk with Reinforcement Learning
Published: 2026-04-23 15:23:17  Source: AiSoftCloud

Environment Setup

Operating system: Linux Ubuntu 22.04 x86_64
Simulation environment: Webots R2025a
GPU model: NVIDIA GeForce RTX 5060 Ti

Installing Dependencies

libtorch

CPU build: 2.0.1

  wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Bcpu.zip
  unzip libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu.zip

This downloads the CPU build of libtorch 2.0.1 and extracts it. Run the commands in the directory where you want libtorch to live.

GPU build: 2.9.1 (download location: https://download.pytorch.org/libtorch/cu130)

  wget https://download.pytorch.org/libtorch/cu130/libtorch-shared-with-deps-2.9.1%2Bcu130.zip
  unzip libtorch-shared-with-deps-2.9.1+cu130.zip

This downloads the GPU build of libtorch 2.9.1 and extracts it. Run the commands in the directory where you want libtorch to live.
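In the commands above, the URL carries `%2B` where the extracted archive name carries `+`. The `+` in the file name is percent-encoded in the URL, and wget saves the file under the decoded name. A minimal sketch of that mapping follows; `url_to_filename` is a hypothetical helper written for illustration, not part of wget or libtorch.

```shell
# Hypothetical helper: derive the local file name for a libtorch download URL.
url_to_filename() {
    # Keep everything after the last '/', then decode the percent-encoded '+'.
    printf '%s\n' "${1##*/}" | sed 's/%2[Bb]/+/g'
}

# Example usage (commented out so the sketch has no side effects):
# url='https://download.pytorch.org/libtorch/cu130/libtorch-shared-with-deps-2.9.1%2Bcu130.zip'
# wget "$url" && unzip "$(url_to_filename "$url")"
```

This only handles the `%2B` case that appears in these URLs; it is not a general URL decoder.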

Installing CUDA

CUDA version: 13.0

  wget https://developer.download.nvidia.com/compute/cuda/13.0.1/local_installers/cuda-repo-ubuntu2204-13-0-local_13.0.1-580.82.07-1_amd64.deb
  sudo dpkg -i cuda-repo-ubuntu2204-13-0-local_13.0.1-580.82.07-1_amd64.deb
  sudo cp /var/cuda-repo-ubuntu2204-13-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
  sudo apt update
  sudo apt install -y cuda-toolkit-13-0

Configure the environment variables:

  echo 'export PATH=/usr/local/cuda-13.0/bin:$PATH' >> ~/.bashrc
  echo 'export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
  echo 'export CUDACXX=/usr/local/cuda-13.0/bin/nvcc' >> ~/.bashrc
  source ~/.bashrc
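Two side effects of the exports above are worth knowing. Re-running the `echo ... >> ~/.bashrc` lines appends duplicate entries, and when `LD_LIBRARY_PATH` starts out empty the result ends in a trailing `:`, which the dynamic loader treats as the current directory. The sketch below shows one way to avoid both; `prepend_path` is a hypothetical helper, not something the CUDA installer provides.

```shell
# Hypothetical helper: prepend a directory to a colon-separated path list
# only if it is not already present, and never emit a trailing ':'.
prepend_path() {
    # $1 = directory to add, $2 = current value of the variable (may be empty)
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already present: unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;   # prepend; add ':' only if $2 is non-empty
    esac
}

# Example usage (commented out so the sketch does not alter your shell):
# PATH=$(prepend_path /usr/local/cuda-13.0/bin "$PATH"); export PATH
# LD_LIBRARY_PATH=$(prepend_path /usr/local/cuda-13.0/lib64 "$LD_LIBRARY_PATH"); export LD_LIBRARY_PATH
```

Placing the function and the two usage lines in `~/.bashrc` once would, under these assumptions, have the same effect as the original exports without accumulating duplicates.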

Verify the installation:

  nvcc --version
  # must show: release 13.0
  which nvcc
  # must print: /usr/local/cuda-13.0/bin/nvcc
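The manual check above can also be scripted. The sketch below extracts the `release X.Y` number from the `nvcc --version` output and compares it with the expected value; `cuda_release` and `check_cuda_release` are hypothetical names chosen for illustration, and the parsing assumes the output contains the text `release 13.0` as described above.

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print
# the version number that follows the word "release".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed only if stdin reports the expected release.
# $1 = expected release, for example 13.0
check_cuda_release() {
    [ "$(cuda_release)" = "$1" ]
}

# Example usage (commented out because it needs nvcc on the PATH):
# nvcc --version | check_cuda_release 13.0 || echo "unexpected CUDA release" >&2
```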

Troubleshooting:
If the following error keeps appearing:

  terminate called after throwing an instance of 'c10::AcceleratorError'
  what(): CUDA error: unknown error
  Search for `cudaErrorUnknown' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
  CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
  For debugging consider passing CUDA_LAUNCH_BLOCKING=1
  Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
  Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
  frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9c (0x791d6cd8fbcc in /home/ubuntu/kai/robot-sim/output/lib/libc10.so)
  frame #1: <unknown function> + 0x18ef9 (0x791d6cc30ef9 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
  frame #2: c10::cuda::CUDAKernelLaunchRegistry::CUDAKernelLaunchRegistry() + 0xce (0x791d6cc7e20e in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
  frame #3: c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref() + 0x4c (0x791d6cc7e38c in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
  frame #4: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x65 (0x791d6cc7f165 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
  frame #5: c10::cuda::SetDevice(signed char, bool) + 0xba (0x791d6cc802aa in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
  frame #6: c10::cuda::set_device(signed char, bool) + 0x15 (0x791d6cc805d5 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)

Try reloading the nvidia_uvm kernel module. Either of the following commands should do it:

  sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm
  sudo modprobe -r nvidia_uvm && sudo modprobe nvidia_uvm
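After reloading, it can help to confirm that the module is actually present again. On Linux the kernel lists loaded modules in `/proc/modules`, one per line with the module name in the first field. The sketch below checks that list; `module_loaded` is a hypothetical helper, and the optional second argument exists only so the function can be tried against a sample file.

```shell
# Hypothetical helper: succeed if a kernel module appears in the module list.
# $1 = module name, $2 = module list file (defaults to /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example usage (commented out; the result depends on the machine):
# module_loaded nvidia_uvm && echo "nvidia_uvm is loaded" || echo "nvidia_uvm is missing"
```

Note that removing the module generally fails while a process is still using the GPU, so any running CUDA programs may need to be stopped first.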

If the following error appears instead:

  terminate called after throwing an instance of 'c10::AcceleratorError'
  what(): CUDA error: no CUDA-capable device is detected
  Search for `cudaErrorNoDevice' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
  CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
  For debugging consider passing CUDA_LAUNCH_BLOCKING=1
  Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
  Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
  frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9c (0x70c592d8fbcc in /home/ubuntu/kai/robot-sim/output/lib/libc10.so)
  frame #1: <unknown function> + 0x18ef9 (0x70c594676ef9 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
  frame #2: c10::cuda::CUDAKernelLaunchRegistry::CUDAKernelLaunchRegistry() + 0xce (0x70c5946c420e in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
  frame #3: c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref() + 0x4c (0x70c5946c438c in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)
  frame #4: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x65 (0x70c5946c5165 in /home/ubuntu/kai/robot-sim/output/lib/libc10_cuda.so)

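This happens because the CUDA_VISIBLE_DEVICES environment variable is set incorrectly. Either set it again or remove it:

  # option 1: set it again so that GPU 0 is visible
  export CUDA_VISIBLE_DEVICES=0
  # option 2: remove the variable entirely
  unset CUDA_VISIBLE_DEVICES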
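Before changing the variable, it can be useful to see what state it is in, since an unset variable and an empty one behave differently: unset leaves every GPU visible, while an empty value typically hides all of them, as does an index that does not exist on the machine. The sketch below reports the state; `cvd_state` is a hypothetical helper name.

```shell
# Hypothetical helper: report whether CUDA_VISIBLE_DEVICES is unset,
# set to an empty string, or set to a value.
cvd_state() {
    if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
        echo "unset"                       # CUDA enumerates every GPU
    elif [ -z "$CUDA_VISIBLE_DEVICES" ]; then
        echo "empty"                       # typically hides every GPU
    else
        echo "set: $CUDA_VISIBLE_DEVICES"  # only the listed devices are visible
    fi
}

# Example usage (commented out; the result depends on your shell session):
# cvd_state
```

If the function prints `empty`, or a device index the machine does not have, that would be consistent with the "no CUDA-capable device is detected" error above.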