
【AI】RTX2060 6G Ubuntu 22.04.1 LTS (Jammy Jellyfish) Deploying Chinese-LLaMA-Alpaca-2 【2】 Enabling GPU Support

Continued from the previous post:

【AI】RTX2060 6G Ubuntu 22.04.1 LTS (Jammy Jellyfish) Deploying Chinese-LLaMA-Alpaca-2 - CSDN Blog

In the previous experiment, chat.sh was confirmed to be running in CPU-only mode, without GPU support enabled.
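A quick way to confirm this, as a sketch, assuming a llama.cpp build of this vintage that prints its usual system_info banner (the model path is the one used below):

# Watch GPU utilization and VRAM while chat.sh is running; a CPU-only run leaves both flat
watch -n 1 nvidia-smi

# The startup banner from ./main also reports BLAS support; a CPU-only build prints "BLAS = 0"
./main -m models/chinese-alpaca-2-1.3b/ggml-model-q4_0.bin -p "test" 2>&1 | grep BLAS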

Recompile llama.cpp

sudo apt install nvidia-cuda-toolkit
cd ~/Downloads/ai/llama.cpp
make clean
make LLAMA_CUBLAS=1 -j6

Failure: nvcc fatal   : Value 'native' is not defined for option 'gpu-architecture'
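A likely reason: the nvidia-cuda-toolkit package on Ubuntu 22.04 ships CUDA 11.5, and -arch=native is only understood by newer nvcc releases. The installed version can be checked with:

nvcc --version
# the Ubuntu 22.04 apt package reports release 11.5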

Check which GPU architectures this nvcc can target
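The supported targets can be listed directly (the output is shown later in the reference section):

nvcc --list-gpu-arch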

Modify the Makefile, adjusting MK_NVCCFLAGS; the diff is as follows (Makefile is the edited copy, Makefile.bak the original):

yeqiang@yeqiang-MS-7B23:~/Downloads/ai/llama.cpp$ diff -Npr Makefile Makefile.bak 

*** Makefile	2024-01-18 09:27:20.833657229 +0800
--- Makefile.bak	2024-01-18 09:26:19.792301482 +0800
*************** endif #LLAMA_CUDA_NVCC
*** 380,386 ****
  ifdef CUDA_DOCKER_ARCH
  	MK_NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
  else ifndef CUDA_POWER_ARCH
! 	MK_NVCCFLAGS += -arch=compute_87
  endif # CUDA_DOCKER_ARCH
  ifdef LLAMA_CUDA_FORCE_DMMV
  	MK_NVCCFLAGS += -DGGML_CUDA_FORCE_DMMV
--- 380,386 ----
  ifdef CUDA_DOCKER_ARCH
  	MK_NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
  else ifndef CUDA_POWER_ARCH
! 	MK_NVCCFLAGS += -arch=native
  endif # CUDA_DOCKER_ARCH
  ifdef LLAMA_CUDA_FORCE_DMMV
  	MK_NVCCFLAGS += -DGGML_CUDA_FORCE_DMMV
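The same edit can be applied non-interactively; a sketch, noting that the exact flag line may differ across llama.cpp revisions:

# Replace -arch=native with -arch=compute_87, keeping a backup as Makefile.bak
sed -i.bak 's/-arch=native/-arch=compute_87/' Makefile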

Recompile

make clean
make LLAMA_CUBLAS=1 -j6

Launch

cd ~/Downloads/ai/Chinese-LLaMA-Alpaca-2/
ln -s -f ~/Downloads/ai/llama.cpp/main .
bash scripts/llama-cpp/chat.sh models/chinese-alpaca-2-1.3b/ggml-model-q4_0.bin

This errors out:

 [/INST]CUDA error: no kernel image is available for execution on the device
  current device: 0, in function ggml_cuda_op_mul_mat at ggml-cuda.cu:9089
  cudaGetLastError()
GGML_ASSERT: ggml-cuda.cu:231: !"CUDA error"
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
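The "Could not attach" part comes from llama.cpp's crash handler trying to run gdb against its own process, which Yama's ptrace hardening blocks for non-root users. It can be relaxed until the next reboot (only needed to see the backtrace, not for the GPU fix itself):

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope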
 

Running it again as root, the error persists:

CUDA error: no kernel image is available for execution on the device
  current device: 0, in function ggml_cuda_op_mul_mat at ggml-cuda.cu:9089
  cudaGetLastError()
GGML_ASSERT: ggml-cuda.cu:231: !"CUDA error"
[New LWP 10733]
[New LWP 10734]
[New LWP 10735]
[New LWP 10743]
[New LWP 10744]
[New LWP 10745]
[New LWP 10746]
[New LWP 10747]
[New LWP 10748]
[New LWP 10749]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f873baea42f in __GI___wait4 (pid=10750, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30    ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x00007f873baea42f in __GI___wait4 (pid=10750, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30    in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x000055f91c7cd15b in ggml_print_backtrace ()
#2  0x000055f91c8a0b9d in ggml_cuda_error(char const*, char const*, char const*, int, char const*) [clone .constprop.0] ()
#3  0x000055f91c8c2e5c in ggml_cuda_op_mul_mat(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), bool) ()
#4  0x000055f91c8c3ae5 in ggml_cuda_mul_mat(ggml_tensor const*, ggml_tensor const*, ggml_tensor*) ()
#5  0x000055f91c8c4f67 in ggml_cuda_compute_forward ()
#6  0x000055f91c7faabe in ggml_graph_compute_thread ()
#7  0x000055f91c7feb45 in ggml_graph_compute ()
#8  0x000055f91c8c8c6e in ggml_backend_cpu_graph_compute ()
#9  0x000055f91c8cce07 in ggml_backend_sched_graph_compute ()
#10 0x000055f91c81d68b in llama_decode_internal(llama_context&, llama_batch) ()
#11 0x000055f91c81e0a3 in llama_decode ()
#12 0x000055f91c7bf8cf in main ()
[Inferior 1 (process 10729) detached]
 

A driver/CUDA mismatch?

Download and install the latest CUDA (upgrade only CUDA; leave the driver alone for now)

wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run

sudo sh cuda_12.3.2_545.23.08_linux.run
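To install only the toolkit and skip the bundled 545 driver non-interactively, the runfile also accepts:

sudo sh cuda_12.3.2_545.23.08_linux.run --silent --toolkit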

Configure ld.so.conf

# "sudo echo ... >> file" would fail: the redirection runs as the unprivileged user
echo "/usr/local/cuda/lib64" | sudo tee -a /etc/ld.so.conf
sudo ldconfig
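Verify that the dynamic linker now resolves the CUDA libraries:

ldconfig -p | grep -E "libcudart|libcublas"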

Compile llama.cpp

export PATH=/usr/local/cuda/bin/:$PATH
make clean
make LLAMA_CUBLAS=1 -j6
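To confirm the freshly installed toolchain is the one being picked up:

which nvcc      # should print /usr/local/cuda/bin/nvcc
nvcc --version  # should report release 12.3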

Launch chat.sh again

cd ~/Downloads/ai/Chinese-LLaMA-Alpaca-2/
ln -s -f ~/Downloads/ai/llama.cpp/main .
sudo bash scripts/llama-cpp/chat.sh models/chinese-alpaca-2-1.3b/ggml-model-q4_0.bin

Reboot into multi-user.target

sudo systemctl set-default multi-user.target
reboot
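Booting into the text-mode target frees the VRAM the desktop session would otherwise occupy, which matters on a 6 GB card. Free memory can be checked with:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv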

Solution to "CUDA error: no kernel image is available for execution on the device"

Change the Makefile to MK_NVCCFLAGS += -arch=compute_75 (the RTX 2060 is an SM 7.5 part); the diff is as follows:

*** Makefile	2024-01-18 14:35:08.120485566 +0800
--- Makefile.bak	2024-01-18 09:26:19.792301482 +0800
*************** endif #LLAMA_CUDA_NVCC
*** 380,386 ****
  ifdef CUDA_DOCKER_ARCH
  	MK_NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
  else ifndef CUDA_POWER_ARCH
! 	MK_NVCCFLAGS += -arch=compute_75
  endif # CUDA_DOCKER_ARCH
  ifdef LLAMA_CUDA_FORCE_DMMV
  	MK_NVCCFLAGS += -DGGML_CUDA_FORCE_DMMV
--- 380,386 ----
  ifdef CUDA_DOCKER_ARCH
  	MK_NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
  else ifndef CUDA_POWER_ARCH
! 	MK_NVCCFLAGS += -arch=native
  endif # CUDA_DOCKER_ARCH
  ifdef LLAMA_CUDA_FORCE_DMMV
  	MK_NVCCFLAGS += -DGGML_CUDA_FORCE_DMMV

Recompile.
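Same build commands as before:

make clean
make LLAMA_CUBLAS=1 -j6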

Original reference material (an exchange with an AI assistant):

--------------------

yeqiang@yeqiang-MS-7B23:~/Downloads/ai/Chinese-LLaMA-Alpaca-2$ nvcc --list-gpu-arch
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
compute_89
compute_90
yeqiang@yeqiang-MS-7B23:~/Downloads/ai/Chinese-LLaMA-Alpaca-2$ cd ~/Downloads/ai/llama.cpp/
yeqiang@yeqiang-MS-7B23:~/Downloads/ai/llama.cpp$ cat Makefile | grep compute_
	MK_NVCCFLAGS += -arch=compute_87

From your information, the following can be seen:

Your NVIDIA toolchain supports multiple CUDA architectures, among them compute_75, the one matching the RTX 2060.

In the Makefile under ~/Downloads/ai/llama.cpp, the compiler flags (MK_NVCCFLAGS) specify -arch=compute_87 as the compilation target.

Since the RTX 2060 corresponds to compute_75 while the Makefile targets the newer compute_87 architecture, the compiled kernels cannot run on the RTX 2060, producing the "no kernel image is available for execution on the device" error.

The fix is to edit the Makefile, changing -arch=compute_87 to the RTX 2060-compatible -arch=compute_75, then rebuild so that kernels are generated for the RTX 2060. Concretely:

 

# Open the Makefile for editing
nano ~/Downloads/ai/llama.cpp/Makefile

# Change -arch=compute_87 to -arch=compute_75
# (make sure to find and replace the correct line)
...
MK_NVCCFLAGS += -arch=compute_75
...

# Save and exit: Ctrl+X, Y, Enter

# Then rebuild the project (with cuBLAS enabled, as above)
make clean
make LLAMA_CUBLAS=1 -j6

After rebuilding, the program should run successfully on your NVIDIA GeForce RTX 2060.

Follow-up question: please list the arch parameters corresponding to different graphics cards.

The arch parameter of NVIDIA GPUs (as used by the CUDA compiler nvcc) corresponds to the GPU's compute capability, also known as its SM architecture. Common NVIDIA GPU series and their compute capabilities:

Kepler architecture:

compute_30: GeForce GTX 600 series, Tesla K10, etc.
compute_35: GeForce GTX 780/TITAN, Tesla K20/K40, Quadro K6000, etc.

Maxwell architecture:

compute_50: GeForce GTX 750/750 Ti, Quadro M series, etc.
compute_52: GeForce GTX 900 series, TITAN X (Maxwell), etc.
compute_53: Jetson TX1/Tegra X1

Pascal architecture:

compute_60: Tesla P100, Quadro GP100
compute_61: GeForce GTX 10 series (1050 through 1080 Ti), TITAN Xp, Tesla P40
compute_62: Jetson TX2

Volta architecture:

compute_70: Tesla V100, TITAN V
compute_72: Jetson Xavier

Turing architecture:

compute_75: GeForce RTX 20 series, GTX 16 series, TITAN RTX, Quadro RTX series, Tesla T4

Ampere architecture:

compute_80: A100
compute_86: GeForce RTX 30 series desktop cards (RTX 3060/3070/3080/3090), RTX A workstation series, etc.
compute_87: Jetson Orin

Ada Lovelace architecture:

compute_89: GeForce RTX 40 series, L4/L40, etc.

Hopper architecture:

compute_90: H100

Choose the arch parameter matching your specific GPU model when compiling CUDA programs for best performance and compatibility. Since NVIDIA keeps releasing new GPU architectures, this list may be incomplete; consult the latest CUDA Toolkit documentation for up-to-date information.
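A device's compute capability can also be queried directly; recent drivers expose it through nvidia-smi (field availability depends on the driver version):

nvidia-smi --query-gpu=name,compute_cap --format=csv
# e.g. "NVIDIA GeForce RTX 2060, 7.5"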

---------------------

Done, but CPU usage exceeds expectations

CPU usage is high while GPU utilization barely moves, although main's VRAM usage is visible in nvidia-smi.

The expectation was near-zero CPU load with the GPU doing the heavy lifting; something to dig into later.
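A plausible cause, assuming chat.sh does not already pass it: llama.cpp only offloads layers to the GPU when asked to via -ngl/--n-gpu-layers, so a cuBLAS build without that flag still runs most of the compute on the CPU. A quick test directly against ./main:

./main -m models/chinese-alpaca-2-1.3b/ggml-model-q4_0.bin -ngl 99 -p "hello"
# -ngl 99 requests offloading (up to) all layers; watch nvidia-smi for GPU load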

Summary:

1. As used here, the Chinese-LLaMA-Alpaca-2 project only contributed its chat.sh script; it appears to be primarily a training project, and the local GPU is too weak for that, so installing its Python dependencies was a waste of bandwidth and time.

2. The real work is downloading a pre-trained model and compiling llama.cpp.

Reference:

【AI实战】llama.cpp量化cuBLAS编译;nvcc fatal:Value ‘native‘ is not defined for option ‘gpu-architecture‘_nvcc fatal : value 'native' is not defined for opt-CSDN博客

Updated 2024-02-12