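Background
I wanted to run Stable Diffusion locally and smoothly without installing a pile of complicated dependencies directly on Windows, and to try out how convenient containers are for deployment and distribution along the way. So I chose the Docker version of Stable Diffusion (the AbdBarho one).
There are already plenty of Stable Diffusion deployment articles online, many of them aimed at complete beginners or promising a one-click install. After trying it myself, I found there are still many pitfalls and it is not that easy in practice. The pitfalls come from two main sources: network download problems, and getting the GPU to work inside Docker.
The main purpose of this article is not to walk through the environment setup step by step, but to record the pitfalls I hit during deployment and how I solved them. If you have already tried the Docker deployment and run into problems, this may serve as a reference.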
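System requirements
Many online tutorials deploy on Linux, but one thing must be clear: Linux inside an ordinary virtual machine will not work.
Using a GPU inside a VM means the GPU has to be virtualized or passed through to the guest, the way CPU and memory are. Consumer GPUs generally do not offer this out of the box (passthrough is possible in some setups, but you are unlikely to go changing BIOS settings just to deploy one application). So even if you install the GPU driver inside the VM, the GPU remains unusable. WSL2 is the exception, because Windows exposes the host GPU to it through the Windows driver.
Second, since Docker's Linux containers need a Linux kernel, on Windows that means using WSL and installing Docker inside it.
(Docker Desktop is also an option, but it seems to have more pitfalls; installing docker-ce directly in WSL is simpler.)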
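Before installing anything else, it can help to confirm that the shell really is running under WSL2 rather than WSL1 or an ordinary VM. The following is a minimal sketch, assuming the stock WSL2 kernel, whose release string contains both "microsoft" and "WSL2"; a custom kernel may be named differently. The function name is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: decide from a kernel release string whether we are inside WSL2.
# Assumption: stock WSL2 kernels report something like
# "5.15.90.1-microsoft-standard-WSL2", while WSL1 reports something like
# "4.4.0-19041-Microsoft" (no "WSL2" suffix).
is_wsl2_kernel() {
  local release
  # Lower-case the input so the match is case-insensitive.
  release=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$release" in
    *microsoft*wsl2*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_wsl2_kernel "$(uname -r)"; then
  echo "running inside WSL2"
else
  echo "not WSL2 (WSL1, a VM, or bare-metal Linux)"
fi
```

From the Windows side, `wsl -l -v` lists each distribution together with the WSL version it runs under.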
Software to install
On Windows
NVIDIA GPU driver
https://www.nvidia.com/en-us/geforce/drivers/
wsl2
wsl --install
Ubuntu 22.04
wsl --install -d Ubuntu-22.04
git
(URL omitted)
On Ubuntu
CUDA toolkit for WSL2 (the WSL-Ubuntu package; the GPU driver itself is the one installed on Windows)
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local
docker
curl https://get.docker.com | sh
stable diffusion
git clone https://github.com/AbdBarho/stable-diffusion-webui-docker.git
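Also, as far as I can tell, nvidia-docker and docker-compose do not need to be installed.
The former is a Docker plugin that automatically adds the GPU arguments to docker run. Recent Docker versions already support GPU settings in the compose yml, and the nvidia-docker command was never used during the actual deployment. As for the latter, Docker ships its own docker compose command, so the standalone docker-compose command is not needed. (If Docker reports that it could not select a device driver with GPU capabilities, the NVIDIA Container Toolkit runtime is probably what is missing; that package is separate from the nvidia-docker command.)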
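To illustrate what GPU settings in a compose yml look like, here is a small sketch that writes a throwaway compose file using the `deploy.resources.reservations.devices` syntax from the Docker Compose documentation. The service name `gpu-check`, the file location, and the choice of the `ubuntu:22.04` image are assumptions made for this example; this is not the project's own compose file.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal compose file that asks Docker for all NVIDIA GPUs
# and runs nvidia-smi once. "gpu-check" is a hypothetical service name.
write_gpu_check_compose() {
  cat > "$1" <<'EOF'
services:
  gpu-check:
    image: ubuntu:22.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
}

compose_file="${TMPDIR:-/tmp}/gpu-check.yml"
write_gpu_check_compose "$compose_file"
echo "wrote $compose_file"
```

Running `sudo docker compose -f /tmp/gpu-check.yml up` should then print the usual nvidia-smi table if the GPU is reachable from containers, which makes it a quick test that does not depend on the Stable Diffusion images.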
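Solutions to download problems
The official installation commands on GitHub are very simple: two compose commands download the packages, build the images, and run them.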
sudo service docker start
cd /mnt/d/yourpath/stable-diffusion-webui-docker
sudo docker compose --profile download up --build
sudo docker compose --profile auto up
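When running these from mainland China, however, the most common problem is interrupted downloads. If retrying still does not get the download through, the Dockerfile has to be edited by hand.
Download errors in docker compose --profile download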
webui-docker-download-1 | edc1d3|OK | 1.7MiB/s|/data/VAE/vae-ft-mse-840000-ema-pruned.ckpt
webui-docker-download-1 | e85b41|OK | 99KiB/s|/data/RealESRGAN/RealESRGAN_x4plus_anime_6B.pth
webui-docker-download-1 | 011c41|OK | 2.4KiB/s|/data/LDSR/project.yaml
webui-docker-download-1 | d56dbc|OK | 1.6MiB/s|/data/LDSR/model.ckpt
webui-docker-download-1 | 53c224|OK | 2.0MiB/s|/data/StableDiffusion/v1-5-pruned-emaonly.ckpt
webui-docker-download-1 | a573f8|OK | 1.9MiB/s|/data/StableDiffusion/sd-v1-5-inpainting.ckpt
webui-docker-download-1 | 9521b5|ERR | 28KiB/s|/data/RealESRGAN/RealESRGAN_x4plus.pth
webui-docker-download-1 | 3a3491|ERR | 87KiB/s|/data/GFPGAN/GFPGANv1.4.pth
Usually running the command once more is enough (downloads resume where they stopped).
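Since re-running is usually all it takes, the retry can be automated. The helper below is a minimal sketch of a generic retry loop; `retry` is a name invented for this example, and the commented usage line assumes that the `download` service exits with a non-zero status when a file fails, which I have not verified.

```shell
#!/usr/bin/env bash
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns CMD's last exit status.
retry() {
  local max=$1 delay=$2 attempt=1 status=0
  shift 2
  while true; do
    "$@" && return 0
    status=$?
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return "$status"
    fi
    echo "attempt $attempt failed (exit $status), retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical usage (not executed here):
#   retry 5 10 sudo docker compose --profile download up --build --exit-code-from download
```

The `--exit-code-from` option makes `docker compose up` return the exit code of the named service, which is what lets the loop notice a failed run.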
GitHub dependency download failures
You may hit a GitHub download timeout like the following:
#0 120.9 fatal: unable to access 'https://github.com/crowsonkb/k-diffusion.git/': HTTP/2 stream 1 was not closed cleanly before end of the underlying stream
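Some GitHub dependencies fail no matter how many times you retry. For those, edit the Dockerfile (stable-diffusion-webui-docker\services\AUTOMATIC1111\Dockerfile) and change github to kgithub (a third-party mirror) in the URL, after which the download succeeds.
For example: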
RUN . /clone.sh k-diffusion https://kgithub.com/crowsonkb/k-diffusion.git 5b3af030dd83e0297272d861c19477735d0317ec
RUN . /clone.sh clip-interrogator https://kgithub.com/pharmapsychotic/clip-interrogator 2486589f24165c8e3b303f84e9dbbea318df83e8
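The same edit can be scripted rather than done by hand. This is a sketch under the assumption that only the lines calling clone.sh should be redirected; the function name is made up, and the demo works on a temporary sample file rather than the real Dockerfile.

```shell
#!/usr/bin/env bash
# Sketch: point only the clone.sh lines of a Dockerfile at the mirror host.
# Other github.com URLs (e.g. release downloads) are left untouched.
# The original file is kept next to it as <file>.bak.
use_mirror_for_clones() {
  sed -i.bak '/clone\.sh/ s#https://github\.com/#https://kgithub.com/#' "$1"
}

# Demo on a temporary sample so the real Dockerfile is not modified.
sample=$(mktemp)
cat > "$sample" <<'EOF'
RUN . /clone.sh k-diffusion https://github.com/crowsonkb/k-diffusion.git 5b3af030dd83e0297272d861c19477735d0317ec
RUN aria2c -x 5 --dir / --out wheel.whl 'https://github.com/AbdBarho/stable-diffusion-webui-docker/releases/download/5.0.0/xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl'
EOF
use_mirror_for_clones "$sample"
cat "$sample"
```

Restricting the substitution to the clone.sh lines is a deliberate choice: it leaves other URLs alone, and the .bak copy makes the change easy to revert.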
#0 0.910 04/13 13:43:47 [ERROR] CUID#7 - Download aborted. URI=https://kgithub.com/AbdBarho/stable-diffusion-webui-docker/releases/download/5.0.0/xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl
#0 0.910 Exception: [AbstractCommand.cc:351] errorCode=1 URI=https://kgithub.com/AbdBarho/stable-diffusion-webui-docker/releases/download/5.0.0/xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl
#0 0.910 -> [SocketCore.cc:1018] errorCode=1 SSL/TLS handshake failure: `not signed by known authorities or invalid' `expired'
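Manually download libgoogle-perftools-dev_2.7-1_amd64.deb and xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl (rename the latter to wheel.whl), comment out the aria2c download command, and use COPY to copy the files into the image instead: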
#RUN aria2c -x 5 --dir / --out wheel.whl 'https://github.com/AbdBarho/stable-diffusion-webui-docker/releases/download/5.0.0/xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl'
COPY wheel.whl /
COPY libgoogle-perftools-dev_2.7-1_amd64.deb /
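COPY fails the build when a source file is absent from the build context, so it may be worth checking that the manually downloaded files are in place first. The sketch below assumes the build context is the directory that contains the Dockerfile (services/AUTOMATIC1111 relative to the repository root); `missing_files` and the `CONTEXT_DIR` variable are hypothetical names for this example.

```shell
#!/usr/bin/env bash
# missing_files DIR FILE...: print each FILE that does not exist in DIR.
missing_files() {
  local dir=$1 f
  shift
  for f in "$@"; do
    [ -f "$dir/$f" ] || echo "$f"
  done
}

# Assumed location of the build context; override with CONTEXT_DIR if needed.
context_dir=${CONTEXT_DIR:-services/AUTOMATIC1111}
missing=$(missing_files "$context_dir" wheel.whl libgoogle-perftools-dev_2.7-1_amd64.deb)
if [ -n "$missing" ]; then
  echo "missing from $context_dir:"
  echo "$missing"
else
  echo "all files present in $context_dir"
fi
```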
#0 141.8 pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
#0 63.88 Downloading gradio-3.15.0-py3-none-any.whl (13.8 MB)
#0 141.8 ╸ 0.2/13.8 MB 3.2 kB/s eta 1:10:40
Setting a global pip mirror fixes this:
pip config --global set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip config --global set install.trusted-host mirrors.aliyun.com
pip install -r requirements_versions.txt
#0 0.451 fatal: unable to access 'https://kgithub.com/AUTOMATIC1111/stable-diffusion-webui.git/': server certificate verification failed. CAfile: none CRLfile: none
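Disabling certificate verification gets past this (it turns off TLS checks, so keep it confined to the build):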
git config --global http.version HTTP/1.1
git config --global http.sslverify false
git clone https://kgithub.com/AUTOMATIC1111/stable-diffusion-webui.git
failed to solve: process "/bin/bash -ceuxo pipefail apt-get -y install libgoogle-perftools-dev && apt-get clean" did not complete successfully: exit code: 100
Adding a mirror to /etc/apt/sources.list fixes this:
RUN echo 'deb http://ftp.cn.debian.org/debian buster main' | tee -a /etc/apt/sources.list
RUN apt-get -y install libgoogle-perftools-dev && apt-get clean
GPU not usable inside the container
The container fails to start, reporting that no GPU driver was found:
webui-docker-auto-1 | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Running nvidia-smi inside the container prints the following:
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
A search shows that this is a problem in Microsoft's WSL2 itself:
https://github.com/microsoft/WSL/issues/9962
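The fix is to download WSL 1.2.3 (the latest release at the time of writing) and install it, replacing the existing package from an administrator PowerShell prompt: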
https://github.com/microsoft/WSL/releases/tag/1.2.3
wsl --shutdown
$Package = Get-AppxPackage MicrosoftCorporationII.WindowsSubsystemforLinux -AllUsers
Remove-AppxPackage $Package -AllUsers
Add-AppxPackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle
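After reinstalling WSL, running nvidia-smi in the container again shows whether the block is gone. The helper below is a rough sketch that sorts the output into a few cases based on the messages quoted above; the function name and the labels it prints are invented for this example, and the container name in the usage comment is taken from the log lines earlier in this article.

```shell
#!/usr/bin/env bash
# Sketch: classify nvidia-smi output read from stdin.
#   wsl-blocked : the "GPU access blocked by the operating system" case
#   nvml-error  : NVML failed to initialize for some other reason
#   ok          : the normal table header (contains "Driver Version") is present
#   unknown     : anything else, e.g. the command is missing
classify_nvidia_smi() {
  local out
  out=$(cat)
  case "$out" in
    *"GPU access blocked by the operating system"*) echo "wsl-blocked" ;;
    *"Failed to initialize NVML"*) echo "nvml-error" ;;
    *"Driver Version"*) echo "ok" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical usage (not executed here):
#   sudo docker exec webui-docker-auto-1 nvidia-smi 2>&1 | classify_nvidia_smi
```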
Other problems
Docker prints the following errors on startup:
mount: /sys/fs/cgroup/cpuset: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpu: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpuacct: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/blkio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/memory: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/devices: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/freezer: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_cls: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/perf_event: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_prio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/hugetlb: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/pids: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/rdma: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/misc: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
This error is a bug in WSL 1.1.6.0 (though it does not seem to stop Docker from starting). Upgrading to 1.2.3 resolves it.