Setting up a local LLaMA-Factory environment for fine-tuning large models

LLaMA Factory

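LLaMA Factory makes it quick to fine-tune large language models. This post walks through setting up a local environment and running a fine-tune. It uses models from the ModelScope community, which are hosted in China, so downloads are fast there.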

Download the latest code

## Official LLaMA Factory repository
git clone https://github.com/hiyouga/LLaMA-Factory

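Build and run the Docker image
I added USE_MODELSCOPE_HUB=1, which tells LLaMA Factory to pull models from ModelScope, so training has to use ModelScope model IDs. I also modified the Dockerfile slightly so that pip installs from a PyPI mirror; without it, building the image is very slow.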

FROM nvcr.io/nvidia/pytorch:24.01-py3

WORKDIR /app

COPY requirements.txt /app/
RUN pip install -i https://mirrors.aliyun.com/pypi/simple -r requirements.txt

COPY . /app/
RUN pip install -i https://mirrors.aliyun.com/pypi/simple -e .[metrics,bitsandbytes,qwen]

VOLUME [ "/root/.cache/huggingface/", "/app/data", "/app/output" ]
EXPOSE 7860

CMD [ "llamafactory-cli", "webui" ]

docker build -f ./Dockerfile -t llama-factory:latest .
docker run --runtime=nvidia --gpus all \
-v ./hf_cache:/root/.cache/huggingface/ \
-v ./data:/app/data \
-v ./examples:/app/examples \
-v ./output:/app/output \
-e CUDA_VISIBLE_DEVICES=0 \
-e USE_MODELSCOPE_HUB=1 \
-p 7860:7860 \
--shm-size 32G \
--name llama_factory \
-d llama-factory:latest

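Model training
The test uses the Llama 3 model. The commands below are assumed to run inside the container (for example, in a shell opened with docker exec -it llama_factory bash). Edit the model ID in the config file examples/lora_single_gpu/llama3_lora_sft.yaml, changing "model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct" to "model_name_or_path: LLM-Research/Meta-Llama-3-8B-Instruct".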

CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml
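
The model ID change described above has to be repeated in several YAML files, so it can be scripted. The following is a minimal sketch, not part of LLaMA Factory: set_model_id is a hypothetical helper name, and it assumes the config has a top-level line starting with "model_name_or_path:".

```shell
# Hypothetical helper (not part of LLaMA Factory): rewrite the
# model_name_or_path line of a YAML config in place.
# Usage: set_model_id <yaml-file> <new-model-id>
set_model_id() {
    config_file="$1"
    new_id="$2"
    tmp_file="$(mktemp)" || return 1
    # "|" is used as the sed delimiter because model IDs contain "/".
    sed "s|^model_name_or_path:.*|model_name_or_path: ${new_id}|" \
        "$config_file" > "$tmp_file" || return 1
    # Copy the result back instead of moving it, so the original
    # file keeps its permissions.
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}
```

For example, set_model_id examples/lora_single_gpu/llama3_lora_sft.yaml LLM-Research/Meta-Llama-3-8B-Instruct would apply the change used in this post. All other lines of the file are left untouched.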

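Model inference
Run inference with the fine-tuned model. As with training, edit the model ID in examples/inference/llama3_lora_sft.yaml, changing "model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct" to "model_name_or_path: LLM-Research/Meta-Llama-3-8B-Instruct".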

CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

Export the model
Merge the LoRA fine-tuned adapter with the original model and export the result.

CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
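
Before moving on to conversion, it can help to confirm that the export actually produced a usable model directory. The sketch below is a hypothetical pre-flight check, not a LLaMA Factory command. It assumes the export follows the usual Hugging Face layout: a config.json plus weight files ending in .safetensors or .bin.

```shell
# Hypothetical pre-flight check (assumption: the merged model is saved in
# the usual Hugging Face layout, i.e. config.json plus *.safetensors or
# *.bin weight files).
# Usage: check_export_dir <export-dir>
check_export_dir() {
    export_dir="$1"
    if [ ! -f "$export_dir/config.json" ]; then
        echo "missing config.json in $export_dir" >&2
        return 1
    fi
    # An unmatched glob stays a literal string, which fails the -f test.
    for weight_file in "$export_dir"/*.safetensors "$export_dir"/*.bin; do
        if [ -f "$weight_file" ]; then
            return 0
        fi
    done
    echo "no weight files found in $export_dir" >&2
    return 1
}
```

Calling check_export_dir /app/models/llama3_lora_sft returns a non-zero status and prints a message to stderr when something is missing.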

Convert to GGUF
Convert the exported model to GGUF, a format that Ollama can run.

## Download llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git

## Install dependencies
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r llama.cpp/requirements.txt

## Convert the model
## (newer llama.cpp checkouts name this script convert_hf_to_gguf.py)
python llama.cpp/convert-hf-to-gguf.py /app/models/llama3_lora_sft/ --outfile test-llama3.gguf --outtype q8_0
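
A quick sanity check on the conversion result: GGUF files begin with the four ASCII bytes "GGUF". The sketch below only inspects that magic number. is_gguf is a hypothetical helper name, and passing this check does not prove the model loads correctly.

```shell
# Hypothetical helper: succeed if the file starts with the 4-byte
# GGUF magic number ("GGUF"), fail otherwise.
# Usage: is_gguf <file>
is_gguf() {
    magic="$(head -c 4 "$1" 2>/dev/null)"
    [ "$magic" = "GGUF" ]
}
```

For example, is_gguf test-llama3.gguf && echo "looks like GGUF" prints the message only when the header matches.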

Import into Ollama. First create a Modelfile named testmodel with the following content.

FROM ./test-llama3.gguf
TEMPLATE "{{ if .System }}<|start_header_id|>system<|end_header_id|> {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> {{ .Response }}<|eot_id|>"
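
The Modelfile can also be generated from the shell instead of being typed by hand. This is a sketch under assumptions: write_modelfile is a hypothetical helper name, and the TEMPLATE line is copied verbatim from the file above. The quoted heredoc delimiter stops the shell from expanding anything inside the template.

```shell
# Hypothetical helper: write an Ollama Modelfile that points at a
# local GGUF file and uses the Llama 3 template shown in this post.
# Usage: write_modelfile <gguf-path> <modelfile-path>
write_modelfile() {
    gguf_path="$1"
    modelfile_path="$2"
    # The FROM line depends on the caller's path, so printf writes it.
    printf 'FROM %s\n' "$gguf_path" > "$modelfile_path"
    # Quoted 'EOF': the template is appended literally, no expansion.
    cat >> "$modelfile_path" <<'EOF'
TEMPLATE "{{ if .System }}<|start_header_id|>system<|end_header_id|> {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> {{ .Response }}<|eot_id|>"
EOF
}
```

Running write_modelfile ./test-llama3.gguf testmodel produces the same two-line file as above.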

Create the model

ollama create testllama3 -f testmodel

Run the model

ollama run testllama3

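Summary

This post gave only a brief walkthrough of fine-tuning locally with LLaMA Factory, then exporting the model to GGUF format and running it with Ollama. For the actual tuning parameters, refer to the official LLaMA Factory documentation. One complaint: the documentation is rather incomplete, so expect to read the source code.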