Llama 3 实践教程（InternStudio 版）

文章目录

Llama3本地 Web Demo部署初始环境配置下载模型 Web Demo 部署 XTuner 微调 Llama3 个人小助手自我认知训练数据集准备 XTuner配置文件准备训练模型推理验证 XTuner微调Llama3图片理解多模态准备 Visual Encoder 权重 XTuner 微调 Llama3 图片理解多模态环境配置准备 Image Projector 权重数据准备微调过程训练启动效果体验 Pretrain 模型 Finetune 后模型 1.2 Llama3 的下载 2. LMDeploy Chat CLI 工具 3. LMDeploy模型量化(lite) 3.1 设置最大KV Cache缓存大小 3.2 使用W4A16量化 3.3 在线量化 KV 4. LMDeploy服务（serve） 4.1 启动API服务器 4.2 命令行客户端连接API服务器 4.3 网页客户端连接API服务器拓展部分： 5. 推理速度 6. 使用LMDeploy运行视觉多模态大模型Llava-Llama-3 6.1 安装依赖 6.2 运行模型 Llama 3 Agent 能力体验+微调（Lagent 版） 1. Llama3 ReAct Demo 2. 微调过程 2.1 环境配置 2.2 模型准备 2.3 数据集准备 2.4 微调启动 3. Llama3 ReAct Demo 4. Lagent Web Demo 手把手带你评测 Llama 3 能力（OpenCompass 版）实践教程 oepncompass 评测 **?**环境配置 **✨ 下载 Llama3 模型** **?️** 安装 OpenCompass **? 数据准备** **?️** 命令行快速评测查看配置文件和支持的数据集名称以 C-Eval_gen 为例： **?️** 快速评测 config 快速评测作为新模型支持快速评测

准备工作可以查看 lama3-Tutorial

Llama3本地 Web Demo部署

初始环境配置

环境配置：

conda create -n llama3 python=3.10
conda activate llama3
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

Tips: 这里需要一段时间，请耐性等待。

验证安装是否成功

下载模型

新建文件夹

mkdir -p ~/model
cd ~/model

从OpenXLab中获取权重（开发机中不需要使用此步）

安装 git-lfs 依赖

# 如果下面命令报错则使用 apt install git git-lfs -y
conda install git-lfs
git-lfs install

下载模型（InternStudio 中不建议执行这一步）

git clone https://code.openxlab.org.cn/MrCat/Llama-3-8B-Instruct.git Meta-Llama-3-8B-Instruct

验证安装是否软连接成功

Web Demo 部署

cd ~
git clone https://github.com/SmartFlowAI/Llama3-Tutorial

以下为执行过程

安装 XTuner 时会自动安装其他依赖

cd ~
git clone -b v0.1.18 https://github.com/InternLM/XTuner
cd XTuner
pip install -e .

以下为执行过程

运行 web_demo.py

streamlit run ~/Llama3-Tutorial/tools/internstudio_web_demo.py \
  ~/model/Meta-Llama-3-8B-Instruct

执行过程如下

添加端口8501

成功启动Llama3
![在这里插入图片描述](https://img-blog.csdnimg.cn/direct/6de5c050b4bb4eceb5f3eb1d3d48facd.png

XTuner 微调 Llama3 个人小助手

自我认知训练数据集准备

cd ~/Llama3-Tutorial
python tools/gdata.py

以上脚本在生成了 ~/Llama3-Tutorial/data/personal_assistant.json 数据文件格式如下所示：

[
    {
        "conversation": [
            {
                "system": "你是一个懂中文的小助手",
                "input": "你是（请用中文回答）",
                "output": "您好，我是SmartFlowAI，一个由 SmartFlowAI 打造的人工智能助手，请问有什么可以帮助您的吗？"
            }
        ]
    },
    {
        "conversation": [
            {
                "system": "你是一个懂中文的小助手",
                "input": "你是（请用中文回答）",
                "output": "您好，我是SmartFlowAI，一个由 SmartFlowAI 打造的人工智能助手，请问有什么可以帮助您的吗？"
            }
        ]
    }
]

XTuner配置文件准备

小编为大佬们修改好了configs/assistant/llama3_8b_instruct_qlora_assistant.py 配置文件(主要修改了模型路径和对话模板)请直接享用～

训练模型

cd ~/Llama3-Tutorial

# 开始训练,使用 deepspeed 加速，A100 40G显存 耗时24分钟
xtuner train configs/assistant/llama3_8b_instruct_qlora_assistant.py --work-dir /root/llama3_pth

# Adapter PTH 转 HF 格式
xtuner convert pth_to_hf /root/llama3_pth/llama3_8b_instruct_qlora_assistant.py \
  /root/llama3_pth/iter_500.pth \
  /root/llama3_hf_adapter

# 模型合并
export MKL_SERVICE_FORCE_INTEL=1
xtuner convert merge /root/model/Meta-Llama-3-8B-Instruct \
  /root/llama3_hf_adapter\
  /root/llama3_hf_merged

以下为训练过程。总体来说，输出过程以训练日志形式输出。

以下为Adapter PTH 转 HF 格式

以下为模型合并

推理验证

streamlit run ~/Llama3-Tutorial/tools/internstudio_web_demo.py \
  /root/llama3_hf_merged

以下为推理过程

XTuner微调Llama3图片理解多模态

准备 Visual Encoder 权重

我们接下来准备 Llava 所需要的 openai/clip-vit-large-patch14-336，权重，即 Visual Encoder 权重。

InternStudio

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/openai/clip-vit-large-patch14-336 .

非 InternStudio

可以访问 https://huggingface.co/openai/clip-vit-large-patch14-336 以进行下载。

XTuner 微调 Llama3 图片理解多模态

环境配置

重新安装 XTuner。这次为完整安装

cd ~
git clone -b v0.1.18 https://github.com/InternLM/XTuner
cd XTuner
pip install -e .[all]

准备 Image Projector 权重

然后我们准备 Llava 将要用到的 Image Projector 部分权重。

InternStudio

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/xtuner/llama3-llava-iter_2181.pth .

非 InternStudio

相关权重可以访问：https://huggingface.co/xtuner/llava-llama-3-8b 以及 https://huggingface.co/xtuner/llava-llama-3-8b-v1_1 。（已经过微调，并非 Pretrain 阶段的 Image Projector）

检测权重是否准备完成

数据准备

我们按照 https://github.com/InternLM/Tutorial/blob/camp2/xtuner/llava/xtuner_llava.md 中的教程来准备微调数据。为了让大家可以快速上手，我们选择了使用过拟合的方式快速实现。

可以执行以下代码：

cd ~
git clone https://github.com/InternLM/tutorial -b camp2
python ~/tutorial/xtuner/llava/llava_data/repeat.py \
  -i ~/tutorial/xtuner/llava/llava_data/unique_data.json \
  -o ~/tutorial/xtuner/llava/llava_data/repeated_data.json \
  -n 200

使用上述示例，数据准备过程如下

微调过程

训练启动

我们已经为大家准备好了可以一键启动的配置文件，主要是修改好了模型路径、对话模板以及数据路径。

我们使用如下指令以启动训练：

xtuner train ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py --work-dir ~/llama3_llava_pth --deepspeed deepspeed_zero2

训练过程所需显存约为44447 MiB，在单卡 A100 上训练所需时间为30分钟。

如果是30%的A100资源（24GB显存）或者50%的A100资源（40GB显存），使用如下命令启动训练：

xtuner train ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py --work-dir ~/llama3_llava_pth --deepspeed deepspeed_zero2_offload

训练过程如下

在训练好之后，我们将原始 image projector 和我们微调得到的 image projector 都转换为 HuggingFace 格式，为了下面的效果体验做准备。

xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  ~/model/llama3-llava-iter_2181.pth \
  ~/llama3_llava_pth/pretrain_iter_2181_hf

xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  ~/llama3_llava_pth/iter_1200.pth \
  ~/llama3_llava_pth/iter_1200_hf

上述过程可以如下

效果体验

在转换完成后，我们就可以在命令行简单体验一下微调后模型的效果了。

问题1：Describe this image.
问题2：What is the equipment in the image?

Pretrain 模型

export MKL_SERVICE_FORCE_INTEL=1
xtuner chat /root/model/Meta-Llama-3-8B-Instruct \
  --visual-encoder /root/model/clip-vit-large-patch14-336 \
  --llava /root/llama3_llava_pth/pretrain_iter_2181_hf \
  --prompt-template llama3_chat \
  --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg

Finetune 后模型

export MKL_SERVICE_FORCE_INTEL=1
xtuner chat /root/model/Meta-Llama-3-8B-Instruct \
  --visual-encoder /root/model/clip-vit-large-patch14-336 \
  --llava /root/llama3_llava_pth/iter_1200_hf \
  --prompt-template llama3_chat \
  --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg

![在这里插入图片描述](https://img-blog.csdnimg.cn/direct/f0d257d13429447e856c109810655724.png)

经过 Finetune 后，我们可以发现，模型已经可以根据图片回答我们的问题了。

# LMDeploy高效部署Llama3实践
## 本文将分为以下几个部分来介绍，如何使用LMDeploy来部署Llama3
- 环境，模型准备
- LMDeploy chat
- LMDeploy模型量化(lite) 
- LMDeploy服务(serve)

## 1. 环境，模型准备
### 1.1 环境配置

```shell
# 如果你是InternStudio 可以直接使用
# studio-conda -t lmdeploy -o pytorch-2.1.2
# 初始化环境
conda create -n lmdeploy python=3.10
conda activate lmdeploy
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

安装lmdeploy最新版。

pip install -U lmdeploy[all]

1.2 Llama3 的下载

新建文件夹

mkdir -p ~/model
cd ~/model

从OpenXLab中获取权重（开发机中不需要使用此步）

安装 git-lfs 依赖

# 如果下面命令报错则使用 apt install git git-lfs -y
conda install git-lfs
git-lfs install

下载模型（InternStudio 中不建议执行这一步）


git clone https://code.openxlab.org.cn/MrCat/Llama-3-8B-Instruct.git Meta-Llama-3-8B-Instruct

或者软链接 InternStudio 中的模型

ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct ~/model/Meta-Llama-3-8B-Instruct

上述过程如下

2. LMDeploy Chat CLI 工具

直接在终端运行

conda activate lmdeploy
lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct

运行结果是：

3. LMDeploy模型量化(lite)

本部分内容主要介绍如何对模型进行量化。主要包括 KV8量化和W4A16量化。

3.1 设置最大KV Cache缓存大小

模型在运行时，占用的显存可大致分为三部分：模型参数本身占用的显存、KV Cache占用的显存，以及中间运算结果占用的显存。LMDeploy的KV Cache管理器可以通过设置–cache-max-entry-count参数，控制KV缓存占用剩余显存的最大比例。默认的比例为0.8。

下面通过几个例子，来看一下调整--cache-max-entry-count参数的效果。首先保持不加该参数（默认0.8），运行 Llama3-8b 模型。

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/

新建一个终端运行

# 如果你是InternStudio 就使用
# studio-smi
nvidia-smi

此时模型的占用为36245M。下面，改变--cache-max-entry-count参数，设为0.5。

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/ --cache-max-entry-count 0.5

新建一个终端运行

# 如果你是InternStudio 就使用
# studio-smi
nvidia-smi

看到显存占用明显降低，变为28701M。

面来一波“极限”，把--cache-max-entry-count参数设置为0.01，约等于禁止KV Cache占用显存。

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/ --cache-max-entry-count 0.01

然后与模型对话，可以看到，此时显存占用仅为16373M，代价是会降低模型推理速度。

3.2 使用W4A16量化

仅需执行一条命令，就可以完成模型量化工作。

lmdeploy lite auto_awq \
   /root/model/Meta-Llama-3-8B-Instruct \
  --calib-dataset 'ptb' \
  --calib-samples 128 \
  --calib-seqlen 1024 \
  --w-bits 4 \
  --w-group-size 128 \
  --work-dir /root/model/Meta-Llama-3-8B-Instruct_4bit

运行时间较长，请耐心等待。量化工作结束后，新的HF模型被保存到Meta-Llama-3-8B-Instruct_4bit目录。下面使用Chat功能运行W4A16量化后的模型。

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct_4bit --model-format awq

显存占用变为34303MB，明显降低。

为了更加明显体会到W4A16的作用，我们将KV Cache比例再次调为0.01，查看显存占用情况。

lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct_4bit --model-format awq --cache-max-entry-count 0.01

可以看到，显存占用变为6751MB，明显降低。

3.3 在线量化 KV

自 v0.4.0 起，LMDeploy KV 量化方式有原来的离线改为在线。并且，支持两种数值精度 int4、int8。量化方式为 per-head per-token 的非对称量化。它具备以下优势：

量化不需要校准数据集 kv int8 量化精度几乎无损，kv int4 量化精度在可接受范围之内推理高效，在 llama2-7b 上加入 int8/int4 kv 量化，RPS 相较于 fp16 分别提升近 30% 和 40% 支持 volta 架构（sm70）及以上的所有显卡型号：V100、20系列、T4、30系列、40系列、A10、A100 等等
通过 LMDeploy 应用 kv 量化非常简单，只需要设定 quant_policy 参数。LMDeploy 规定 qant_policy=4表示 kv int4 量化，quant_policy=8 表示 kv int8 量化。

4. LMDeploy服务（serve）

在前面的章节，我们都是在本地直接推理大模型，这种方式成为本地部署。在生产环境下，我们有时会将大模型封装为 API 接口服务，供客户端访问。

4.1 启动API服务器

通过以下命令启动API服务器，推理Meta-Llama-3-8B-Instruct模型：

lmdeploy serve api_server \
    /root/model/Meta-Llama-3-8B-Instruct \
    --model-format hf \
    --quant-policy 0 \
    --server-name 0.0.0.0 \
    --server-port 23333 \
    --tp 1

其中，model-format、quant-policy这些参数是与第三章中量化推理模型一致的；server-name和server-port表示API服务器的服务IP与服务端口；tp参数表示并行数量（GPU数量）。
通过运行以上指令，我们成功启动了API服务器，请勿关闭该窗口，后面我们要新建客户端连接该服务。
你也可以直接打开http://{host}:23333查看接口的具体使用说明，如下图所示。

意，这一步由于Server在远程服务器上，所以本地需要做一下ssh转发才能直接访问。在你本地打开一个cmd窗口，输入命令如下：

ssh -CNg -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p 你的ssh端口号

ssh 端口号就是下面图片里的 39864，请替换为你自己的。

然后打开浏览器，访问http://127.0.0.1:23333。

4.2 命令行客户端连接API服务器

在“4.1”中，我们在终端里新开了一个API服务器。
本节中，我们要新建一个命令行客户端去连接API服务器。首先通过VS Code新建一个终端：
激活conda环境

conda activate lmdeploy

运行命令行客户端：

lmdeploy serve api_client http://localhost:23333

运行后，可以通过命令行窗口直接与模型对话

4.3 网页客户端连接API服务器

关闭刚刚的VSCode终端，但服务器端的终端不要关闭。
运行之前确保自己的gradio版本低于4.0.0。

pip install gradio==3.50.2

新建一个VSCode终端，激活conda环境。

conda activate lmdeploy

使用Gradio作为前端，启动网页客户端。

lmdeploy serve gradio http://localhost:23333 \
    --server-name 0.0.0.0 \
    --server-port 6006

打开浏览器，访问地址http://127.0.0.1:6006
然后就可以与模型进行对话了！

拓展部分：

5. 推理速度

使用 LMDeploy 在 A100（80G）推理 Llama3，每秒请求处理数（RPS）高达 25，是 vLLM 推理效率的 1.8+ 倍。

克隆仓库

cd ~
git clone https://github.com/InternLM/lmdeploy.git

下载测试数据

cd /root/lmdeploy
wget https://hf-mirror.com/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

执行 benchmark 命令(如果你的显存较小，可以调低--cache-max-entry-count)

python benchmark/profile_throughput.py \
    ShareGPT_V3_unfiltered_cleaned_split.json \
    /root/model/Meta-Llama-3-8B-Instruct \
    --cache-max-entry-count 0.8 \
    --concurrency 256 \
    --model-format hf \
    --quant-policy 0 \
    --num-prompts 10000

结果是：

concurrency: 256
elapsed_time: 399.739s

first token latency(s)(min, max, ave): 0.068, 4.066, 0.285
per-token latency(s) percentile(50, 75, 95, 99): [0, 0.094, 0.169, 0.227]

number of prompt tokens: 2238364
number of completion tokens: 2005448
token throughput (completion token): 5016.892 token/s
token throughput (prompt + completion token): 10616.453 token/s
RPS (request per second): 25.016 req/s
RPM (request per minute): 1500.979 req/min

6. 使用LMDeploy运行视觉多模态大模型Llava-Llama-3

6.1 安装依赖

pip install git+https://github.com/haotian-liu/LLaVA.git

6.2 运行模型

运行touch /root/pipeline_llava.py 新建一个文件夹，复制下列代码进去

from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image
pipe = pipeline('xtuner/llava-llama-3-8b-v1_1-hf',
                chat_template_config=ChatTemplateConfig(model_name='llama3'))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)

运行结果为：

你可以在 https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf 找到更多细节

教程如有误请访问获取最新内容：
https://github.com/SmartFlowAI/Llama3-Tutorial/

LMDeploy仓库：https://github.com/InternLM/lmdeploy.git

Llama 3 Agent 能力体验+微调（Lagent 版）

1. Llama3 ReAct Demo

首先我们先来使用基于 Lagent 的 Web Demo 来直观体验一下 Llama3 模型在 ReAct 范式下的智能体能力。我们让它使用 ArxivSearch 工具来搜索 InternLM2 的技术报告。
从图中可以看到，Llama3-8B-Instruct 模型并没有成功调用工具。原因在于它输出了 query=InternLM2 Technical Report 而非 {'query': 'InternLM2 Technical Report'}，这也就导致了 ReAct 在解析工具输入参数时发生错误，进而导致调用工具失败。

Lagent Web Demo 部分详见 Lagent Web Demo。

2. 微调过程

接下来我们带大家使用 XTuner 在 Agent-FLAN 数据集上微调 Llama3-8B-Instruct，以让 Llama3-8B-Instruct 模型获得智能体能力。
Agent-FLAN 数据集是上海人工智能实验室 InternLM 团队所推出的一个智能体微调数据集，其通过将原始的智能体微调数据以多轮对话的方式进行分解，对数据进行能力分解并平衡，以及加入负样本等方式构建了高效的智能体微调数据集，从而可以大幅提升模型的智能体能力。

2.1 环境配置

我们先来配置相关环境。使用如下指令便可以安装好一个 python=3.10 pytorch=2.1.2+cu121 的基础环境了。

conda create -n llama3 python=3.10
conda activate llama3
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

接下来我们安装 XTuner。

cd ~
git clone -b v0.1.18 https://github.com/InternLM/XTuner
cd XTuner
pip install -e .[all]

如果在前面的课程中已经配置好了环境，在这里也可以选择直接执行 conda activate llama3 以进入环境。

最后，我们 clone 本教程仓库。

cd ~
git clone https://github.com/SmartFlowAI/Llama3-Tutorial

2.2 模型准备

在微调开始前，我们首先来准备 Llama3-8B-Instruct 模型权重。

InternStudio

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct .

非 InternStudio

我们选择从 OpenXLab 上下载 Meta-Llama-3-8B-Instruct 的权重。

mkdir -p ~/model
cd ~/model
git lfs install
git clone https://code.openxlab.org.cn/MrCat/Llama-3-8B-Instruct.git Meta-Llama-3-8B-Instruct

2.3 数据集准备

由于 HuggingFace 上的 Agent-FLAN 数据集暂时无法被 XTuner 直接加载，因此我们首先要下载到本地，然后转换成 XTuner 直接可用的格式。

InternStudio

如果是在 InternStudio 上，我们已经准备好了一份转换好的数据，可以直接通过如下脚本准备好：

cd ~
cp -r /root/share/new_models/internlm/Agent-FLAN .
chmod -R 755 Agent-FLAN

非 InternStudio

首先先来下载数据：

cd ~
git lfs install
git clone https://huggingface.co/datasets/internlm/Agent-FLAN

我们已经在 SmartFlowAI/Llama3-Tutorial 仓库中已经准备好了相关转换脚本。

python ~/Llama3-Tutorial/tools/convert_agentflan.py ~/Agent-FLAN/data

在显示下面的内容后，就表示已经转换好了。转换好的数据位于 ~/Agent-FLAN/data_converted

Saving the dataset (1/1 shards): 100%|████████████| 34442/34442

上述过程如下

2.4 微调启动

我们已经为大家准备好了可以一键启动的配置文件，主要是修改好了模型路径、对话模板以及数据路径。

我们使用如下指令以启动训练：

export MKL_SERVICE_FORCE_INTEL=1
xtuner train ~/Llama3-Tutorial/configs/llama3-agentflan/llama3_8b_instruct_qlora_agentflan_3e.py --work-dir ~/llama3_agent_pth --deepspeed deepspeed_zero2

这里本人并没有做完整的训练

在训练完成后，我们将权重转换为 HuggingFace 格式，并合并到原权重中。

# 转换权重
xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-agentflan/llama3_8b_instruct_qlora_agentflan_3e.py \
    ~/llama3_agent_pth/iter_18516.pth \
    ~/llama3_agent_pth/iter_18516_hf

由于训练时间太长，我们也为大家准备好了已经训练好且转换为 HuggingFace 格式的权重，可以直接使用。路径位于 /share/new_models/agent-flan/iter_2316_hf。

如果要使用自己训练的权重，可以使用如下指令合并权重：

# 合并权重
export MKL_SERVICE_FORCE_INTEL=1
xtuner convert merge /root/model/Meta-Llama-3-8B-Instruct \
    ~/llama3_agent_pth/iter_18516_hf \
    ~/llama3_agent_pth/merged

如果要使用已经训练好的权重，可以使用如下指令合并权重：

export MKL_SERVICE_FORCE_INTEL=1
xtuner convert merge /root/model/Meta-Llama-3-8B-Instruct \
    /share/new_models/agent-flan/iter_2316_hf \
    ~/llama3_agent_pth/merged

(llama3) root@intern-studio-114698:~# ls -alh /root/llama3_agent_pth/merged/
total 15G
drwxr-xr-x 2 root root  4.0K May  8 21:22 .
drwxr-xr-x 4 root root  4.0K May  8 21:22 ..
-rw-r--r-- 1 root root   707 May  8 21:22 config.json
-rw-r--r-- 1 root root   121 May  8 21:22 generation_config.json
-rw-r--r-- 1 root root  1.9G May  8 21:22 pytorch_model-00001-of-00009.bin
-rw-r--r-- 1 root root  1.8G May  8 21:22 pytorch_model-00002-of-00009.bin
-rw-r--r-- 1 root root  1.9G May  8 21:22 pytorch_model-00003-of-00009.bin
-rw-r--r-- 1 root root  1.9G May  8 21:22 pytorch_model-00004-of-00009.bin
-rw-r--r-- 1 root root  1.9G May  8 21:22 pytorch_model-00005-of-00009.bin
-rw-r--r-- 1 root root  1.9G May  8 21:22 pytorch_model-00006-of-00009.bin
-rw-r--r-- 1 root root  1.9G May  8 21:22 pytorch_model-00007-of-00009.bin
-rw-r--r-- 1 root root  1.3G May  8 21:22 pytorch_model-00008-of-00009.bin
-rw-r--r-- 1 root root 1003M May  8 21:22 pytorch_model-00009-of-00009.bin
-rw-r--r-- 1 root root   24K May  8 21:22 pytorch_model.bin.index.json
-rw-r--r-- 1 root root   301 May  8 21:22 special_tokens_map.json
-rw-r--r-- 1 root root  8.7M May  8 21:22 tokenizer.json
-rw-r--r-- 1 root root   50K May  8 21:22 tokenizer_config.json

3. Llama3 ReAct Demo

首先我们先来使用基于 Lagent 的 Web Demo 来直观体验一下 Llama3 模型在 ReAct范式下的智能体能力。我们让它使用 ArxivSearch 工具来搜索 InternLM2 的技术报告。

从图中可以看到，Llama3-8B-Instruct 模型并没有成功调用工具。原因在于它输出了 query=InternLM2 Technical Report 而非 {'query': 'InternLM2 Technical Report'}，这也就导致了 ReAct 在解析工具输入参数时发生错误，进而导致调用工具失败。

4. Lagent Web Demo

因为我们在微调前后都需要启动 Web Demo 以观察效果，因此我们将 Web Demo 部分单独拆分出来。

首先我们先来安装 lagent。

pip install lagent

然后我们使用如下指令启动 Web Demo：

streamlit run ~/Llama3-Tutorial/tools/agent_web_demo.py 微调前/后 LLaMA3 模型路径

微调前 LLaMA3 路径：/root/model/Meta-Llama-3-8B-Instruct 微调后 LLaMA3 路径：/root/llama3_agent_pth/merged

微调前

微调后

手把手带你评测 Llama 3 能力（OpenCompass 版）

Llama 3 近期重磅发布，发布了 8B 和 70B 参数量的模型，opencompass团队对 Llama 3 进行了评测！

书生·浦语和机智流社区同学投稿了 OpenCompass 评测 Llama 3，欢迎 Star。

https://github.com/open-compass/OpenCompass/

https://github.com/SmartFlowAI/Llama3-Tutorial/

实践教程

oepncompass 评测

本小节将带大家手把手用 opencompass 评测 Llama3 。

?环境配置

conda create -n llama3 python=3.10
conda activate llama3

conda install git
apt install git-lfs

✨ 下载 Llama3 模型

首先通过 OpenXLab 下载 Llama-3-8B-Instruct 这个模型。

mkdir -p ~/model
cd ~/model
git clone https://code.openxlab.org.cn/MrCat/Llama-3-8B-Instruct.git Meta-Llama-3-8B-Instruct

或者软链接 InternStudio 中的模型

ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct \
    ~/model

?️ 安装 OpenCompass

cd ~
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .

遇到错误请运行：

pip install -r requirements.txt
pip install protobuf
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU

? 数据准备

下载数据集到 data/ 处
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip

(llama3) root@intern-studio-114698:~# mv data/ opencompass/
(llama3) root@intern-studio-114698:~# cd opencompass/
(llama3) root@intern-studio-114698:~/opencompass# ls
LICENSE  README.md  README_zh-CN.md  configs  data  docs  opencompass  opencompass.egg-info  requirements  requirements.txt  run.py  setup.py  tests  tools
(llama3) root@intern-studio-114698:~/opencompass# ls
LICENSE  README.md  README_zh-CN.md  configs  data  docs  opencompass  opencompass.egg-info  requirements  requirements.txt  run.py  setup.py  tests  tools
(llama3) root@intern-studio-114698:~/opencompass#

?️ 命令行快速评测

查看配置文件和支持的数据集名称

OpenCompass 预定义了许多模型和数据集的配置，你可以通过工具列出所有可用的模型和数据集配置。

# 列出所有配置
# python tools/list_configs.py
# 列出所有跟 llama (模型)及 ceval（数据集） 相关的配置
python tools/list_configs.py llama ceval

(llama3) root@intern-studio-114698:~/opencompass# python tools/list_configs.py
+----------------------------+-------------------------------------------------------+
| Model                      | Config Path                                           |
|----------------------------+-------------------------------------------------------|
| accessory_llama2_7b        | configs/models/accessory/accessory_llama2_7b.py       |
| hf_codellama_13b           | configs/models/codellama/hf_codellama_13b.py          |
| hf_codellama_13b_instruct  | configs/models/codellama/hf_codellama_13b_instruct.py |
| hf_codellama_13b_python    | configs/models/codellama/hf_codellama_13b_python.py   |
| hf_codellama_34b           | configs/models/codellama/hf_codellama_34b.py          |
| hf_codellama_34b_instruct  | configs/models/codellama/hf_codellama_34b_instruct.py |
| hf_codellama_34b_python    | configs/models/codellama/hf_codellama_34b_python.py   |
| hf_codellama_7b            | configs/models/codellama/hf_codellama_7b.py           |
| hf_codellama_7b_instruct   | configs/models/codellama/hf_codellama_7b_instruct.py  |
| hf_codellama_7b_python     | configs/models/codellama/hf_codellama_7b_python.py    |
| hf_gsm8k_rft_llama7b2_u13b | configs/models/others/hf_gsm8k_rft_llama7b2_u13b.py   |
| hf_llama2_13b              | configs/models/hf_llama/hf_llama2_13b.py              |
| hf_llama2_13b_chat         | configs/models/hf_llama/hf_llama2_13b_chat.py         |
| hf_llama2_70b              | configs/models/hf_llama/hf_llama2_70b.py              |
| hf_llama2_70b_chat         | configs/models/hf_llama/hf_llama2_70b_chat.py         |
| hf_llama2_7b               | configs/models/hf_llama/hf_llama2_7b.py               |
| hf_llama2_7b_chat          | configs/models/hf_llama/hf_llama2_7b_chat.py          |
| hf_llama3_70b              | configs/models/hf_llama/hf_llama3_70b.py              |
| hf_llama3_70b_instruct     | configs/models/hf_llama/hf_llama3_70b_instruct.py     |
| hf_llama3_8b               | configs/models/hf_llama/hf_llama3_8b.py               |
| hf_llama3_8b_instruct      | configs/models/hf_llama/hf_llama3_8b_instruct.py      |
| hf_llama_13b               | configs/models/hf_llama/hf_llama_13b.py               |
| hf_llama_30b               | configs/models/hf_llama/hf_llama_30b.py               |
| hf_llama_65b               | configs/models/hf_llama/hf_llama_65b.py               |
| hf_llama_7b                | configs/models/hf_llama/hf_llama_7b.py                |
| llama2_13b                 | configs/models/llama/llama2_13b.py                    |
| llama2_13b_chat            | configs/models/llama/llama2_13b_chat.py               |
| llama2_70b                 | configs/models/llama/llama2_70b.py                    |
| llama2_70b_chat            | configs/models/llama/llama2_70b_chat.py               |
| llama2_7b                  | configs/models/llama/llama2_7b.py                     |
| llama2_7b_chat             | configs/models/llama/llama2_7b_chat.py                |
| llama_13b                  | configs/models/llama/llama_13b.py                     |
| llama_30b                  | configs/models/llama/llama_30b.py                     |
| llama_65b                  | configs/models/llama/llama_65b.py                     |
| llama_7b                   | configs/models/llama/llama_7b.py                      |
+----------------------------+-------------------------------------------------------+
+--------------------------------+------------------------------------------------------------------+
| Dataset                        | Config Path                                                      |
|--------------------------------+------------------------------------------------------------------|
| base_medium_llama              | configs/datasets/collections/base_medium_llama.py                |
| ceval_clean_ppl                | configs/datasets/ceval/ceval_clean_ppl.py                        |
| ceval_contamination_ppl_810ec6 | configs/datasets/contamination/ceval_contamination_ppl_810ec6.py |
| ceval_gen                      | configs/datasets/ceval/ceval_gen.py                              |
| ceval_gen_2daf24               | configs/datasets/ceval/ceval_gen_2daf24.py                       |
| ceval_gen_5f30c7               | configs/datasets/ceval/ceval_gen_5f30c7.py                       |
| ceval_internal_ppl_1cd8bf      | configs/datasets/ceval/ceval_internal_ppl_1cd8bf.py              |
| ceval_ppl                      | configs/datasets/ceval/ceval_ppl.py                              |
| ceval_ppl_1cd8bf               | configs/datasets/ceval/ceval_ppl_1cd8bf.py                       |
| ceval_ppl_578f8d               | configs/datasets/ceval/ceval_ppl_578f8d.py                       |
| ceval_ppl_93e5ce               | configs/datasets/ceval/ceval_ppl_93e5ce.py                       |
| ceval_zero_shot_gen_bd40ef     | configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py             |
+--------------------------------+------------------------------------------------------------------+
(llama3) root@intern-studio-114698:~/opencompass#

以 C-Eval_gen 为例：

python run.py --datasets ceval_gen --hf-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug

遇到 ModuleNotFoundError: No module named ‘rouge’ 错误请运行：

git clone https://github.com/pltrdy/rouge
cd rouge
python setup.py install

命令解析

python run.py \
--datasets ceval_gen \
--hf-path /root/model/Meta-Llama-3-8B-Instruct \  # HuggingFace 模型路径
--tokenizer-path /root/model/Meta-Llama-3-8B-Instruct \  # HuggingFace tokenizer 路径（如果与模型路径相同，可以省略）
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \  # 构建 tokenizer 的参数
--model-kwargs device_map='auto' trust_remote_code=True \  # 构建模型的参数
--max-seq-len 2048 \  # 模型可以接受的最大序列长度
--max-out-len 16 \  # 生成的最大 token 数
--batch-size 4  \  # 批量大小
--num-gpus 1 \ # 运行模型所需的 GPU 数量
--debug

评测完成后，将会看到：

dataset                                         version    metric         mode      opencompass.models.huggingface.HuggingFace_meta-llama_Meta-Llama-3-8B-Instruct
----------------------------------------------  ---------  -------------  ------  --------------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen                                                                                63.16
ceval-operating_system                          1c2571     accuracy       gen                                                                                63.16
ceval-computer_architecture                     a74dad     accuracy       gen                                                                                52.38
ceval-college_programming                       4ca32a     accuracy       gen                                                                                62.16
ceval-college_physics                           963fa8     accuracy       gen                                                                                42.11
ceval-college_chemistry                         e78857     accuracy       gen                                                                                29.17
ceval-advanced_mathematics                      ce03e2     accuracy       gen                                                                                42.11
ceval-probability_and_statistics                65e812     accuracy       gen                                                                                27.78
ceval-discrete_mathematics                      e894ae     accuracy       gen                                                                                25
ceval-electrical_engineer                       ae42b9     accuracy       gen                                                                                32.43
ceval-metrology_engineer                        ee34ea     accuracy       gen                                                                                62.5
ceval-high_school_mathematics                   1dc5bf     accuracy       gen                                                                                 5.56
ceval-high_school_physics                       adf25f     accuracy       gen                                                                                26.32
ceval-high_school_chemistry                     2ed27f     accuracy       gen                                                                                63.16
ceval-high_school_biology                       8e2b9a     accuracy       gen                                                                                36.84
ceval-middle_school_mathematics                 bee8d5     accuracy       gen                                                                                31.58
ceval-middle_school_biology                     86817c     accuracy       gen                                                                                71.43
ceval-middle_school_physics                     8accf6     accuracy       gen                                                                                57.89
ceval-middle_school_chemistry                   167a15     accuracy       gen                                                                                80
ceval-veterinary_medicine                       b4e08d     accuracy       gen                                                                                52.17
ceval-college_economics                         f3f4e6     accuracy       gen                                                                                45.45
ceval-business_administration                   c1614e     accuracy       gen                                                                                30.3
ceval-marxism                                   cf874c     accuracy       gen                                                                                47.37
ceval-mao_zedong_thought                        51c7a4     accuracy       gen                                                                                50
ceval-education_science                         591fee     accuracy       gen                                                                                51.72
ceval-teacher_qualification                     4e4ced     accuracy       gen                                                                                72.73
ceval-high_school_politics                      5c0de2     accuracy       gen                                                                                68.42
ceval-high_school_geography                     865461     accuracy       gen                                                                                42.11
ceval-middle_school_politics                    5be3e7     accuracy       gen                                                                                57.14
ceval-middle_school_geography                   8a63be     accuracy       gen                                                                                50
ceval-modern_chinese_history                    fc01af     accuracy       gen                                                                                52.17
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen                                                                                78.95
ceval-logic                                     f5b022     accuracy       gen                                                                                40.91
ceval-law                                       a110a1     accuracy       gen                                                                                33.33
ceval-chinese_language_and_literature           0f8b68     accuracy       gen                                                                                34.78
ceval-art_studies                               2a1300     accuracy       gen                                                                                54.55
ceval-professional_tour_guide                   4e673e     accuracy       gen                                                                                55.17
ceval-legal_professional                        ce8787     accuracy       gen                                                                                30.43
ceval-high_school_chinese                       315705     accuracy       gen                                                                                31.58
ceval-high_school_history                       7eb30a     accuracy       gen                                                                                65
ceval-middle_school_history                     48ab4a     accuracy       gen                                                                                59.09
ceval-civil_servant                             87d061     accuracy       gen                                                                                34.04
ceval-sports_science                            70f27b     accuracy       gen                                                                                63.16
ceval-plant_protection                          8941f9     accuracy       gen                                                                                68.18
ceval-basic_medicine                            c409d6     accuracy       gen                                                                                57.89
ceval-clinical_medicine                         49e82d     accuracy       gen                                                                                54.55
ceval-urban_and_rural_planner                   95b885     accuracy       gen                                                                                52.17
ceval-accountant                                002837     accuracy       gen                                                                                44.9
ceval-fire_engineer                             bc23f5     accuracy       gen                                                                                38.71
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen                                                                                45.16
ceval-tax_accountant                            3a5e3c     accuracy       gen                                                                                34.69
ceval-physician                                 6e277d     accuracy       gen                                                                                57.14
ceval-stem                                      -          naive_average  gen                                                                                46.34
ceval-social-science                            -          naive_average  gen                                                                                51.52
ceval-humanities                                -          naive_average  gen                                                                                48.72
ceval-other                                     -          naive_average  gen                                                                                50.05
ceval-hard                                      -          naive_average  gen                                                                                32.65
ceval                                           -          naive_average  gen                                                                                48.63
05/08 22:20:15 - OpenCompass - INFO - write summary to /root/opencompass/outputs/default/20240508_221020/summary/summary_20240508_221020.txt
05/08 22:20:15 - OpenCompass - INFO - write csv to /root/opencompass/outputs/default/20240508_221020/summary/summary_20240508_221020.csv

?️ 快速评测

config 快速评测

在 config 下添加模型配置文件 eval_llama3_8b_demo.py

from mmengine.config import read_base

with read_base():
    from .datasets.mmlu.mmlu_gen_4d595a import mmlu_datasets

datasets = [*mmlu_datasets]

from opencompass.models import HuggingFaceCausalLM

models = [
dict(
type=HuggingFaceCausalLM,
abbr='Llama3_8b', # 运行完结果展示的名称
path='/root/model/Meta-Llama-3-8B-Instruct', # 模型路径
tokenizer_path='/root/model/Meta-Llama-3-8B-Instruct', # 分词器路径
model_kwargs=dict(
device_map='auto',
trust_remote_code=True
),
tokenizer_kwargs=dict(
padding_side='left',
truncation_side='left',
trust_remote_code=True,
use_fast=False
),
generation_kwargs={"eos_token_id": [128001, 128009]},
batch_padding=True,
max_out_len=100,
max_seq_len=2048,
batch_size=16,
run_cfg=dict(num_gpus=1),
)
]

运行python run.py configs/eval_llama3_8b_demo.py

遇到如下报错，测试结果无数据。待排查原因：

作为新模型支持快速评测

opencompass 官方已经支持 Llama3

https://github.com/open-compass/opencompass/commit/a256753221ad2a33ec9750b31f6284b581c1e1fd#diff-e446451cf0c8fc747c5c720f65f8fa62d7bd7f5c88668692248517d249c798b5

Llama 3 实践教程（InternStudio 版）

文章目录

Llama3本地 Web Demo部署

初始环境配置

下载模型

Web Demo 部署

XTuner 微调 Llama3 个人小助手

自我认知训练数据集准备

XTuner配置文件准备

训练模型

推理验证

XTuner微调Llama3图片理解多模态

准备 Visual Encoder 权重

XTuner 微调 Llama3 图片理解多模态

环境配置

准备 Image Projector 权重

数据准备

微调过程

训练启动

效果体验

Pretrain 模型

Finetune 后 模型

1.2 Llama3 的下载

2. LMDeploy Chat CLI 工具

3. LMDeploy模型量化(lite)

3.1 设置最大KV Cache缓存大小

3.2 使用W4A16量化

3.3 在线量化 KV

4. LMDeploy服务（serve）

4.1 启动API服务器

4.2 命令行客户端连接API服务器

4.3 网页客户端连接API服务器

拓展部分：

5. 推理速度

6. 使用LMDeploy运行视觉多模态大模型Llava-Llama-3

6.1 安装依赖

6.2 运行模型

Llama 3 Agent 能力体验+微调（Lagent 版）

1. Llama3 ReAct Demo

2. 微调过程

2.1 环境配置

2.2 模型准备

2.3 数据集准备

2.4 微调启动

3. Llama3 ReAct Demo

4. Lagent Web Demo

手把手带你评测 Llama 3 能力（OpenCompass 版）

实践教程

oepncompass 评测

?环境配置

✨ 下载 Llama3 模型

?️ 安装 OpenCompass

? 数据准备

?️ 命令行快速评测

查看配置文件和支持的数据集名称

以 C-Eval_gen 为例：

?️ 快速评测

config 快速评测

作为新模型支持快速评测

Finetune 后模型