课程资料
五一 Llama 3 超级课堂 | 第二节 Llama 3 微调个人小助手认知(XTuner版)_哔哩哔哩_bilibili SmartFlowAI/Llama3-Tutorial: Llama3-Tutorial(XTuner、LMDeploy、OpenCompass) (github.com) 开发机 (intern-ai.org.cn) Llama3-Tutorial/docs/hello_world.md at main · SmartFlowAI/Llama3-Tutorial (github.com)预先准备
注册InternStudio、创建开发机并启动,这里要选择 “资源配置”,指的是GPU的资源。 安装VSCODE 建立远程链接 由于代码经常更新,所以需要及时同步最新的代码,如果那段程序调试出现问题,可以先尝试更新远程代码库。git branch -r #查看分支
git checkout main
git pull origin main
第一节 本地 Web Demo 部署
#安装环境
conda create -n llama3 python=3.10
conda activate llama3
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
#通过软连接链接大模型文件。
ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct ~/model/Meta-Llama-3-8B-Instruct
#运行,下面有两种方式, 建议采用quant模式,占用资源较少。在最小的“资源配置”的开发机上也可以跑。
streamlit run ~/Llama3-Tutorial/tools/internstudio_web_demo.py ~/model/Meta-Llama-3-8B-Instruct
streamlit run ~/Llama3-Tutorial/tools/internstudio_quant_web_demo.py ~/model/Meta-Llama-3-8B-Instruct (quant模式)
第二节 Llama 3 微调个人小助手认知(XTuner版)
#通过gdata.py,创建微调的数据集。这里可以随便起个名字,例如:“南方蓝天”
cd ~/Llama3-Tutorial
python tools/gdata.py
conda activate llama3 (激活环境)
cd ~/Llama3-Tutorial
# 开始训练,使用 deepspeed 加速,A100 40G显存 耗时24分钟
xtuner train configs/assistant/llama3_8b_instruct_qlora_assistant.py --work-dir /root/llama3_pth
运行完成后的目录
# Adapter PTH 转 HF 格式
xtuner convert pth_to_hf /root/llama3_pth/llama3_8b_instruct_qlora_assistant.py \
/root/llama3_pth/iter_500.pth \
/root/llama3_hf_adapter
# 模型合并
export MKL_SERVICE_FORCE_INTEL=1
xtuner convert merge /root/model/Meta-Llama-3-8B-Instruct \
/root/llama3_hf_adapter\
/root/llama3_hf_merged Adapter
第三节 Llama 3 图片理解能力微调(XTuner+LLaVA)
启动环境
conda create -n llama3 python=3.10
conda activate llama3
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
#单独安装lfs
pip install lfs
#需要的 openai/clip-vit-large-patch14-336,权重,即 Visual Encoder 权重。
cd ~/model
ln -s /root/share/new_models/openai/clip-vit-large-patch14-336 .
#然后我们准备 Llava 将要用到的 Image Projector 部分权重。
ln -s /root/share/new_models/xtuner/llama3-llava-iter_2181.pth .
#数据准备
cd ~
git clone https://github.com/InternLM/tutorial -b camp2
python ~/tutorial/xtuner/llava/llava_data/repeat.py \
-i ~/tutorial/xtuner/llava/llava_data/unique_data.json \
-o ~/tutorial/xtuner/llava/llava_data/repeated_data.json \
-n 200
微调过程
启动训练:(!!!这一步需要50M的显存,由于开发机的显存不够,所以无法完成。有些遗憾)
xtuner train ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
--work-dir ~/llama3_llava_pth --deepspeed deepspeed_zero2
转换为 HuggingFace 格式
xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
~/model/llama3-llava-iter_2181.pth \
~/llama3_llava_pth/pretrain_iter_2181_hf
第四节 Llama 3 高效部署实践(LMDeploy版)
环境配置
# 如果你是InternStudio 可以直接使用
# studio-conda -t lmdeploy -o pytorch-2.1.2
# 初始化环境
conda create -n lmdeploy python=3.10
conda activate lmdeploy
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
安装lmdeploy最新版。
cd ~
pip install -U lmdeploy[all]
LMDeploy Chat CLI 工具
conda activate lmdeploy #切换环境
lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct
3. LMDeploy模型量化(lite)
本部分内容主要介绍如何对模型进行量化。主要包括 KV8量化和W4A16量化。
3.1 设置最大KV Cache缓存大小
模型在运行时,占用的显存可大致分为三部分:模型参数本身占用的显存、KV Cache占用的显存,以及中间运算结果占用的显存。LMDeploy的KV Cache管理器可以通过设置--cache-max-entry-count参数,控制KV缓存占用剩余显存的最大比例。默认的比例为0.8。
lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/
内存基本都使用了。
lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/ --cache-max-entry-count 0.5
用量降低了一些
lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct/ --cache-max-entry-count 0.01
3.2 使用W4A16量化
lmdeploy lite auto_awq \
/root/model/Meta-Llama-3-8B-Instruct \
--calib-dataset 'ptb' \
--calib-samples 128 \
--calib-seqlen 1024 \
--w-bits 4 \
--w-group-size 128 \
--work-dir /root/model/Meta-Llama-3-8B-Instruct_4bit
lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct_4bit --model-format awq
lmdeploy chat /root/model/Meta-Llama-3-8B-Instruct_4bit --model-format awq --cache-max-entry-count 0.01
已经降到了6G的内存。
3.3 在线量化 KV
自 v0.4.0 起,LMDeploy KV 量化方式有原来的离线改为在线。并且,支持两种数值精度 int4、int8。量化方式为 per-head per-token 的非对称量化。它具备以下优势:
量化不需要校准数据集
kv int8 量化精度几乎无损,kv int4 量化精度在可接受范围之内
推理高效,在 llama2-7b 上加入 int8/int4 kv 量化,RPS 相较于 fp16 分别提升近 30% 和 40%
支持 volta 架构(sm70)及以上的所有显卡型号:V100、20系列、T4、30系列、40系列、A10、A100 等等 通过 LMDeploy 应用 kv 量化非常简单,只需要设定 quant_policy 参数。LMDeploy 规定 qant_policy=4表示 kv int4 量化,quant_policy=8 表示 kv int8 量化。
4. LMDeploy服务(serve)
4.1 启动API服务器
lmdeploy serve api_server \
/root/model/Meta-Llama-3-8B-Instruct \
--model-format hf \
--quant-policy 0 \
--server-name 0.0.0.0 \
--server-port 23333 \
--tp 1
设定SSH的转发
ssh -CNg -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p 46647
4.2 命令行客户端连接API服务器
在“4.1”中,我们在终端里新开了一个API服务器。 本节中,我们要新建一个命令行客户端去连接API服务器。首先通过VS Code新建一个终端: 激活conda环境
1, 命令行方式
conda activate lmdeploy
lmdeploy serve api_client http://localhost:23333
2 网页方式
conda activate lmdeploy
pip install gradio==3.50.2
lmdeploy serve gradio http://localhost:23333 \
--server-name 0.0.0.0 \
--server-port 6006
5. 推理速度
克隆仓库
cd ~
git clone https://github.com/InternLM/lmdeploy.git
下载测试数据(642M)
cd /root/lmdeploy
wget https://hf-mirror.com/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
执行 benchmark 命令(如果你的显存较小,可以调低--cache-max-entry-count)
python benchmark/profile_throughput.py \
ShareGPT_V3_unfiltered_cleaned_split.json \
/root/model/Meta-Llama-3-8B-Instruct \
--cache-max-entry-count 0.8 \
--concurrency 256 \
--model-format hf \
--quant-policy 0 \
--num-prompts 10000
我在这运行出了一个错误提示, 让后将/root/lmdeploy/benchmark/profile_throughput.py 中的286行的“ArgumentHelper.enable_prefix_caching(pt_group)”注释掉就可以了。
测试的时候开发机还没有跑满
第五节 Llama 3 Agent 体验微调(XTuner版)
2.1 环境配置
环境配置还是llama3
conda activate llama3
安装XTuner(完成之前的课程,已经已经装完了)
cd ~
git clone -b v0.1.18 https://github.com/InternLM/XTuner
cd XTuner
pip install -e .[all]
2.2 模型准备
2.3 数据集准备
由于 HuggingFace 上的 Agent-FLAN 数据集暂时无法被 XTuner 直接加载,因此我们首先要下载到本地,然后转换成 XTuner 直接可用的格式。
cd ~
cp -r /root/share/new_models/internlm/Agent-FLAN .
chmod -R 755 Agent-FLAN
2.4 微调启动
由于训练时间太长,我们也为大家准备好了已经训练好且转换为 HuggingFace 格式的权重,可以直接使用。路径位于
/share/new_models/agent-flan/iter_2316_hf。
如果要使用已经训练好的权重,可以使用如下指令合并权重:
export MKL_SERVICE_FORCE_INTEL=1
xtuner convert merge /root/model/Meta-Llama-3-8B-Instruct \
/share/new_models/agent-flan/iter_2316_hf \
~/llama3_agent_pth/merged
4. Lagent Web Demo(在教材的后面,但是要先使用的内容,建议教材可以提到前面)
pip install lagent
streamlit run ~/Llama3-Tutorial/tools/agent_web_demo.py /root/model/Meta-Llama-3-8B-Instruct
streamlit run ~/Llama3-Tutorial/tools/agent_web_demo.py /root/llama3_agent_pth/merged
启动后对比问题,这次使用的问题是“查找关于InternLM2的论文”
对比之前的结果,这次已经可以自动调用ArxivSearch接口了。
第六节 Llama 3 能力评测(OpenCompass 版)
这次继续使用llama3环境
conda activate llama3
安装 OpenCompass。简介一下,OpenCompass是上海人工智能实验室开源的大模型评测平台,它涵盖了学科、语言、知识、理解、推理等五大评测维度,可以全面评估大模型的能力。上海人工智能实验室 (shlab.org.cn)
cd ~
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
数据准备
下载数据集到 data/ 处
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
下载速度实在是太慢了,后面没法操作了,所以实战部分就到此结束, 后续的只是课程的内容。
命令行快速评测
OpenCompass 预定义了许多模型和数据集的配置,你可以通过 工具 列出所有可用的模型和数据集配置。
# 列出所有配置
# python tools/list_configs.py
# 列出所有跟 llama (模型)及 ceval(数据集) 相关的配置
python tools/list_configs.py llama ceval
以 C-Eval_gen 为例:
python run.py --datasets ceval_gen --hf-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug
命令解析
python run.py \
--datasets ceval_gen \
--hf-path /root/model/Meta-Llama-3-8B-Instruct \ # HuggingFace 模型路径
--tokenizer-path /root/model/Meta-Llama-3-8B-Instruct \ # HuggingFace tokenizer 路径(如果与模型路径相同,可以省略)
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \ # 构建 tokenizer 的参数
--model-kwargs device_map='auto' trust_remote_code=True \ # 构建模型的参数
--max-seq-len 2048 \ # 模型可以接受的最大序列长度
--max-out-len 16 \ # 生成的最大 token 数
--batch-size 4 \ # 批量大小
--num-gpus 1 \ # 运行模型所需的 GPU 数量
--debug
评测完成后,将会看到:
dataset version metric mode opencompass.models.huggingface.HuggingFace_meta-llama_Meta-Llama-3-8B-Instruct
---------------------------------------------- --------- ------------- ------ --------------------------------------------------------------------------------
ceval-computer_network db9ce2 accuracy gen 63.16
ceval-operating_system 1c2571 accuracy gen 63.16
ceval-computer_architecture a74dad accuracy gen 52.38
ceval-college_programming 4ca32a accuracy gen 62.16
ceval-college_physics 963fa8 accuracy gen 42.11
ceval-college_chemistry e78857 accuracy gen 29.17
ceval-advanced_mathematics ce03e2 accuracy gen 42.11
ceval-probability_and_statistics 65e812 accuracy gen 27.78
ceval-discrete_mathematics e894ae accuracy gen 25
ceval-electrical_engineer ae42b9 accuracy gen 32.43
ceval-metrology_engineer ee34ea accuracy gen 62.5
ceval-high_school_mathematics 1dc5bf accuracy gen 5.56
ceval-high_school_physics adf25f accuracy gen 26.32
ceval-high_school_chemistry 2ed27f accuracy gen 63.16
ceval-high_school_biology 8e2b9a accuracy gen 36.84
ceval-middle_school_mathematics bee8d5 accuracy gen 31.58
ceval-middle_school_biology 86817c accuracy gen 71.43
ceval-middle_school_physics 8accf6 accuracy gen 57.89
ceval-middle_school_chemistry 167a15 accuracy gen 80
ceval-veterinary_medicine b4e08d accuracy gen 52.17
ceval-college_economics f3f4e6 accuracy gen 45.45
ceval-business_administration c1614e accuracy gen 30.3
ceval-marxism cf874c accuracy gen 47.37
ceval-mao_zedong_thought 51c7a4 accuracy gen 50
ceval-education_science 591fee accuracy gen 51.72
ceval-teacher_qualification 4e4ced accuracy gen 72.73
ceval-high_school_politics 5c0de2 accuracy gen 68.42
ceval-high_school_geography 865461 accuracy gen 42.11
ceval-middle_school_politics 5be3e7 accuracy gen 57.14
ceval-middle_school_geography 8a63be accuracy gen 50
ceval-modern_chinese_history fc01af accuracy gen 52.17
ceval-ideological_and_moral_cultivation a2aa4a accuracy gen 78.95
ceval-logic f5b022 accuracy gen 40.91
ceval-law a110a1 accuracy gen 33.33
ceval-chinese_language_and_literature 0f8b68 accuracy gen 34.78
ceval-art_studies 2a1300 accuracy gen 54.55
ceval-professional_tour_guide 4e673e accuracy gen 55.17
ceval-legal_professional ce8787 accuracy gen 30.43
ceval-high_school_chinese 315705 accuracy gen 31.58
ceval-high_school_history 7eb30a accuracy gen 65
ceval-middle_school_history 48ab4a accuracy gen 59.09
ceval-civil_servant 87d061 accuracy gen 34.04
ceval-sports_science 70f27b accuracy gen 63.16
ceval-plant_protection 8941f9 accuracy gen 68.18
ceval-basic_medicine c409d6 accuracy gen 57.89
ceval-clinical_medicine 49e82d accuracy gen 54.55
ceval-urban_and_rural_planner 95b885 accuracy gen 52.17
ceval-accountant 002837 accuracy gen 44.9
ceval-fire_engineer bc23f5 accuracy gen 38.71
ceval-environmental_impact_assessment_engineer c64e2d accuracy gen 45.16
ceval-tax_accountant 3a5e3c accuracy gen 34.69
ceval-physician 6e277d accuracy gen 57.14
ceval-stem - naive_average gen 46.34
ceval-social-science - naive_average gen 51.52
ceval-humanities - naive_average gen 48.72
ceval-other - naive_average gen 50.05
ceval-hard - naive_average gen 32.65
ceval - naive_average gen 48.63
config 快速评测,在config 下添加模型配置文件 eval_llama3_8b_demo.py
from mmengine.config import read_base
with read_base():
from .datasets.mmlu.mmlu_gen_4d595a import mmlu_datasets
datasets = [*mmlu_datasets]
from opencompass.models import HuggingFaceCausalLM
models = [
dict(
type=HuggingFaceCausalLM,
abbr='Llama3_8b', # 运行完结果展示的名称
path='/root/model/Meta-Llama-3-8B-Instruct', # 模型路径
tokenizer_path='/root/model/Meta-Llama-3-8B-Instruct', # 分词器路径
model_kwargs=dict(
device_map='auto',
trust_remote_code=True
),
tokenizer_kwargs=dict(
padding_side='left',
truncation_side='left',
trust_remote_code=True,
use_fast=False
),
generation_kwargs={"eos_token_id": [128001, 128009]},
batch_padding=True,
max_out_len=100,
max_seq_len=2048,
batch_size=16,
run_cfg=dict(num_gpus=1),
)
]
运行python run.py configs/eval_llama3_8b_demo.py
评测完成后,将会看到:
dataset version metric mode Llama3_8b
------------------------------------------------- --------- -------- ------ -----------
lukaemon_mmlu_college_biology caec7d accuracy gen 66.67
lukaemon_mmlu_college_chemistry 520aa6 accuracy gen 37
lukaemon_mmlu_college_computer_science 99c216 accuracy gen 53
lukaemon_mmlu_college_mathematics 678751 accuracy gen 36
lukaemon_mmlu_college_physics 4f382c accuracy gen 48.04
lukaemon_mmlu_electrical_engineering 770ce3 accuracy gen 43.45
lukaemon_mmlu_astronomy d3ee01 accuracy gen 68.42
lukaemon_mmlu_anatomy 72183b accuracy gen 54.07
lukaemon_mmlu_abstract_algebra 2db373 accuracy gen 31
lukaemon_mmlu_machine_learning 0283bb accuracy gen 43.75
lukaemon_mmlu_clinical_knowledge cb3218 accuracy gen 58.87
lukaemon_mmlu_global_facts ab07b6 accuracy gen 39
lukaemon_mmlu_management 80876d accuracy gen 78.64
lukaemon_mmlu_nutrition 4543bd accuracy gen 72.55
lukaemon_mmlu_marketing 7394e3 accuracy gen 90.17
lukaemon_mmlu_professional_accounting 444b7f accuracy gen 49.65
lukaemon_mmlu_high_school_geography 0780e6 accuracy gen 75.25
lukaemon_mmlu_international_law cf3179 accuracy gen 62.81
lukaemon_mmlu_moral_scenarios f6dbe2 accuracy gen 38.66
lukaemon_mmlu_computer_security ce7550 accuracy gen 35
lukaemon_mmlu_high_school_microeconomics 04d21a accuracy gen 62.18
lukaemon_mmlu_professional_law 5f7e6c accuracy gen 47.91
lukaemon_mmlu_medical_genetics 881ef5 accuracy gen 62
lukaemon_mmlu_professional_psychology 221a16 accuracy gen 69.44
lukaemon_mmlu_jurisprudence 001f24 accuracy gen 69.44
lukaemon_mmlu_world_religions 232c09 accuracy gen 74.85
lukaemon_mmlu_philosophy 08042b accuracy gen 71.06
lukaemon_mmlu_virology 12e270 accuracy gen 43.98
lukaemon_mmlu_high_school_chemistry ae8820 accuracy gen 42.86
lukaemon_mmlu_public_relations e7d39b accuracy gen 60
lukaemon_mmlu_high_school_macroeconomics a01685 accuracy gen 57.95
lukaemon_mmlu_human_sexuality 42407c accuracy gen 74.05
lukaemon_mmlu_elementary_mathematics 269926 accuracy gen 28.84
lukaemon_mmlu_high_school_physics 93278f accuracy gen 26.49
lukaemon_mmlu_high_school_computer_science 9965a5 accuracy gen 63
lukaemon_mmlu_high_school_european_history eefc90 accuracy gen 74.55
lukaemon_mmlu_business_ethics 1dec08 accuracy gen 51
lukaemon_mmlu_moral_disputes a2173e accuracy gen 70.81
lukaemon_mmlu_high_school_statistics 8f3f3a accuracy gen 52.78
lukaemon_mmlu_miscellaneous 935647 accuracy gen 54.15
lukaemon_mmlu_formal_logic cfcb0c accuracy gen 42.86
lukaemon_mmlu_high_school_government_and_politics 3c52f9 accuracy gen 86.01
lukaemon_mmlu_prehistory bbb197 accuracy gen 64.2
lukaemon_mmlu_security_studies 9b1743 accuracy gen 75.51
lukaemon_mmlu_high_school_biology 37b125 accuracy gen 74.84
lukaemon_mmlu_logical_fallacies 9cebb0 accuracy gen 68.1
lukaemon_mmlu_high_school_world_history 048e7e accuracy gen 83.12
lukaemon_mmlu_professional_medicine 857144 accuracy gen 72.43
lukaemon_mmlu_high_school_mathematics ed4dc0 accuracy gen 31.48
lukaemon_mmlu_college_medicine 38709e accuracy gen 56.65
lukaemon_mmlu_high_school_us_history 8932df accuracy gen 82.84
lukaemon_mmlu_sociology c266a2 accuracy gen 76.12
lukaemon_mmlu_econometrics d1134d accuracy gen 55.26
lukaemon_mmlu_high_school_psychology 7db114 accuracy gen 65.14
lukaemon_mmlu_human_aging 82a410 accuracy gen 62.33
lukaemon_mmlu_us_foreign_policy 528cfe accuracy gen 70
lukaemon_mmlu_conceptual_physics 63588e accuracy gen 26.38
opencompass 官方已经支持 Llama3
https://github.com/open-compass/opencompass/commit/a256753221ad2a33ec9750b31f6284b581c1e1fd#diff-e446451cf0c8fc747c5c720f65f8fa62d7bd7f5c88668692248517d249c798b5