Llama Chinese LLM: Model Deployment

Choose a learning path

Quick Start: Using Anaconda

Step 0: Prerequisites

Make sure Python 3.10 or later is installed.
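The prerequisite can be checked from Python itself before going further. The snippet below is a minimal sketch; the helper name python_is_supported and the MIN_VERSION constant are illustrative and not part of the Llama-Chinese repository.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is sufficient:", sys.version.split()[0])
else:
    print("Python 3.10 or later is required, found:", sys.version.split()[0])
```

Running this inside the activated Anaconda environment confirms that the interpreter on the path is the one you expect.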

Step 1: Prepare the environment

To set up the environment and install the required packages, run the following commands.

git clone https://github.com/LlamaFamily/Llama-Chinese.git
cd Llama-Chinese
pip install -r requirements.txt
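After the installation finishes, it can be useful to confirm that the modules imported by the inference script are actually available. The sketch below only checks torch and transformers, because those are the two imports that appear in quick_start.py; the helper name missing_modules is illustrative, and the full contents of requirements.txt are not assumed here.

```python
import importlib.util

# Modules imported by quick_start.py later in this guide.
REQUIRED_MODULES = ["torch", "transformers"]


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All checked modules are importable.")
```

If anything is reported as missing, rerun the pip command above in the same environment.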

Step 2: Download the model

You can download the Atom-7B-Chat model from any of the following sources.

HuggingFace

ModelScope

WideModel

Step 3: Run inference

Run inference with the Atom-7B-Chat model

1. Create a file named quick_start.py and copy the following content into it.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the first GPU when CUDA is available; otherwise fall back to automatic placement.
device_map = "cuda:0" if torch.cuda.is_available() else "auto"

# load_in_8bit requires the bitsandbytes package.
# use_flash_attention_2 requires the flash-attn package and a supported GPU.
model = AutoModelForCausalLM.from_pretrained(
    'FlagAlpha/Atom-7B-Chat',
    device_map=device_map,
    torch_dtype=torch.float16,
    load_in_8bit=True,
    trust_remote_code=True,
    use_flash_attention_2=True,
)
model = model.eval()

tokenizer = AutoTokenizer.from_pretrained('FlagAlpha/Atom-7B-Chat', use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

# The prompt wraps the question ("Introduce China") in the Human/Assistant chat format.
input_ids = tokenizer(['<s>Human: 介绍一下中国\n</s><s>Assistant: '], return_tensors="pt", add_special_tokens=False).input_ids
if torch.cuda.is_available():
    input_ids = input_ids.to('cuda')

# Sampling settings passed to model.generate.
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id
}
generate_ids = model.generate(**generate_input)

# Decode the full sequence (prompt plus generated answer) back to text.
text = tokenizer.decode(generate_ids[0])
print(text)

2. Run quick_start.py.

python quick_start.py
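The prompt string in quick_start.py follows a fixed single-turn pattern, and the decoded output contains the prompt followed by the answer. The sketch below shows one way to build that prompt for an arbitrary question and to pull the answer out of the decoded text. The helper names build_prompt and extract_answer are illustrative and not part of the repository, and the sample decoded string is made up for demonstration; it is not real model output.

```python
ASSISTANT_MARKER = "Assistant:"


def build_prompt(question: str) -> str:
    """Wrap a question in the single-turn format used by quick_start.py."""
    return f"<s>Human: {question}\n</s><s>{ASSISTANT_MARKER} "


def extract_answer(decoded: str) -> str:
    """Return the text after the first 'Assistant:' marker, without </s> tokens.

    Assumes a single-turn exchange. If the marker is absent, the whole
    decoded string is returned with surrounding whitespace removed.
    """
    _, marker, after = decoded.partition(ASSISTANT_MARKER)
    answer = after if marker else decoded
    return answer.replace("</s>", "").strip()


prompt = build_prompt("介绍一下中国")
# Hypothetical decoded text, used only to exercise extract_answer.
sample_decoded = prompt + "中国是一个国家。</s>"
print(extract_answer(sample_decoded))
```

In quick_start.py, build_prompt(question) could replace the hard-coded prompt string, and extract_answer(text) could be applied to the decoded output before printing. The tokenizer may decode special tokens with slightly different spacing, which is why the sketch splits on the marker rather than slicing by prompt length.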

Quick Start: Using Docker

For details, see: Docker Deployment

Step 1: Build the Docker image that the container will be started from

git clone https://github.com/LlamaFamily/Llama-Chinese.git

cd Llama-Chinese

docker build -f docker/Dockerfile -t flagalpha/llama2-chinese:gradio .

Step 2: Start chat_gradio with docker-compose

cd Llama-Chinese/docker
docker-compose up -d --build

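Quick Start: Using llama.cpp

For details, see: Using llama.cpp

Quick Start: Using gradio

This is a question-answering interface built on gradio with streaming output. Copy the commands below into a terminal and run them. They use the Atom-7B-Chat model as an example; for a different model, change the model name passed to model_name_or_path.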

git clone https://github.com/LlamaFamily/Llama-Chinese.git

cd Llama-Chinese

python examples/chat_gradio.py --model_name_or_path FlagAlpha/Atom-7B-Chat

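Setting Up a FastAPI Endpoint

To make it easy to call the model through an API, we provide a script for quickly setting up a FastAPI endpoint. For the related test code and API parameter settings, see API Calls.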

https://github.com/LlamaFamily/Llama-Chinese/blob/main/scripts/api/README.md