(5-5-02) Financial Market Sentiment Analysis: Fine-Tuning the Llama 2 Large Language Model for Sentiment Analysis of Financial News (2)

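5.5.4 Working with the Llama-2 Language Model

The following code loads and configures the Llama-2 language model and its tokenizer so that the model is ready for the generation tasks that follow.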

model_name = "../input/llama-2/pytorch/7b-hf/1"

compute_dtype = getattr(torch, "float16")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, 
    bnb_4bit_quant_type="nf4", 
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=compute_dtype,
    quantization_config=bnb_config, 
)

model.config.use_cache = False
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_name, 
                                          trust_remote_code=True,
                                         )
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model, tokenizer = setup_chat_format(model, tokenizer)

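The code works as follows:

(1) model_name = "../input/llama-2/pytorch/7b-hf/1": sets the path to the Llama-2 language model.

(2) compute_dtype = getattr(torch, "float16"): retrieves the float16 data type from the torch library to use as the compute dtype.

(3) Creates a BitsAndBytesConfig object, bnb_config, that holds the quantization settings:

load_in_4bit=True: loads the model weights in 4-bit format.
bnb_4bit_quant_type="nf4": uses the "nf4" (4-bit NormalFloat) quantization type.
bnb_4bit_compute_dtype=compute_dtype: performs computation in the float16 dtype obtained above.
bnb_4bit_use_double_quant=True: enables double quantization, which also quantizes the quantization constants.

(4) Loads the pretrained Llama-2 model with AutoModelForCausalLM.from_pretrained(), passing the quantization settings and the target device.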

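(5) model.config.use_cache = False: disables the key/value cache that speeds up generation. The cache is not needed during training and does not work together with the gradient checkpointing enabled later in this section.

(6) model.config.pretraining_tp = 1: sets the tensor-parallelism degree recorded from pretraining to 1, so that the linear layers are computed in the ordinary way instead of emulating the tensor-parallel split used during pretraining.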

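(7) Loads the matching tokenizer with AutoTokenizer.from_pretrained(), uses the end-of-sequence token as the padding token, and pads on the right.

(8) Finally, calls setup_chat_format() to set up the chat format (chat template and its special tokens) on the model and tokenizer.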
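
To make the effect of the 4-bit settings more concrete, the sketch below estimates how much memory the weights alone would occupy at different precisions. The parameter count (about 6.74 billion for Llama-2 7B) and the per-parameter overhead for quantization constants (0.5 bits without double quantization, roughly 0.127 bits with it, the figures reported in the QLoRA paper for a block size of 64) are assumptions brought in for illustration. Activations, the KV cache, and optimizer state are not counted.

```python
# Rough weight-memory estimate for a quantized model (illustrative only).

N_PARAMS = 6.74e9  # assumed parameter count for Llama-2 7B


def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


fp16 = weight_memory_gib(N_PARAMS, 16)
# 4-bit weights plus an assumed 0.5 bits/param of quantization constants.
nf4 = weight_memory_gib(N_PARAMS, 4 + 0.5)
# Double quantization shrinks the constants to an assumed 0.127 bits/param.
nf4_double = weight_memory_gib(N_PARAMS, 4 + 0.127)

print(f"float16:                   {fp16:.2f} GiB")
print(f"nf4:                       {nf4:.2f} GiB")
print(f"nf4 + double quantization: {nf4_double:.2f} GiB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the float16 footprint, which is what allows the 7B model to be loaded and fine-tuned on a single 16 GB GPU.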

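Running the code prints progress information while the Llama-2 checkpoint is loaded, showing how far loading has progressed and how long each shard takes.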

Loading checkpoint shards: 100% 2/2 [03:20<00:00, 91.76s/it]

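5.5.5 Predicting Sentiment Labels

(1) Write the function predict(test, model, tokenizer), which predicts a sentiment label for each sample in the test data. Running the loaded Llama-2 model through this function makes it possible to evaluate how well the model performs on the sentiment analysis task.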

def predict(test, model, tokenizer):
    y_pred = []
    # Build the generation pipeline once instead of on every iteration.
    pipe = pipeline(task="text-generation",
                    model=model,
                    tokenizer=tokenizer,
                    max_new_tokens=1,
                    temperature=0.0,
                   )
    # Note: as in the original code, the loop reads the global X_test;
    # the `test` argument itself is not used.
    for i in tqdm(range(len(X_test))):
        prompt = X_test.iloc[i]["text"]
        result = pipe(prompt)
        # The predicted label is whatever follows the last "=" in the output.
        answer = result[0]['generated_text'].split("=")[-1]
        if "positive" in answer:
            y_pred.append("positive")
        elif "negative" in answer:
            y_pred.append("negative")
        elif "neutral" in answer:
            y_pred.append("neutral")
        else:
            y_pred.append("none")
    return y_pred

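In the code above, predict(test, model, tokenizer) iterates over every sample in the test set and has the Llama-2 model generate a single new token, from which the sentiment label is read. The steps are:

Fetch the prompt text of each test sample.
Generate one new token with the Hugging Face text-generation pipeline.
Extract the predicted sentiment label from the generated text.
Append the label to the prediction list.
Return the list of predicted labels.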
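
The label-extraction step can be tried in isolation, without loading a model. The sketch below mirrors the string handling used in predict(); the helper name extract_label and the sample strings are hypothetical and only serve as illustration.

```python
# Stand-alone version of the label-extraction logic used in predict().

LABELS = ("positive", "negative", "neutral")  # checked in the same order as predict()


def extract_label(generated_text: str) -> str:
    """Return the sentiment label found after the last '=' in the text."""
    answer = generated_text.split("=")[-1]
    for label in LABELS:
        if label in answer:
            return label
    return "none"  # the model produced something other than a known label


# Hypothetical generated strings: the prompt followed by one generated token.
samples = [
    "[Shares rose 5% after the earnings call] = positive",
    "[The company issued a profit warning] = negative",
    "[The meeting is scheduled for Monday] = maybe",
]
for text in samples:
    print(extract_label(text), "<-", text)
```

Because only the text after the last "=" is inspected, a label word that happens to appear inside the news text itself does not influence the result.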

(2) Call the predict() function defined above to predict sentiment labels for the test data with the loaded model and tokenizer, and store the result in y_pred.

y_pred = predict(test, model, tokenizer)

(3) Call the evaluate() function defined earlier to assess the predicted sentiment labels against the true labels.

evaluate(y_true, y_pred)

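evaluate() computes the overall accuracy and the accuracy for each sentiment label, builds the classification report and the confusion matrix, and prints them. The output is: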

Accuracy: 0.373
Accuracy for label 0: 0.027
Accuracy for label 1: 0.937
Accuracy for label 2: 0.157

Classification Report:
              precision    recall  f1-score   support

           0       0.89      0.03      0.05       300
           1       0.34      0.94      0.50       300
           2       0.67      0.16      0.25       300

    accuracy                           0.37       900
   macro avg       0.63      0.37      0.27       900
weighted avg       0.63      0.37      0.27       900


Confusion Matrix:
[[  8 287   5]
 [  1 281  18]
 [  0 253  47]]
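
The figures in this report can be re-derived from the confusion matrix alone. The following sketch assumes the usual convention that rows are true labels and columns are predicted labels (consistent with each row summing to the support of 300); the function name summarize is hypothetical.

```python
# Recompute accuracy, per-label accuracy (recall) and precision
# from the confusion matrix printed above.

def summarize(cm):
    """Return (accuracy, recall per label, precision per label)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    # Row sums count the true samples of each label.
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
    # Column sums count how often each label was predicted.
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    precision = [cm[i][i] / col_sums[i] for i in range(n)]
    return correct / total, recall, precision


cm = [[8, 287, 5],
      [1, 281, 18],
      [0, 253, 47]]

accuracy, recall, precision = summarize(cm)
print(f"Accuracy: {accuracy:.3f}")
for label, (r, p) in enumerate(zip(recall, precision)):
    print(f"label {label}: accuracy/recall={r:.3f} precision={p:.2f}")
```

The middle column of the matrix shows that, before fine-tuning, the model assigns label 1 to the large majority of samples, which explains the high recall for that label and the low overall accuracy.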

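5.5.6 Fine-Tuning the Large Model

(1) Prepare the fine-tuning settings and initialize a Supervised Fine-tuning Trainer (SFTTrainer) object to fine-tune the model. Training uses Parameter-Efficient Fine-Tuning (PEFT), which adapts a large language model by updating only a small number of parameters. Compared with fully fine-tuning the whole model, this saves time and reduces compute and storage costs. PEFT can also mitigate catastrophic forgetting, which often occurs when a language model is fully fine-tuned.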

output_dir="trained_weigths"
# Directory where model weights and logs are saved during training.

peft_config = LoraConfig(
    lora_alpha=16, 
    lora_dropout=0.1,
    r=64,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
# Configure LoRA for PEFT: the scaling factor (lora_alpha), dropout, rank (r), bias handling, and target modules.
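
As a rough illustration of why this counts as parameter-efficient, the sketch below counts the weights that LoRA adds when every linear projection inside the transformer blocks receives an adapter of rank 64. The Llama-2 7B dimensions (hidden size 4096, MLP size 11008, 32 layers) are assumptions not stated in this section, and the count ignores anything outside the seven projections per block.

```python
# Count the trainable parameters that LoRA adds (illustrative arithmetic only).

R = 64            # LoRA rank, as in the LoraConfig above
LORA_ALPHA = 16   # LoRA scaling numerator, as in the LoraConfig above

# Assumed Llama-2 7B dimensions (not stated in this section).
HIDDEN = 4096
INTERMEDIATE = 11008
N_LAYERS = 32


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


LAYER_SHAPES = (
    [(HIDDEN, HIDDEN)] * 4            # q_proj, k_proj, v_proj, o_proj
    + [(HIDDEN, INTERMEDIATE)] * 2    # gate_proj, up_proj
    + [(INTERMEDIATE, HIDDEN)]        # down_proj
)

trainable = N_LAYERS * sum(lora_params(i, o, R) for i, o in LAYER_SHAPES)
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"LoRA parameters: {trainable:,}")
print(f"Update scaling (lora_alpha / r): {scaling}")
```

Under these assumptions the adapters hold about 160 million parameters, a small fraction of the roughly 7 billion frozen base weights.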

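training_arguments = TrainingArguments(
    output_dir=output_dir,                    # directory for training logs and checkpoints
    num_train_epochs=3,                       # number of training epochs
    per_device_train_batch_size=1,            # samples per batch on each device
    gradient_accumulation_steps=8,            # batches accumulated before each parameter update
    gradient_checkpointing=True,              # use gradient checkpointing to save memory
    optim="paged_adamw_32bit",
    save_steps=0,
    logging_steps=25,                         # log training metrics every 25 steps
    learning_rate=2e-4,                       # learning rate, based on the QLoRA paper
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,                        # maximum gradient norm, based on the QLoRA paper
    max_steps=-1,
    warmup_ratio=0.03,                        # learning-rate warmup ratio, based on the QLoRA paper
    group_by_length=True,
    lr_scheduler_type="cosine",               # cosine learning-rate schedule
    report_to="tensorboard",                  # report metrics to TensorBoard
    evaluation_strategy="epoch"               # run evaluation at the end of every epoch
)
# Configure the training run: number of epochs, batch size, learning rate, gradient accumulation steps, and so on.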
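
To see what warmup_ratio=0.03 and lr_scheduler_type="cosine" mean for the learning rate, here is a minimal sketch of a linear-warmup, cosine-decay schedule. It follows the commonly used formula and is not copied from the library, so the exact values used internally by the trainer may differ slightly. The total of 336 steps is taken from the training log shown later in this section; the function name lr_at is hypothetical.

```python
import math

# Linear warmup followed by cosine decay (illustrative formula).


def lr_at(step: int, total_steps: int, base_lr: float = 2e-4,
          warmup_ratio: float = 0.03) -> float:
    """Learning rate at a given optimizer step."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr to 0 along half a cosine wave.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 336  # from the training log later in this section
for step in (0, 5, 11, 100, 200, 336):
    print(f"step {step:3d}: lr = {lr_at(step, TOTAL_STEPS):.6f}")
```

The rate climbs to its peak of 2e-4 over the first few steps and then falls smoothly to zero by the last step.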

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    eval_dataset=eval_data,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    max_seq_length=1024,
    packing=False,
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    }
)
# Initialize the SFTTrainer with the model, training arguments, training and evaluation datasets, and PEFT configuration, and set the dataset options.

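(2) Start fine-tuning by calling train(). The model is trained on the training dataset according to the training arguments specified above (number of epochs, learning rate, batch size, and so on), gradually adapting to the task to improve its performance.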

trainer.train()

The output is:

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
 [336/336 1:01:27, Epoch 2/3]

Epoch	Training Loss	Validation Loss
0	0.802000	0.700182
2	0.516500	0.714122

TrainOutput(global_step=336, training_loss=0.7176062436330886, metrics={'train_runtime': 3700.9349, 'train_samples_per_second': 0.73, 'train_steps_per_second': 0.091, 'total_flos': 1.0717884041527296e+16, 'train_loss': 0.7176062436330886, 'epoch': 2.99})
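
The step count and throughput in this log can be cross-checked against the training arguments. The sketch below assumes a single GPU and a training set of 900 samples; that size is not stated in this section and is chosen here because it reproduces the logged numbers.

```python
# Cross-check the logged step count against the training arguments.

N_TRAIN = 900            # assumed training-set size (defined in part 1)
PER_DEVICE_BATCH = 1     # per_device_train_batch_size
GRAD_ACCUM = 8           # gradient_accumulation_steps
EPOCHS = 3               # num_train_epochs
RUNTIME_S = 3700.9349    # train_runtime from the log

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
steps_per_epoch = N_TRAIN // effective_batch      # a partial last step is dropped
total_steps = steps_per_epoch * EPOCHS
final_epoch = total_steps * effective_batch / N_TRAIN
steps_per_second = total_steps / RUNTIME_S

print(f"effective batch size: {effective_batch}")
print(f"total optimizer steps: {total_steps}")
print(f"epoch reached: {final_epoch:.2f}")
print(f"steps per second: {steps_per_second:.3f}")
```

With these inputs the arithmetic gives 336 optimizer steps, an end point of about 2.99 epochs, and about 0.091 steps per second, matching the values in the log.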

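(3) Save the fine-tuned model and the tokenizer with the two lines below. trainer.save_model() writes the fine-tuned model to the output directory configured for the trainer, and tokenizer.save_pretrained(output_dir) writes the tokenizer to the same directory.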

trainer.save_model()
tokenizer.save_pretrained(output_dir)

The output is:

('trained_weigths/tokenizer_config.json',
 'trained_weigths/special_tokens_map.json',
 'trained_weigths/tokenizer.model',
 'trained_weigths/added_tokens.json',
 'trained_weigths/tokenizer.json')

(4) Enable the TensorBoard visualization tool and load the log files from the specified directory to inspect the training metrics and charts.

%load_ext tensorboard
%tensorboard --logdir logs/runs

TensorBoard is a visualization tool provided by TensorFlow that makes it easier to follow how training is progressing. The result is shown in Figure 5-3.

Figure 5-3 Visualization of model training

(5) Free memory and clean up resources to avoid memory leaks and excessive use of system resources.

import gc

del [model, tokenizer, peft_config, trainer, train_data, eval_data, bnb_config, training_arguments]
del [df, X_train, X_eval]
del [TrainingArguments, SFTTrainer, LoraConfig, BitsAndBytesConfig]
for _ in range(100):
    torch.cuda.empty_cache()
    gc.collect()

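The code does the following:

Deletes the model, tokenizer, PEFT configuration, trainer, training dataset, evaluation dataset, and related objects created earlier, so that the memory they hold can be released.
Deletes the pandas DataFrames and other variables to release their memory as well.
Calls torch.cuda.empty_cache() and gc.collect() in a loop to release cached GPU memory and run Python garbage collection, so that the freed memory is actually returned.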
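
The role of gc.collect() after del can be shown with plain Python objects. In the sketch below, two objects reference each other, so deleting the names alone does not make their reference counts drop to zero; the collector has to find the cycle. The Buffer class is a hypothetical stand-in for any large object, and no GPU memory is involved.

```python
import gc
import weakref

# Why gc.collect() follows `del`: reference cycles need the garbage collector.


class Buffer:
    """Stand-in for a large object that holds a reference to a partner."""

    def __init__(self):
        self.partner = None


a = Buffer()
b = Buffer()
a.partner = b
b.partner = a            # a and b now form a reference cycle

probe = weakref.ref(a)   # lets us observe whether `a` is still alive

del a, b                 # removes the names, but the cycle keeps both objects
gc.collect()             # the collector detects and frees the cycle
freed = probe() is None

print("object freed after gc.collect():", freed)
```

For GPU tensors the same idea applies: once no Python references remain, torch.cuda.empty_cache() can hand the cached blocks back to the driver.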

(6) Run the following system command to check the current state and usage of the NVIDIA GPU.

!nvidia-smi

The command lists every NVIDIA GPU in the system, including the model, memory usage, temperature, and driver version.

Sat Mar 23 22:52:36 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P100-PCIE-16GB           Off | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P0              41W / 250W |   1926MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
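
When only the memory figures are needed, for example to log them from a script, nvidia-smi can also be queried in CSV form. The sketch below is one possible way to do that; the helper names are hypothetical, and query_gpu_memory() is only defined here, not called, because it requires a machine with an NVIDIA driver.

```python
import subprocess

# Read used/total GPU memory (MiB) from nvidia-smi's CSV query output.


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '1926, 16384' into (used, total)."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def query_gpu_memory() -> list[tuple[int, int]]:
    """Return (used, total) in MiB for each GPU; needs nvidia-smi on PATH."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_memory_line(line) for line in out.splitlines() if line.strip()]


# Parsing the values shown in the table above, without calling nvidia-smi.
used, total = parse_memory_line("1926, 16384")
print(f"{used} MiB of {total} MiB in use ({used / total:.1%})")
```

In the table above, 1926 MiB of 16384 MiB remain in use after the cleanup, so most of the GPU memory is available again for loading the merged model.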

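(7) Load the fine-tuned model with the AutoPeftModelForCausalLM class from the PEFT library, which loads the saved LoRA adapter together with its base model. Then merge the adapter weights into the base model, save the merged model to a new directory, and save the tokenizer alongside it for later use.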

from peft import AutoPeftModelForCausalLM

finetuned_model = "./trained_weigths/"
compute_dtype = getattr(torch, "float16")
tokenizer = AutoTokenizer.from_pretrained("/kaggle/input/llama-2/pytorch/7b-hf/1")

model = AutoPeftModelForCausalLM.from_pretrained(
     finetuned_model,
     torch_dtype=compute_dtype,
     return_dict=False,
     low_cpu_mem_usage=True,
     device_map=device,
)

merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model",safe_serialization=True, max_shard_size="2GB")
tokenizer.save_pretrained("./merged_model")

The output is:

Loading checkpoint shards: 100% 2/2 [00:05<00:00, 2.58s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
('./merged_model/tokenizer_config.json',
 './merged_model/special_tokens_map.json',
 './merged_model/tokenizer.model',
 './merged_model/added_tokens.json',
 './merged_model/tokenizer.json')
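
Conceptually, merging a LoRA adapter folds the low-rank update into the base weight, W' = W + (lora_alpha / r) * B * A, so that the merged layer gives the same output as the base layer plus the adapter. The toy example below checks this on tiny hand-written matrices in plain Python. It is a sketch of the idea under that assumption, not the PEFT implementation, and all values are made up.

```python
# Toy check that a merged LoRA weight reproduces base + adapter output.


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, s):
    """Return a + s * b element-wise."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


W = [[1.0, 2.0], [3.0, 4.0]]   # base weight, d_out x d_in
A = [[0.5, -1.0]]              # LoRA factor A, r x d_in (r = 1)
B = [[2.0], [0.0]]             # LoRA factor B, d_out x r
lora_alpha, r = 2.0, 1
scaling = lora_alpha / r

x = [[1.0], [1.0]]             # input as a column vector

# Unmerged: base output plus the scaled adapter output.
unmerged = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scaling)

# Merged: fold the adapter into the weight first, then apply it once.
W_merged = add_scaled(W, matmul(B, A), scaling)
merged = matmul(W_merged, x)

print("unmerged:", unmerged)
print("merged:  ", merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved and loaded like an ordinary checkpoint.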

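5.5.7 Testing the Model

(1) Call predict() on the test set and use evaluate() to assess the model's performance. predict() runs the merged model, merged_model, with the tokenizer and returns the predictions y_pred. evaluate() then compares the true labels y_true with the predicted labels y_pred and reports metrics such as accuracy, precision, and recall.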

y_pred = predict(test, merged_model, tokenizer)
evaluate(y_true, y_pred)

The output is:

100%|██████████| 900/900 [03:51<00:00,  3.89it/s]
Accuracy: 0.847
Accuracy for label 0: 0.890
Accuracy for label 1: 0.870
Accuracy for label 2: 0.780

Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.89      0.92       300
           1       0.73      0.87      0.79       300
           2       0.88      0.78      0.83       300

    accuracy                           0.85       900
   macro avg       0.86      0.85      0.85       900
weighted avg       0.86      0.85      0.85       900


Confusion Matrix:
[[267  31   2]
 [ 10 261  29]
 [  1  65 234]]

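(2) The code below combines the test texts, the true labels, and the predicted labels into a DataFrame and saves it as the CSV file test_predictions.csv for later analysis and comparison. The DataFrame has three columns: 'text' holds the test text, 'y_true' holds the true sentiment label, and 'y_pred' holds the label predicted by the model. The row index is not written to the CSV file.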

evaluation = pd.DataFrame({'text': X_test["text"],
                           'y_true':y_true,
                           'y_pred': y_pred},
                         )
evaluation.to_csv("test_predictions.csv", index=False)
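
Once the file exists, the misclassified rows are usually the most interesting part. The sketch below shows one way to list them with the standard csv module. So that it runs on its own, it first writes a tiny file with the same three columns to a temporary directory; those rows and the helper name misclassified are made up for illustration.

```python
import csv
import os
import tempfile

# List the rows of a predictions file where y_true and y_pred disagree.

# Hypothetical sample rows with the same columns as test_predictions.csv.
rows = [
    {"text": "Shares rose after the earnings call", "y_true": "positive", "y_pred": "positive"},
    {"text": "The firm kept its guidance unchanged", "y_true": "neutral", "y_pred": "positive"},
    {"text": "Profit fell sharply in the quarter", "y_true": "negative", "y_pred": "negative"},
]

path = os.path.join(tempfile.mkdtemp(), "test_predictions.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "y_true", "y_pred"])
    writer.writeheader()
    writer.writerows(rows)


def misclassified(csv_path: str) -> list[dict]:
    """Return the rows whose predicted label differs from the true label."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f) if row["y_true"] != row["y_pred"]]


errors = misclassified(path)
for row in errors:
    print(f"{row['y_true']} -> {row['y_pred']}: {row['text']}")
```

Pointing misclassified() at the real test_predictions.csv would list the samples where the fine-tuned model and the ground truth disagree.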

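The results can then be compared with those of a baseline model (a CONV1D + bidirectional LSTM model) to see whether the fine-tuned model performs better. The baseline results are: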

Accuracy: 0.623
Accuracy for label 0: 0.620
Accuracy for label 1: 0.590
Accuracy for label 2: 0.660

Classification Report:
              precision    recall  f1-score   support

           0       0.79      0.62      0.69       300
           1       0.61      0.59      0.60       300
           2       0.53      0.66      0.59       300

    accuracy                           0.62       900
   macro avg       0.64      0.62      0.63       900
weighted avg       0.64      0.62      0.63       900


Confusion Matrix:
[[186  39  75]
 [ 23 177 100]
 [ 27  75 198]]
