LLaMA-Factory: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' when fine-tuning ChatGLM2 on a V100

Fine-tuning command

CUDA_VISIBLE_DEVICES=0 python /aaa/LLaMA-Factory/src/train_bash.py \
    --stage sft \
    --model_name_or_path /aaa/LLaMA-Factory/models/chatglm2-6b \
    --do_train \
    --dataset bbbccc \
    --template chatglm2 \
    --finetuning_type lora \
    --lora_target query_key_value \
    --output_dir output/dddeee/ \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 10 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss

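I had already downloaded the full model from Hugging Face and set the correct path, and I had added an entry for my custom dataset to dataset_info.json, modeled on the entry for alpaca_gpt4_data_zh.json. Running the command above still failed with the following error: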


[INFO|training_args.py:1798] 2023-11-02 16:00:19,165 >> PyTorch: setting up devices
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:1760] 2023-11-02 16:00:19,402 >> ***** Running training *****
[INFO|trainer.py:1761] 2023-11-02 16:00:19,402 >>   Num examples = 1,372
[INFO|trainer.py:1762] 2023-11-02 16:00:19,402 >>   Num Epochs = 3
[INFO|trainer.py:1763] 2023-11-02 16:00:19,402 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1766] 2023-11-02 16:00:19,402 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:1767] 2023-11-02 16:00:19,403 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:1768] 2023-11-02 16:00:19,403 >>   Total optimization steps = 255
[INFO|trainer.py:1769] 2023-11-02 16:00:19,404 >>   Number of trainable parameters = 1,949,696
  0%|                                                                                                                       | 0/255 [00:00<?, ?it/s]/aaa/envs/bbb_llama_factory_py310/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
Traceback (most recent call last):
  File "/aaa/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/aaa/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/aaa/LLaMA-Factory/src/llmtuner/tuner/tune.py", line 26, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/aaa/LLaMA-Factory/src/llmtuner/tuner/sft/workflow.py", line 67, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/transformers/trainer.py", line 1892, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/transformers/trainer.py", line 2776, in training_step
    loss = self.compute_loss(model, inputs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/transformers/trainer.py", line 2801, in compute_loss
    outputs = model(**inputs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/peft/peft_model.py", line 918, in forward
    return self.base_model(
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
    return self.model.forward(*args, **kwargs)
  File "/xxxcache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 937, in forward
    transformer_outputs = self.transformer(
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/xxxcache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 830, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/xxxcache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 631, in forward
    layer_ret = torch.utils.checkpoint.checkpoint(
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/xxxcache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 544, in forward
    attention_output, kv_cache = self.self_attention(
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/xxxcache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 376, in forward
    mixed_x_layer = self.query_key_value(hidden_states)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/envs/llama_factory_py310/lib/python3.10/site-packages/peft/tuners/lora.py", line 902, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
  0%|                                                                                  | 0/255 [00:00<?, ?it/s]

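Judging from the output, the model loaded successfully, and the error occurs at the very start of training, before the first optimization step of the first epoch completes. I also tried adding --fp16 to the command above, and that run failed with an error as well.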
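The kernel name in the message, addmm_impl_cpu_, shows that the failing matrix multiply ran on the CPU with float16 ('Half') tensors. Below is a minimal sketch that runs the same kind of call outside of LLaMA-Factory. The function name half_linear_on_cpu and the tensor shapes are my own, chosen only for illustration. Whether it raises depends on the installed PyTorch version, since some builds do support float16 linear layers on the CPU, so the function reports the outcome instead of assuming one.

```python
import importlib.util


def half_linear_on_cpu():
    """Run a float16 linear layer on the CPU and report what happens."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch
    import torch.nn.functional as F

    # Build float32 tensors first, then cast to float16 (Half) on the CPU.
    x = torch.ones(2, 8).half()   # input:  batch of 2, 8 features
    w = torch.ones(4, 8).half()   # weight: 4 output features
    b = torch.zeros(4).half()     # bias

    try:
        # Same kind of call as the last frame of the traceback (F.linear with a bias).
        F.linear(x, w, bias=b)
    except RuntimeError as err:
        return f"RuntimeError: {err}"
    return "ok: this PyTorch build supports float16 linear on the CPU"


if __name__ == "__main__":
    print(half_linear_on_cpu())
```

If this prints the same RuntimeError, the training run is most likely executing on the CPU, and the next thing to check is whether PyTorch can see the GPU at all.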

The discussion with the open-source community is recorded here: https://github.com/hiyouga/LLaMA-Factory/issues/1359

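Cause: a problem with the CUDA environment. The installed PyTorch build could not use the GPU, so the float16 model ran on the CPU, where this operation is not implemented for 'Half'.

Fix: pip install torch==2.0.1

How I diagnosed it: I logged the value of torch.cuda.is_available(), and it printed False, which shows that the CUDA environment was broken.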
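The check described above can be sketched as a small script to run inside the same Python environment as the training command. The function name cuda_report and the dictionary keys are my own and are not part of LLaMA-Factory. It uses only torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count() and torch.cuda.get_device_name().

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup in this environment."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # False means training silently falls back to the CPU.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False while built_with_cuda is not None, the wheel was built with CUDA support but cannot use the GPU on this machine, for example because of a driver mismatch. In my case, reinstalling with torch==2.0.1 resolved it.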