ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [34 lines of output]
      *** scikit-build-core 0.10.5 using CMake 3.30.2 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmp12mmpfoy/build/CMakeInit.txt
      -- The C compiler identification is GNU 11.4.0
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/gcc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/g++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Could NOT find Git (missing: GIT_EXECUTABLE)
      CMake Warning at vendor/llama.cpp/cmake/build-info.cmake:14 (message):
        Git not found.  Build info will not be accurate.
      Call Stack (most recent call first):
        vendor/llama.cpp/CMakeLists.txt:74 (include)
      
      
      CMake Error at vendor/llama.cpp/CMakeLists.txt:95 (message):
        LLAMA_CUBLAS is deprecated and will be removed in the future.
      
        Use GGML_CUDA instead
      
      Call Stack (most recent call first):
        vendor/llama.cpp/CMakeLists.txt:100 (llama_option_depr)
      
      
      -- Configuring incomplete, errors occurred!
      
      *** CMake configuration failed
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)

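The error above occurred while deploying Xinference on Ubuntu 22.04.

Some background: this machine is a server for running large models. It has two NVIDIA 4090 GPUs, and the base environment already has CUDA and the PyTorch computing packages installed. Installing xinference in the base environment works without problems, but the installed xinference package then conflicts with the environment that was already running the large models. So I installed conda, created a new environment named xin_env, and got this error when installing xinference inside xin_env.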

Solution:

1. Install CUDA and PyTorch in the xin_env environment.
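
After that step it helps to confirm that the shell is really inside `xin_env` and that PyTorch there can see the GPUs. The following is a minimal sketch, not part of the original fix: the helper names `require_active_env` and `report_torch_cuda` are made up for illustration, and it assumes that `python` on `PATH` is the interpreter of the active conda environment.

```bash
# Succeed only if the named conda environment is the active one.
# CONDA_DEFAULT_ENV is set by `conda activate`.
require_active_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
        echo "active conda environment: $1"
    else
        echo "expected '$1', but the active environment is '${CONDA_DEFAULT_ENV:-none}'"
        return 1
    fi
}

# Print the PyTorch version, the CUDA version it was built for,
# whether CUDA is usable, and how many GPUs are visible.
# Fails if torch is not installed in the active environment.
report_torch_cuda() {
    python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())'
}
```

A typical call would be `require_active_env xin_env && report_torch_cuda`. The last field printed is the number of GPUs that PyTorch can see.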

Summary

### Article summary
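Building `llama-cpp-python` failed in the newly created conda environment `xin_env` on Ubuntu 22.04. The CMake configuration step reported two problems, and only the second one is fatal:
1. **Missing Git executable**:
- CMake could not find Git on the system. `llama.cpp` uses Git to generate build information, so this is only a warning and not the direct cause of the failure, but the version information embedded in the resulting binaries may be inaccurate.
2. **Deprecated configuration option**:
- CMake stopped with an error stating that the `LLAMA_CUBLAS` option is deprecated and that `GGML_CUDA` should be used instead. This means the build was configured with the old option name, and whatever sets that option needs to be updated to the new name.
In addition, although the base environment has CUDA and PyTorch installed, the new conda environment `xin_env` may not contain them, and `llama-cpp-python` may need them to build correctly.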
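
Both findings can be checked from the shell before retrying the build. The sketch below makes one assumption the post does not confirm: that the deprecated option reached CMake through the `CMAKE_ARGS` environment variable, which is one common way to pass CMake options to a `llama-cpp-python` build. The function names `check_tool` and `check_cmake_args` are hypothetical.

```bash
# Report whether a build tool is visible on PATH in the current shell.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found"
        return 1
    fi
}

# Report whether CMAKE_ARGS still carries the deprecated option name.
check_cmake_args() {
    case "${CMAKE_ARGS:-}" in
        *LLAMA_CUBLAS*)
            echo "CMAKE_ARGS still uses LLAMA_CUBLAS: ${CMAKE_ARGS}"
            return 1
            ;;
        *)
            echo "CMAKE_ARGS does not mention LLAMA_CUBLAS"
            ;;
    esac
}

# Run every check and keep going even when one of them fails.
check_tool git   || true
check_tool nvcc  || true
check_tool cmake || true
check_cmake_args || true
```

If `CMAKE_ARGS` turns out to be clean, the option is being set somewhere else (for example in a shell profile or an install script), and that place is where the rename has to happen.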
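### Solution
The following steps address the installation failure:
1. **Install Git**:
- Install Git either system-wide or inside the new conda environment `xin_env`. On Ubuntu, `sudo apt-get update && sudo apt-get install git` installs it globally. To keep it isolated inside the conda environment, install the `git` package from conda-forge with `conda install -c conda-forge git`.
2. **Install CUDA and PyTorch into xin_env**:
- Make sure `xin_env` has CUDA and PyTorch versions that support compiling and running `llama-cpp-python`. These dependencies can be installed with conda, using a command similar to the following: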
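```bash
# Activate the conda environment first
conda activate xin_env

# Install PyTorch with CUDA support. The CUDA version must match the driver
# and the GPU: an RTX 4090 needs CUDA 11.8 or newer.
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
```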
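Make sure the installed CUDA version is compatible with both the hardware and the PyTorch version in use.
3. **Check the `llama-cpp-python` version and its dependencies**:
- Verify that the `llama-cpp-python` version you are installing is compatible with your setup, and check which dependencies it requires. If a newer release is available, try upgrading.
- Confirm that the build uses `GGML_CUDA`, or check whether a branch, tag, or fix already removes the reliance on `LLAMA_CUBLAS`.
4. **Retry the installation**:
- After completing the steps above and confirming that all dependencies are installed correctly, build and install `llama-cpp-python` in `xin_env` again.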
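
Steps 3 and 4 can be sketched as a small shell helper. This is an illustration under assumptions, not the author's exact procedure: it assumes the old option name arrived through `CMAKE_ARGS`, and the function names `migrate_cmake_args` and `reinstall_llama_cpp` are invented here. It renames the option and keeps any other CMake arguments unchanged, then rebuilds the package from source.

```bash
# Replace the deprecated option name in a CMake argument string,
# leaving every other argument untouched.
migrate_cmake_args() {
    printf '%s\n' "$1" | sed 's/LLAMA_CUBLAS/GGML_CUDA/g'
}

# Rebuild llama-cpp-python from source with CUDA enabled.
# Falls back to -DGGML_CUDA=on when CMAKE_ARGS is unset or empty.
reinstall_llama_cpp() {
    new_args="$(migrate_cmake_args "${CMAKE_ARGS:--DGGML_CUDA=on}")"
    echo "Building with CMAKE_ARGS=${new_args}"
    CMAKE_ARGS="${new_args}" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
}
```

After `conda activate xin_env`, calling `reinstall_llama_cpp` runs the build. `--no-cache-dir` is there so that pip does not reuse a wheel cached from an earlier attempt.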
With these steps, you should be able to resolve the build error encountered when installing `llama-cpp-python`.