[whisper] Calling whisper from Python to extract subtitles or translate them to text

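I have recently been working on video processing. One requirement is extracting subtitles from a video, which we implement in two steps: first separate the audio track, then use whisper for speech recognition or translation. This article covers the basic usage of whisper and two ways of calling it from Python.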
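
The audio-separation step can be sketched as follows. This is a minimal sketch rather than the code from our project: it assumes the ffmpeg command-line tool is installed and on PATH, and the helper names `build_ffmpeg_command` and `extract_audio`, as well as the 16 kHz mono WAV output, are illustrative choices.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ffmpeg_command(video_file: str, audio_file: str) -> List[str]:
    """Build an ffmpeg command that writes the audio track of video_file to audio_file."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_file,
        "-vn",           # drop the video stream, keep audio only
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        audio_file,
    ]


def extract_audio(video_file: str, audio_file: Optional[str] = None) -> str:
    """Run ffmpeg and return the path of the extracted audio file."""
    if audio_file is None:
        # Hypothetical default: same path as the video, with a .wav suffix
        audio_file = str(Path(video_file).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_command(video_file, audio_file), check=True)
    return audio_file
```

Calling `extract_audio("demo.mp4")` (a placeholder path) would write `demo.wav` next to the video and raise `subprocess.CalledProcessError` if ffmpeg fails.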

I. Introduction to whisper

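whisper is an open-source speech recognition library released by OpenAI. It supports many languages, including Chinese. This article shows how to install whisper and how to use it to produce Chinese subtitles.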
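
As a preview of the Python API, recognition comes down to loading a model and calling `transcribe`. The sketch below assumes the openai-whisper package is installed; `transcribe_text` and `demo` are hypothetical helper names, and the "base" model size is only an example.

```python
def transcribe_text(model, media_file, language="zh", task="transcribe"):
    """Run Whisper on media_file and return the recognized text.

    model is expected to be the object returned by whisper.load_model().
    task="translate" asks Whisper to translate the speech into English
    instead of transcribing it in the source language.
    """
    result = model.transcribe(media_file, language=language, task=task)
    return result["text"].strip()


def demo(media_file):
    """Load a small model and transcribe media_file as Chinese speech."""
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model("base")  # smaller and faster than "medium"
    return transcribe_text(model, media_file)
```

Passing `language="zh"` skips automatic language detection, which can help when a clip opens with music or silence.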

II. Installing whisper

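First, we need to set up whisper. whisper itself is a Python package, published on PyPI as openai-whisper and installed with pip. It also needs the ffmpeg command-line tool to decode audio and video. Depending on the operating system, the prerequisites can be installed as follows:

On Windows, install Python from python.org, then install ffmpeg, for example with Chocolatey: choco install ffmpeg. The source code and installation notes are on whisper's GitHub page (https://github.com/openai/whisper).

On macOS, use Homebrew (https://brew.sh/). Run the following command in a terminal: brew install python@3.10 ffmpeg.

On Linux, use the package manager (such as apt or yum). For example, on Ubuntu with Python 3.10, run the following command in a terminal: sudo apt install python3.10 ffmpeg.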

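We also need to configure the environment. For that, refer to this article, which performs subtitle translation from the command line and is better suited to non-developers.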

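III. Extracting video subtitles with Whisper and writing them to a file

3.1 Installing the Whisper library

First, install the Whisper library. Run the following command in a terminal: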

pip install -U openai-whisper
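
After installing, it can be worth confirming that the pieces are in place. The sketch below is one way to do that with the standard library only. It assumes the distribution name openai-whisper (the name pinned in the requirements.txt in section 3.2), and `check_environment` is a hypothetical helper name.

```python
import shutil
from importlib import metadata


def check_environment():
    """Report where ffmpeg is found and which openai-whisper version is installed.

    Each value is None when the corresponding component is missing.
    """
    try:
        whisper_version = metadata.version("openai-whisper")
    except metadata.PackageNotFoundError:
        whisper_version = None
    return {
        "ffmpeg": shutil.which("ffmpeg"),    # full path to the executable, or None
        "openai-whisper": whisper_version,   # installed version string, or None
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value if value is not None else 'NOT FOUND'}")
```

If either entry is reported as missing, transcription will fail later, so it is cheaper to catch the problem here.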

3.2 Importing the required libraries and modules

import datetime  # the module; the original import order leaves this name bound to the module
import re
import subprocess
import time
from datetime import timedelta

import arrow
import whisper

See "Two ways to generate requirements.txt in Python".
If generation fails, see here.
The requirements.txt generated for the versions used:

arrow==1.3.0
asposestorage==1.0.2
numpy==1.25.0
openai_whisper==20230918

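3.3 Extracting subtitles and writing them to a file

Below is a function that extracts subtitles from the target video and writes them to the specified file:

1. Calling the library directly from Python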
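
Whisper reports each segment's start and end as seconds from the beginning of the audio, so the function that follows adds those offsets to a base time. The arithmetic can be sketched with the standard library alone. This is an illustrative stand-in for the arrow calls, not the article's code, and `offset_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S.%f"


def offset_timestamp(segment_seconds, actual_start_time=None):
    """Return base time + segment_seconds, formatted as HH:MM:SS.mmm.

    actual_start_time is an optional "HH:MM:SS.mmm" string; without it the
    base time is 00:00:00.000.
    """
    if actual_start_time is not None:
        base = datetime.strptime(actual_start_time, TIME_FORMAT)
    else:
        base = datetime(1900, 1, 1)  # arbitrary date, only the time of day is used
    moment = base + timedelta(seconds=segment_seconds)
    return moment.strftime(TIME_FORMAT)[:-3]  # trim microseconds to milliseconds


print(offset_timestamp(12.5, "00:01:30.000"))  # 00:01:42.500
print(offset_timestamp(3.2))                   # 00:00:03.200
```

Passing the real start time of a clip this way lets subtitles from a cut-out fragment line up with the timeline of the full video.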

def extract_subtitles(video_file, output_file, actual_start_time=None):
    # Load the whisper model
    model = whisper.load_model("medium")  # choose a model size that fits your needs
    subtitles = []
    # Transcribe the audio track
    result = model.transcribe(video_file)
    # Base time that segment offsets are added to; without actual_start_time it is 00:00:00
    start_time = arrow.get(actual_start_time, 'HH:mm:ss.SSS') if actual_start_time is not None else arrow.get(0)

    for segment in result["segments"]:
        # Compute the start and end time
        start = format_time(start_time.shift(seconds=segment["start"]))
        end = format_time(start_time.shift(seconds=segment["end"]))
        # Build the subtitle text
        subtitle_text = f"【{start}