
OpenAI API - Building an Automated Meeting Minutes Generator with the Whisper and GPT-4 Models

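Introduction

This article follows the official tutorial to show how to build an automated meeting minutes generator with OpenAI's Whisper and GPT-4 models. The application transcribes meeting audio, summarizes the discussion, extracts key points and action items, and performs sentiment analysis.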

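1. Application Overview

Whisper and GPT-4: Whisper is a model for audio transcription, while GPT-4 is a model for natural language processing. This tutorial combines the capabilities of the two.

Automated meeting minutes generator: the application's main job is to produce meeting minutes automatically, from transcribing the audio through to summarizing the discussion.

Features:

Transcribe audio: convert the meeting's audio into text.
Provide a summary: summarize the main content of the discussion.
Extract key points and action items: identify the important information and the tasks that need to be carried out.
Sentiment analysis: analyze the emotional tone of the meeting.

2. Prerequisites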

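This tutorial assumes you have basic Python knowledge and an OpenAI API key. You can use the audio file provided with this tutorial or your own.

You also need to install the python-docx and OpenAI libraries. You can create a new Python environment and install the required packages with the following commands: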

# Create a new Python environment (optional)
python -m venv myenv
source myenv/bin/activate  # On Windows, use myenv\Scripts\activate

# Install the required packages
pip install python-docx openai

3. Let's Start Building

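The first step in transcribing the meeting audio is to pass the audio file to the /v1/audio API. Whisper, the model that powers the audio API, converts spoken language into written text. To begin with, we will not pass a prompt or temperature (optional parameters that control the model's output) and will rely on the defaults.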

from docx import Document
from openai import OpenAI

# Set up the OpenAI client
client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    # api_key="My API Key",
)

# Path to the audio file
audio_file_path = 'path/to/your/audio/file.mp3'

# Open the audio file and pass it to the API
def transcribe_audio(audio_file_path):
    with open(audio_file_path, 'rb') as audio_file:
        # model and file must be passed as keyword arguments
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    # The v1 client returns an object, so the text is read as an attribute
    return transcription.text

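In the function above, audio_file_path is the path to the audio file you want to transcribe. The function opens the file and passes it to the Whisper ASR model (whisper-1) for transcription, and the result is returned as raw text. Note that client.audio.transcriptions.create needs the actual audio file, not just a path to a file on the local machine or a remote server. If you run this code on a server that may not store the audio files, you will first need a preprocessing step that downloads the audio file onto that machine.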
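The preprocessing step mentioned above could look roughly like the following. This is a minimal sketch using only the standard library; the helper name download_audio and the example URL are assumptions made for illustration and are not part of the original tutorial.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_audio(url: str, suffix: str = ".mp3") -> Path:
    """Copy the audio at `url` into a local temporary file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so it can be reopened later for transcription.
    with urllib.request.urlopen(url) as response, \
            tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(response, tmp)
        return Path(tmp.name)


# Hypothetical usage: the URL is a placeholder, and transcribe_audio is the
# function defined earlier in this article.
# local_path = download_audio("https://example.com/meeting.mp3")
# text = transcribe_audio(local_path)
# local_path.unlink()  # remove the temporary copy when done
```

The temporary file is not cleaned up automatically, so the caller is responsible for deleting it once the transcription has finished.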

———————————————————————————————————————————

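Summarizing and Analyzing the Transcript with GPT-4

Once we have the transcript, we pass it to GPT-4 through the Chat Completions API. GPT-4 is one of OpenAI's most capable large language models, and we will use it to generate a summary, extract key points and action items, and perform sentiment analysis.

This tutorial uses a separate function for each task. That is not the most efficient approach, since you could put all of these instructions into a single function, but handling them separately may improve the quality of the summary.

To break these tasks down, we define the meeting_minutes function, which serves as the application's main function: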
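For comparison, the single-function alternative mentioned above might be sketched as follows. The prompt wording, the function name meeting_minutes_single_call, and the choice to request a JSON reply are all assumptions made for illustration; the client is passed in as a parameter so the sketch does not depend on a global.

```python
import json

# Hypothetical combined prompt; the wording is illustrative only.
COMBINED_INSTRUCTIONS = (
    "You are an AI assistant that writes meeting minutes. Read the transcript "
    "and reply with a single JSON object containing exactly these keys: "
    "abstract_summary, key_points, action_items, sentiment. "
    "Each value must be a string."
)


def meeting_minutes_single_call(client, transcription, model="gpt-4"):
    """Request all four analyses in one chat completion and parse the JSON reply."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": COMBINED_INSTRUCTIONS},
            {"role": "user", "content": transcription},
        ],
    )
    # The model is only asked, not forced, to return JSON, so this can raise
    # json.JSONDecodeError if the reply contains anything else.
    return json.loads(response.choices[0].message.content)
```

This costs one request instead of four, but each section gets less dedicated instruction, which is the quality trade-off the tutorial points to.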

def abstract_summary_extraction(transcription):
    # Logic for generating the summary
    pass

def key_points_extraction(transcription):
    # Logic for extracting key points
    pass

def action_item_extraction(transcription):
    # Logic for identifying action items
    pass

def sentiment_analysis(transcription):
    # Logic for performing sentiment analysis
    pass

def meeting_minutes(transcription):
    abstract_summary = abstract_summary_extraction(transcription)
    key_points = key_points_extraction(transcription)
    action_items = action_item_extraction(transcription)
    sentiment = sentiment_analysis(transcription)
    return {
        'abstract_summary': abstract_summary,
        'key_points': key_points,
        'action_items': action_items,
        'sentiment': sentiment
    }

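In the function above, transcription is the text we obtained from Whisper. It is passed to the four other functions, each of which performs one specific task:

abstract_summary_extraction generates a summary of the meeting: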

def abstract_summary_extraction(transcription):
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are a highly skilled AI trained in language comprehension and summarization. I would like you to read the following text and summarize it into a concise abstract paragraph. Aim to retain the most important points, providing a coherent and readable summary that could help a person understand the main points of the discussion without needing to read the entire text. Please avoid unnecessary details or tangential points."
            },
            {
                "role": "user",
                "content": transcription
            }
        ]
    )
    return response.choices[0].message.content

key_points_extraction extracts the main points:

def key_points_extraction(transcription):
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are a proficient AI with a specialty in distilling information into key points. Based on the following text, identify and list the main points that were discussed or brought up. These should be the most important ideas, findings, or topics that are crucial to the essence of the discussion. Your goal is to provide a list that someone could read to quickly understand what was talked about."
            },
            {
                "role": "user",
                "content": transcription
            }
        ]
    )
    return response.choices[0].message.content

action_item_extraction identifies action items:

def action_item_extraction(transcription):
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are an AI expert in analyzing conversations and extracting action items. Please review the text and identify any tasks, assignments, or actions that were agreed upon or mentioned as needing to be done. These could be tasks assigned to specific individuals, or general actions that the group has decided to take. Please list these action items clearly and concisely."
            },
            {
                "role": "user",
                "content": transcription
            }
        ]
    )
    return response.choices[0].message.content

sentiment_analysis performs sentiment analysis:

def sentiment_analysis(transcription):
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "As an AI with expertise in language and emotion analysis, your task is to analyze the sentiment of the following text. Please consider the overall tone of the discussion, the emotion conveyed by the language used, and the context in which words and phrases are used. Indicate whether the sentiment is generally positive, negative, or neutral, and provide brief explanations for your analysis where possible."
            },
            {
                "role": "user",
                "content": transcription
            }
        ]
    )
    return response.choices[0].message.content

If you need other features, you can add them by following the same pattern.
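Since the four functions differ only in their system prompt, one way to add a new task is to factor the shared call into a helper. The sketch below is an assumption about how that could be organized: run_analysis, DECISIONS_PROMPT, and decisions_extraction are hypothetical names, and the extra task is not part of the original tutorial.

```python
def run_analysis(client, instructions, transcription, model="gpt-4"):
    """Send one system prompt plus the transcript and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": transcription},
        ],
    )
    return response.choices[0].message.content


# Hypothetical extra task, shown only to illustrate the pattern.
DECISIONS_PROMPT = (
    "You are an AI expert in analyzing conversations. Review the text and "
    "list any decisions that the participants explicitly agreed on."
)


def decisions_extraction(client, transcription):
    """Return the decisions found in the transcript as plain text."""
    return run_analysis(client, DECISIONS_PROMPT, transcription)
```

To include the result in the minutes, you would add another key to the dictionary returned by meeting_minutes.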

———————————————————————————————————————————

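Exporting the Meeting Minutes

Once the meeting minutes have been generated, it helps to save them in a readable format that is easy to distribute. Microsoft Word is one common choice, and python-docx is a popular open-source library for creating Word documents. If you want to build an end-to-end meeting minutes application, consider dropping the export step and embedding the summary in a follow-up email instead.

To handle the export, define a save_as_docx function that converts the raw text into a Word document: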
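As a rough illustration of the email alternative, the minutes dictionary could be turned into a plain-text message with the standard library's email package. The function name build_minutes_email and the addresses in the usage comment are assumptions; actually sending the message (for example with smtplib) depends on your mail server and is left out.

```python
from email.message import EmailMessage


def build_minutes_email(minutes, sender, recipients, subject="Meeting minutes"):
    """Build a plain-text email whose body contains each section of the minutes."""
    sections = []
    for key, value in minutes.items():
        # Turn a key such as "action_items" into the heading "Action Items"
        heading = " ".join(word.capitalize() for word in key.split("_"))
        sections.append(f"{heading}\n{value}")

    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    message.set_content("\n\n".join(sections))
    return message


# Hypothetical usage with placeholder addresses:
# email = build_minutes_email(minutes, "notes@example.com", ["team@example.com"])
```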

from docx import Document

def save_as_docx(minutes, filename):
    doc = Document()
    for key, value in minutes.items():
        # Replace underscores with spaces and capitalize each word to form the heading
        heading = ' '.join(word.capitalize() for word in key.split('_'))
        doc.add_heading(heading, level=1)
        doc.add_paragraph(value)
        # Add an empty paragraph between sections
        doc.add_paragraph()
    doc.save(filename)

Finally, you can put everything together and generate meeting minutes from an audio file:

audio_file_path = "Earningscall.wav"
transcription = transcribe_audio(audio_file_path)
minutes = meeting_minutes(transcription)
print(minutes)

save_as_docx(minutes, 'meeting_minutes.docx')

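This code transcribes the audio file Earningscall.wav, generates the meeting minutes, prints them, and then saves them as a Word document named meeting_minutes.docx.

Summary:

The key to this example is really the prompts. Using prompts well gets you more precise, higher-quality responses from GPT. I recommend reading Prompt engineering - OpenAI API. Thank you, and see you next time.

Last updated: 2024-06-01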