Build a Super-Simple Audio Transcription App with Python and the Whisper API

Overview

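Hello, everyone! This time I'll show you how to build an audio transcription app that automatically converts speech into text. It may sound difficult, but with OpenAI's Whisper API you can build a highly accurate transcription app surprisingly easily.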

In this post, I'll walk through everything step by step so that even programming beginners can follow along. Let's learn together!

When is it useful? Use cases for a transcription app

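A student's best friend: transcribe recorded lectures and use them as review material
A powerful ally for business people: writing meeting minutes becomes remarkably easy
An essential tool for journalists: get interview transcripts in a fraction of the time
A productivity boost for creators: repurpose podcast content as blog posts
A reliable assistant for researchers: turn voice memos from fieldwork into text with ease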

Installation: just 3 easy steps!

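Install Python:
- Download it from the official Python website (version 3.8 or later is recommended, since recent versions of the openai library no longer support older Pythons)
- Follow the installer's instructions
Install the required libraries:
- Open Command Prompt (Windows) or Terminal (macOS or Linux)
- Enter the following command and press Enter:
  pip install openai tkinterdnd2
- tkinter itself ships with Python and is not installed with pip (on some Linux distributions you may need a separate package such as python3-tk)
Create an OpenAI account:
- Go to the OpenAI website
- Create an account via "Sign Up"
- Get an API key (details below)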
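To confirm that the installation worked, you can run a small check like the one below. This is a minimal sketch: it only checks the Python version and whether the openai, tkinterdnd2, and tkinter modules can be found, using the standard library's importlib.util.find_spec. The function name check_environment is a hypothetical helper, not part of any library.

```python
import importlib.util
import sys

# Modules the app needs: openai and tkinterdnd2 come from pip,
# tkinter ships with Python itself
REQUIRED_MODULES = ["openai", "tkinterdnd2", "tkinter"]


def check_environment(min_version=(3, 8)):
    """Return a dict mapping each check name to True/False (hypothetical helper)."""
    results = {"python>=%d.%d" % min_version: sys.version_info >= min_version}
    for name in REQUIRED_MODULES:
        # find_spec returns None when the module cannot be found
        results[name] = importlib.util.find_spec(name) is not None
    return results


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```

If any line prints MISSING, rerun the pip command (or install python3-tk for tkinter on Linux) before moving on.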

Important! How to set up your API key

Your OpenAI API key is like a "key" that lets the application use the Whisper API. Set it up as follows:

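Log in to the OpenAI dashboard
Find the "API keys" section and create a new key
Copy the generated key (it is shown only once, so store it somewhere safe)
Replace the 'YOUR_API_KEY_HERE' part of the program code with the key you copied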

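Note: Never share your API key with anyone or make it public! For example, do not upload or share the program file while the key is written in it.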
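Writing the key directly into the code makes it easy to leak by accident. As an alternative, you can keep it in the OPENAI_API_KEY environment variable; the official openai Python library reads that variable automatically when OpenAI() is created without an api_key argument. The sketch below shows a hypothetical helper, load_api_key, that stops early with a readable message if the variable is missing.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Read the API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of a confusing API error later
        raise RuntimeError(
            f"{var_name} is not set. Set it in your terminal before starting the app."
        )
    return key


if __name__ == "__main__":
    try:
        key = load_api_key()
        # Print only the first few characters so the full key never appears on screen
        print("API key found:", key[:5] + "...")
    except RuntimeError as err:
        print(err)
```

In the app you would then write self.client = OpenAI(api_key=load_api_key()), or simply OpenAI(). You set the variable with set OPENAI_API_KEY=your-key in Command Prompt, or export OPENAI_API_KEY=your-key in a macOS/Linux terminal, in the same window you start the app from.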

How to use it: transcription done in just 5 steps!

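Run the program (double-click the .py file, or type python followed by the file name in Command Prompt or Terminal)
Drag and drop the audio file you want to transcribe (MP3, WAV, M4A, or MP4) onto the window that appears
Check that "Status: Uploading..." is displayed
Check that "Status: Transcribing..." is displayed (the window may stop responding until the API replies, which is normal; have a cup of tea while you wait)
When "Status: Completed" appears, you're done! The location of the saved text file is displayed. It is saved in the same folder as the audio file, named after it with "_transcription.txt" in place of the extension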
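The steps above rest on two small rules in the program: which files are accepted, and where the result is saved. The sketch below pulls those rules out into standalone functions so you can try them without starting the GUI. The names is_supported_audio and transcription_output_path are hypothetical; the extension list and the _transcription.txt suffix are the same as in the program.

```python
import os

# The same extensions the program accepts
ALLOWED_EXTENSIONS = (".mp3", ".wav", ".m4a", ".mp4")


def is_supported_audio(file_path):
    """True if the file ends with an accepted extension (case-insensitive)."""
    return file_path.lower().endswith(ALLOWED_EXTENSIONS)


def transcription_output_path(file_path):
    """Replace the extension: meeting.mp3 -> meeting_transcription.txt."""
    return os.path.splitext(file_path)[0] + "_transcription.txt"


if __name__ == "__main__":
    for path in ["lecture.MP3", "notes.pdf"]:
        if is_supported_audio(path):
            print(path, "->", transcription_output_path(path))
        else:
            print(path, "is not a supported audio file")
```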

Beginners, take note! 5 things to watch out for

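Guard your API key: it is as important as a password. Never make it public.
Mind the processing time: the longer the audio file, the longer the conversion takes.
Check the cost: the app shows an estimated cost, but it can only measure the length of WAV files before uploading, so also check your actual usage on the OpenAI dashboard to avoid a surprise bill. The price is $0.006 per minute, so one hour of audio costs $0.36, roughly 50 to 60 yen (as of August 2023).
Internet connection required: the app does not work offline. Make sure you have a stable connection.
Be careful with personal information: the audio is uploaded to OpenAI's servers, so take particular care with highly confidential recordings.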
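The cost figure in the notes above is simple arithmetic: minutes of audio × $0.006. The sketch below reproduces it. The rate of 150 yen per dollar is only an assumption for illustration, so substitute the current rate. It also checks the file size, because the Whisper API rejects uploads larger than 25 MB (the limit in OpenAI's documentation at the time of writing; please confirm the current value). The function names are hypothetical.

```python
import os

PRICE_PER_MINUTE_USD = 0.006           # whisper-1 price (as of August 2023)
ASSUMED_YEN_PER_USD = 150              # assumption for illustration only
MAX_UPLOAD_BYTES = 25 * 1024 * 1024    # 25 MB limit, assumed here to mean 25 MiB


def estimate_cost_usd(duration_seconds):
    """Cost in dollars for audio of the given length."""
    return duration_seconds / 60 * PRICE_PER_MINUTE_USD


def estimate_cost_yen(duration_seconds, yen_per_usd=ASSUMED_YEN_PER_USD):
    """Dollar cost converted at the given exchange rate."""
    return estimate_cost_usd(duration_seconds) * yen_per_usd


def fits_upload_limit(file_path):
    """True if the file is small enough to send to the API."""
    return os.path.getsize(file_path) <= MAX_UPLOAD_BYTES


if __name__ == "__main__":
    one_hour = 60 * 60
    print(f"1 hour: ${estimate_cost_usd(one_hour):.2f}")          # $0.36
    print(f"1 hour: about {estimate_cost_yen(one_hour):.0f} yen")  # about 54 yen
```

Files that fail the size check need to be shortened or split before you drop them onto the app.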

The program

import contextlib
import os
import traceback
import wave
import tkinter as tk
from tkinter import messagebox

from openai import OpenAI
from tkinterdnd2 import DND_FILES, TkinterDnD

class TranscriptionApp:
    def __init__(self, master):
        self.master = master
        master.title("Whisper Transcription App")
        master.geometry("400x400")

        self.label = tk.Label(master, text="Drag and drop an audio file here")
        self.label.pack(pady=10)

        self.status_label = tk.Label(master, text="Status: Ready")
        self.status_label.pack(pady=5)

        self.cost_label = tk.Label(master, text="Estimated cost: $0.00")
        self.cost_label.pack(pady=5)

        self.text_area = tk.Text(master, height=10, width=50)
        self.text_area.pack(pady=10)

        self.master.drop_target_register(DND_FILES)
        self.master.dnd_bind('<<Drop>>', self.drop)

        # Initialize the OpenAI client
        self.client = OpenAI(api_key='YOUR_API_KEY_HERE')  # Replace this with your own API key

    def update_status(self, status):
        self.status_label.config(text=f"Status: {status}")
        self.master.update_idletasks()

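    def update_cost(self, duration):
        # Whisper API price: $0.006 per minute (as of August 2023)
        if duration is None:
            # The length could not be measured before uploading
            self.cost_label.config(text="Estimated cost: unknown (non-WAV)")
            return
        cost = (duration / 60) * 0.006
        self.cost_label.config(text=f"Estimated cost: ${cost:.4f}")

    def get_audio_duration(self, file_path):
        # Only PCM WAV files can be measured with the standard library
        if file_path.lower().endswith('.wav'):
            try:
                with contextlib.closing(wave.open(file_path, 'rb')) as f:
                    return f.getnframes() / float(f.getframerate())
            except (wave.Error, EOFError):
                # Unsupported WAV encoding: don't block the transcription
                return None
        # MP3, M4A and MP4 would need an extra library, so report the
        # length as unknown instead of guessing a fixed value
        return None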

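    def drop(self, event):
        # tkinterdnd2 wraps paths containing spaces in {braces} and separates
        # multiple dropped files with spaces; splitlist handles both cases
        dropped = self.master.tk.splitlist(event.data)
        if not dropped:
            return
        file_path = dropped[0]  # Only the first dropped file is processed
        print(f"Dropped file: {file_path}")

        allowed_extensions = ['.mp3', '.wav', '.m4a', '.mp4']
        if any(file_path.lower().endswith(ext) for ext in allowed_extensions):
            self.transcribe_audio(file_path)
        else:
            print(f"File extension: {os.path.splitext(file_path)[1]}")
            messagebox.showerror("Error", f"Please drop an audio file. Received file: {file_path}")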

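    def transcribe_audio(self, file_path):
        try:
            self.update_status("Uploading...")
            print(f"Attempting to transcribe: {file_path}")

            duration = self.get_audio_duration(file_path)
            self.update_cost(duration)

            with open(file_path, "rb") as audio_file:
                self.update_status("Transcribing...")
                # This call blocks until the API responds, so the window stops
                # responding while a long file is being processed.
                # "verbose_json" also returns the audio length in seconds.
                transcript = self.client.audio.transcriptions.create(
                    model="whisper-1",
                    file=audio_file,
                    response_format="verbose_json",
                )

            transcription = transcript.text

            # Replace the pre-upload estimate with the length reported by the API
            actual_duration = getattr(transcript, "duration", None)
            if actual_duration is not None:
                self.update_cost(float(actual_duration))

            # Saved next to the audio file: meeting.mp3 -> meeting_transcription.txt
            output_file = os.path.splitext(file_path)[0] + "_transcription.txt"
            with open(output_file, "w", encoding="utf-8") as f:
                f.write(transcription)

            self.text_area.delete("1.0", tk.END)
            self.text_area.insert(tk.END, f"Transcription saved to:\n{output_file}")
            self.update_status("Completed")

        except Exception as e:
            error_msg = f"Error during transcription:\n{e}\n\nTraceback:\n{traceback.format_exc()}"
            print(error_msg)
            messagebox.showerror("Error", error_msg)
            self.update_status("Error occurred")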

if __name__ == "__main__":
    try:
        root = TkinterDnD.Tk()
        app = TranscriptionApp(root)
        root.mainloop()
    except Exception as e:
        print(f"An error occurred: {e}")
        print("Please make sure tkinterdnd2 is installed correctly.")
        print("You can install it using: pip install tkinterdnd2")

Alternatively, download the text file provided with this post and change its extension from ".txt" to ".py"; you can then use it as-is.
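If you want to see how get_audio_duration measures a WAV file, the standalone sketch below writes one second of silence with the standard wave module and reads its length back as frame count ÷ sample rate, the same formula the program uses. The helper names, the temporary file name, and the audio parameters (16 kHz, mono, 16-bit) are arbitrary choices for this demonstration. MP3, M4A, and MP4 cannot be read this way with the standard library, which is why the program does not measure their length before uploading.

```python
import contextlib
import os
import tempfile
import wave


def wav_duration_seconds(file_path):
    """Length of a PCM WAV file: number of frames / frames per second."""
    with contextlib.closing(wave.open(file_path, "rb")) as f:
        return f.getnframes() / float(f.getframerate())


def write_silent_wav(file_path, seconds=1.0, rate=16000):
    """Write mono 16-bit silence (demo helper)."""
    n_frames = int(seconds * rate)
    with contextlib.closing(wave.open(file_path, "wb")) as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit
        f.setframerate(rate)
        f.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "silence_demo.wav")
    write_silent_wav(path, seconds=1.0)
    print(wav_duration_seconds(path))  # 1.0
    os.remove(path)
```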