search_transcripts

Version 1.0.0

Transcript Search

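This code is designed to make it easy to search across a large number of OpenAI Whisper transcripts, so that you can find where a particular passage or term occurs. It should, however, work with any folder of .vtt files, for example podcast transcripts that were not produced by OpenAI tools.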

I used OpenAI Whisper to transcribe the Accidental Tech Podcast and deployed a front end for a live search engine website powered by this module (specifically SearchTranscripts).

This module has two classes:

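LoadTranscripts: Creates a SQLite database and an FTS5 virtual table from a folder of transcript files (.vtt or .json files). To make the text searchable in useful blocks, it builds longer passages (about 300 words each) from the short transcript segments in the original files. It also stores the individual transcript segments in a separate database.
SearchTranscripts: A Python class that uses the SQLite database to return a pandas DataFrame of the top results for a search query.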
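
The passage-building step described for LoadTranscripts can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name `build_passages`, the `target_words` parameter, and the rule of closing a passage as soon as it reaches the target are all assumptions made for the example. The segment dictionaries use the `start`, `end`, and `text` keys of the JSON transcript format.

```python
from typing import Dict, List


def build_passages(segments: List[Dict], target_words: int = 300) -> List[Dict]:
    """Merge consecutive short segments into passages of roughly target_words words.

    Hypothetical helper for illustration; the real LoadTranscripts logic may differ.
    """
    passages: List[Dict] = []
    current_texts: List[str] = []
    current_start = None
    current_end = None
    word_count = 0

    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments
        if current_start is None:
            current_start = seg["start"]  # passage starts at its first segment
        current_texts.append(text)
        current_end = seg["end"]
        word_count += len(text.split())

        # Close the passage once it has reached the target length.
        if word_count >= target_words:
            passages.append(
                {"start": current_start, "end": current_end, "text": " ".join(current_texts)}
            )
            current_texts, current_start, word_count = [], None, 0

    # Keep whatever is left over as a final, shorter passage.
    if current_texts:
        passages.append(
            {"start": current_start, "end": current_end, "text": " ".join(current_texts)}
        )
    return passages
```

Keeping the start and end times of each passage makes it possible to link a search hit back to a position in the audio.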

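Once the SQLite database has been created with LoadTranscripts, you can access it through any SQLite interface you like, such as dataset, DBeaver, the command line, or SQLAlchemy. The SearchTranscripts class is a simple, convenient way to access the data from Python, using the search_transcripts module and pandas.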
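
As a small illustration of accessing the database without SearchTranscripts, the sketch below lists the tables in a SQLite file using only Python's standard `sqlite3` module. The helper name `list_tables` is made up for this example, and no table names are assumed, since the schema that LoadTranscripts creates is not documented here.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in the SQLite file at db_path.

    Note: sqlite3.connect() creates an empty database file if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("main.db")` in the directory where LoadTranscripts was run should show which tables are available to query.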

Installation:

Clone the repository, cd into its main directory, and then run:

 pip install .

Usage:


from search_transcripts import LoadTranscripts, SearchTranscripts

l = LoadTranscripts('transcripts') ## will create main.db and bm25.pickle


s = SearchTranscripts()

## Returns a pandas dataframe of the top scoring transcript sections, across all transcripts.

s.search('starship enterprise')

## Find the exact phrase.

s.search('"starship enterprise"')
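
The difference between the two queries above can be illustrated directly with SQLite's FTS5 query syntax, where bare terms must all appear somewhere in the text and a double-quoted string must appear as a contiguous phrase. This is only a sketch: the table name `passages`, the sample rows, and the helper `fts_search` are invented for the example, and it is an assumption that SearchTranscripts hands the query string to FTS5 in this way. It also requires a Python build whose SQLite includes FTS5.

```python
import sqlite3

# In-memory database with a hypothetical FTS5 table; not the module's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE passages USING fts5(text)")
conn.executemany(
    "INSERT INTO passages (text) VALUES (?)",
    [
        ("the starship enterprise left orbit",),
        ("an enterprise bought a starship model",),
        ("nothing relevant here",),
    ],
)


def fts_search(query: str):
    """Run an FTS5 MATCH query and return matching texts, best BM25 score first."""
    rows = conn.execute(
        "SELECT text FROM passages WHERE passages MATCH ? ORDER BY bm25(passages)",
        (query,),
    ).fetchall()
    return [text for (text,) in rows]


# Bare terms: both words must occur, in any order or position.
both_terms = fts_search("starship enterprise")

# Quoted: the words must occur next to each other, in this order.
exact_phrase = fts_search('"starship enterprise"')
```

Here `both_terms` contains the first two rows, while `exact_phrase` contains only the row in which the two words are adjacent.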

JSON transcripts?

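Before I realized that Whisper can produce standard .vtt files, I was using its Python API directly, which returns a list of Python dictionaries. I saved those as JSON, which seemed logical at the time. JSON is much easier to read programmatically than .vtt and is easy to convert to VTT, so this somewhat quirky format is supported. It looks like this: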

    [
        {
            "start": 606.1800000000001,
            "end": 610.74,
            "text": " It's important to have a goal to work toward and accomplish rather than just randomly learning and half building things"
        },
        {
            "start": 610.74,
            "end": 613.0600000000001,
            "text": " Having a specific thing you want to build is a good substitute"
        },
        {
            "start": 613.38,
            "end": 619.78,
            "text": " Keep making things until you've made something you're proud enough a proud of enough to show off in an interview by the time you've built a few"
        },
        {
            "start": 619.78,
            "end": 624.26,
            "text": " Things you'll start developing the taste you need to make that determination of what's quote unquote good enough"
        }
    ]
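
To show how a transcript in this JSON format maps onto VTT, here is a minimal conversion sketch. The function names `format_timestamp`, `segments_to_vtt`, and `json_transcript_to_vtt` are hypothetical and not part of search_transcripts. The sketch emits only the `WEBVTT` header followed by one cue per segment, with times in seconds converted to `HH:MM:SS.mmm`.

```python
import json
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Convert seconds (float) to a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: List[Dict]) -> str:
    """Render a list of {"start", "end", "text"} dicts as WebVTT text."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())  # drop the leading space seen in the sample
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def json_transcript_to_vtt(json_text: str) -> str:
    """Parse a JSON transcript string and convert it to WebVTT text."""
    return segments_to_vtt(json.loads(json_text))
```

Rounding to whole milliseconds removes floating-point noise such as `606.1800000000001` from the output timestamps.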
