search_transcripts


Transcript Search

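This code is intended to make large collections of OpenAI Whisper transcripts easy to search, so that you can find where a specific passage or term occurs and at what time in the transcript. It should also work with any folder of .vtt files, including podcast transcripts that did not come from OpenAI.

I used OpenAI Whisper to transcribe the Accidental Tech Podcast and deployed a live search engine web front end powered by this module (specifically SearchTranscripts).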

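The module has two classes:

LoadTranscripts: creates an SQLite database and an FTS5 virtual table from a folder of transcript files (.vtt or .json). It combines the short transcript segments in the original files into longer texts (about 300 words each), so that the searchable unit is a chunk of text rather than a single short segment. It also keeps the individual transcript segments, stored separately.
SearchTranscripts: a Python class that uses the SQLite database to return a pandas DataFrame of the best results for a search query.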
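
The chunking step described for LoadTranscripts can be pictured with a minimal sketch. This is not the module's actual code: the function name `chunk_segments`, the `target_words` parameter, and the rule of closing a chunk once the word count reaches the target are all assumptions made for illustration. It takes segments in the `start`/`end`/`text` shape used by the JSON transcript format.

```python
from typing import Dict, List


def chunk_segments(segments: List[Dict], target_words: int = 300) -> List[Dict]:
    """Merge consecutive short segments into chunks of roughly target_words words.

    Hypothetical helper: each chunk keeps the start time of its first segment
    and the end time of its last segment.
    """
    chunks: List[Dict] = []
    texts: List[str] = []
    word_count = 0
    start = None
    end = None
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments
        if start is None:
            start = seg["start"]
        texts.append(text)
        word_count += len(text.split())
        end = seg["end"]
        if word_count >= target_words:
            # Close the current chunk and start a new one.
            chunks.append({"start": start, "end": end, "text": " ".join(texts)})
            texts, word_count, start = [], 0, None
    if texts:
        # Flush whatever is left as a final, shorter chunk.
        chunks.append({"start": start, "end": end, "text": " ".join(texts)})
    return chunks
```

Because every chunk carries the timestamps of the segments it was built from, a search hit on a chunk can still be traced back to a position in the audio.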

Once the SQLite database has been created with LoadTranscripts, you can access it through any SQLite interface you like, such as Datasette, DBeaver, the command line, or SQLAlchemy. The SearchTranscripts class is a simple and convenient way to query it from Python, using the built-in sqlite3 module and pandas.
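
As a small illustration of reaching the database without SearchTranscripts, the sketch below lists the tables in an SQLite file using only the standard library. The helper name `list_tables` is hypothetical, and the table names the module actually creates are not documented here, which is why the example discovers them instead of assuming them.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in an SQLite file, sorted by name.

    FTS5 virtual tables and their shadow tables are listed too, since
    SQLite records them in sqlite_master with type 'table'.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("main.db")` after running LoadTranscripts should show what is available to query. Note that `sqlite3.connect` creates an empty file if the path does not exist, so check the path first.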

Installation:

Clone the repository, cd into its top-level directory, and run:

 pip install .

Usage:


from search_transcripts import LoadTranscripts, SearchTranscripts

# Build the search database from the 'transcripts' folder.
# This will create main.db and bm25.pickle.
l = LoadTranscripts('transcripts')

s = SearchTranscripts()

# Returns a pandas DataFrame of the top-scoring transcript sections, across all transcripts.
s.search('starship enterprise')

# Find the exact phrase.
s.search('"starship enterprise"')
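
The difference between the two queries above matches how SQLite's FTS5 treats them. The sketch below is a standalone illustration, not the module's code: the table name `chunks`, its single `text` column, and the helper `fts_search` are assumptions, and it presumes the query string reaches FTS5 largely unchanged. Since the module also writes bm25.pickle, its real ranking may work differently. It requires a Python build whose SQLite includes FTS5.

```python
import sqlite3

# In-memory database with a hypothetical FTS5 table of text chunks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(text)")
conn.executemany(
    "INSERT INTO chunks (text) VALUES (?)",
    [
        ("the starship enterprise left spacedock",),
        ("an enterprise customer asked about the starship launch",),
        ("nothing relevant here",),
    ],
)


def fts_search(query: str):
    """Return matching chunk texts, best match first (FTS5 built-in rank)."""
    cursor = conn.execute(
        "SELECT text FROM chunks WHERE chunks MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in cursor]


# Bare terms are implicitly ANDed: both words must appear, in any position.
both_terms = fts_search("starship enterprise")

# Double quotes make it a phrase query: the words must be adjacent and in order.
exact_phrase = fts_search('"starship enterprise"')
```

Here `both_terms` contains the first two rows, while `exact_phrase` contains only the row where the words appear side by side.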

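JSON transcripts?

Before I realized that Whisper writes standard .vtt files, I was using its Python API directly. That produced a list of Python dictionaries, and saving it as JSON seemed logical at the time. I find JSON easier to read than .vtt, and it converts easily to VTT, so I still support this somewhat quirky format. It looks like this: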

    [
        {
            "start": 606.1800000000001,
            "end": 610.74,
            "text": " It's important to have a goal to work toward and accomplish rather than just randomly learning and half building things"
        },
        {
            "start": 610.74,
            "end": 613.0600000000001,
            "text": " Having a specific thing you want to build is a good substitute"
        },
        {
            "start": 613.38,
            "end": 619.78,
            "text": " Keep making things until you've made something you're proud enough a proud of enough to show off in an interview by the time you've built a few"
        },
        {
            "start": 619.78,
            "end": 624.26,
            "text": " Things you'll start developing the taste you need to make that determination of what's quote unquote good enough"
        }
    ]
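
To make the point about easy conversion concrete, here is a minimal sketch that turns a list of segments in the JSON shape above into WebVTT text. The function names `format_timestamp` and `segments_to_vtt` are hypothetical and not part of the module; the output is a bare-bones file with a `WEBVTT` header and one cue per segment.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))  # round away float noise like 606.1800000000001
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: List[Dict]) -> str:
    """Build WebVTT text from segments with 'start', 'end' and 'text' keys."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

The segment list can be read with `json.load` from a transcript file and the returned string written to a `.vtt` file.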


Additional information

Version: 1.0.0
Type: other source code
Updated: 2025-03-13
Size: 13.51 KB
Source: GitHub
