vox box

v0.0.9

Vox Box

A text-to-speech and speech-to-text server compatible with the OpenAI API, powered by Whisper, FunASR, Bark, and CosyVoice.

Requirements

Python 3.10 or later
NVIDIA GPU support requires the following NVIDIA libraries to be installed:
- cuBLAS for CUDA 12
- cuDNN 9 for CUDA 12
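
As a quick pre-install check, the Python requirement above can be verified with a few lines. This is only a sketch: the function name `meets_python_requirement` is hypothetical and is not part of vox-box, and it does not check for the NVIDIA libraries.

```python
import sys


def meets_python_requirement(version_info=None, minimum=(3, 10)):
    """Return True if the interpreter version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= minimum


if not meets_python_requirement():
    print("vox-box needs Python 3.10 or later; found", sys.version.split()[0])
```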

Installation

You can install the project with pip:

pip install vox-box

# For MacOS, you need to manually install `openfst`, `pynini`, and `wetextprocessing` after installing `vox-box` to make `cosyvoice` work:
brew install openfst
export CPLUS_INCLUDE_PATH=$(brew --prefix openfst)/include
export LIBRARY_PATH=$(brew --prefix openfst)/lib
pip install pynini==2.1.6
pip install wetextprocessing==1.0.4.1

Usage

vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 80

# Windows
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir C:\Users\michelia\AppData\Roaming\vox-box --host 0.0.0.0 --port 8082

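Options

-d, --debug: Enable debug mode.
--host: Host to bind the server to. Default is 0.0.0.0.
--port: Port to bind the server to. Default is 80.
--model: Model path.
--device: Device to bind to, e.g. cuda:0. Default is cpu.
--huggingface-repo-id: Hugging Face repo ID of the model.
--model-scope-model-id: ModelScope model ID of the model.
--data-dir: Directory in which downloaded model data is stored. The default is OS-specific.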
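
When the server is launched from a script rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below is an illustration, not part of vox-box: `build_start_command` is a hypothetical helper, and it covers only the flags shown in the usage example plus `--device` and `--debug`, whose spelling is assumed from the option list above.

```python
import shlex


def build_start_command(huggingface_repo_id, data_dir=None, host="0.0.0.0",
                        port=80, device=None, debug=False):
    """Return the argv list for `vox-box start` with the given options."""
    cmd = ["vox-box", "start", "--huggingface-repo-id", huggingface_repo_id]
    if data_dir is not None:
        cmd += ["--data-dir", str(data_dir)]
    cmd += ["--host", host, "--port", str(port)]
    if device is not None:
        # Assumed flag spelling, taken from the option list.
        cmd += ["--device", device]
    if debug:
        cmd.append("--debug")
    return cmd


command = build_start_command(
    "Systran/faster-whisper-small",
    data_dir="./cache/data-dir",
    device="cuda:0",
)
# Print a shell-quoted version for logging or copy-pasting.
print(shlex.join(command))
```

The returned list can be passed directly to `subprocess.Popen(command)`; keeping it as a list avoids shell-quoting problems with paths that contain spaces.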

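Supported Models

Model	Type	Link	Verified Platforms
Faster-whisper-large-v3	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-large-v2	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-large-v1	speech-to-text	Hugging Face, ModelScope
Faster-whisper-medium	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-medium.en	speech-to-text	Hugging Face, ModelScope
Faster-whisper-small	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-small.en	speech-to-text	Hugging Face, ModelScope
Faster-distil-whisper-large-v3	speech-to-text	Hugging Face, ModelScope	macOS ✅
Faster-distil-whisper-large-v2	speech-to-text	Hugging Face, ModelScope	macOS ✅
Faster-distil-whisper-medium.en	speech-to-text	Hugging Face, ModelScope
Faster-whisper-tiny	speech-to-text	Hugging Face, ModelScope
Faster-whisper-tiny.en	speech-to-text	Hugging Face, ModelScope
Paraformer-zh	speech-to-text	Hugging Face, ModelScope
Paraformer-zh-streaming	speech-to-text	Hugging Face, ModelScope	Linux ✅, macOS ✅
Paraformer-en	speech-to-text	Hugging Face, ModelScope
Conformer-en	speech-to-text	Hugging Face, ModelScope
SenseVoiceSmall	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Bark	text-to-speech	Hugging Face
Bark-small	text-to-speech	Hugging Face
CosyVoice-300M-Instruct	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M-SFT	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M-25Hz	text-to-speech	ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅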

Supported APIs

Create Speech

Endpoint: POST /v1/audio/speech

Generates audio from the input text. Compatible with the OpenAI audio/speech API.

Example request:

curl http://localhost/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cosyvoice",
    "input": "Hello world",
    "voice": "English Female"
  }' \
  --output speech.mp3

Response: the audio file content.
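
The same request can be issued from Python using only the standard library. This is a minimal sketch that mirrors the curl example: the helper names `build_speech_request` and `synthesize` are hypothetical, the base URL assumes the server is listening on localhost port 80, and the API key is optional because the source does not say whether the server checks it.

```python
import json
import urllib.request


def build_speech_request(text, model="cosyvoice", voice="English Female",
                         base_url="http://localhost", api_key=None):
    """Build a POST request for /v1/audio/speech without sending it."""
    payload = {"model": model, "input": text, "voice": voice}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize(text, output_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to a file."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, \
            open(output_path, "wb") as out:
        out.write(response.read())
    return output_path
```

Calling `synthesize("Hello world")` against a running server should leave the audio in `speech.mp3`. Building the request is kept separate from sending it so that the payload can be inspected or tested without a server.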

Create Transcription

Endpoint: POST /v1/audio/transcriptions

Transcribes audio into the input language. Compatible with the OpenAI audio/transcriptions API.

Example request:

curl http://localhost/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-large-v3"
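
For callers without curl, the multipart upload can be sketched in Python with the standard library alone. The helper names below (`encode_multipart`, `build_transcription_request`, `transcribe`) are hypothetical, the base URL assumes a server on localhost port 80, and the response is returned as raw text because its exact shape is not assumed here.

```python
import uuid
import urllib.request


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(audio_bytes, filename,
                                model="whisper-large-v3",
                                base_url="http://localhost", api_key=None):
    """Build a POST request for /v1/audio/transcriptions without sending it."""
    body, content_type = encode_multipart(
        {"model": model}, "file", filename, audio_bytes)
    headers = {"Content-Type": content_type}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body, headers=headers, method="POST")


def transcribe(path, **kwargs):
    """Upload the audio file at `path` and return the raw response text."""
    with open(path, "rb") as audio:
        request = build_transcription_request(audio.read(), path, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Unlike the curl command, the Content-Type header here must carry the boundary explicitly; curl adds it automatically when `-F` is used.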

Response: