v0.0.9

Vox Box

A text-to-speech and speech-to-text server compatible with the OpenAI API, powered by Whisper, FunASR, Bark, and CosyVoice.

Requirements

Python 3.10 or later
NVIDIA GPU support requires the following NVIDIA libraries to be installed:
- cuBLAS for CUDA 12
- cuDNN 9 for CUDA 12
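
The requirements above can be checked roughly before installing. This is a minimal sketch, not part of vox-box: the helper name `check_requirements` is hypothetical, and it assumes the NVIDIA libraries are discoverable by the system loader under the short names `cublas` and `cudnn`. It does not verify the CUDA 12 or cuDNN 9 versions, and a "missing" result only matters when GPU support is wanted.

```python
import ctypes.util
import sys


def check_requirements(version_info=sys.version_info,
                       find_library=ctypes.util.find_library):
    """Return a dict saying which of the stated requirements appear to be met."""
    return {
        # The README asks for Python 3.10 or later.
        "python>=3.10": tuple(version_info[:2]) >= (3, 10),
        # Assumption: short library names; find_library returns None if not found.
        "cublas": find_library("cublas") is not None,
        "cudnn": find_library("cudnn") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```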

Installation

You can install the project using pip:

pip install vox-box

# For MacOS, you need to manually install `openfst`, `pynini`, and `wetextprocessing` after installing `vox-box` to make `cosyvoice` work:
brew install openfst
export CPLUS_INCLUDE_PATH=$(brew --prefix openfst)/include
export LIBRARY_PATH=$(brew --prefix openfst)/lib
pip install pynini==2.1.6
pip install wetextprocessing==1.0.4.1

Usage

vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 80

# Windows
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir C:\Users\michelia\AppData\Roaming\vox-box --host 0.0.0.0 --port 8082

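Options

-d, --debug: Enable debug mode.
--host: Host to bind the server to. Default is 0.0.0.0.
--port: Port to bind the server to. Default is 80.
--model: Path to the model.
--device: Device to bind to, e.g. cuda:0. Default is cpu.
--huggingface-repo-id: Hugging Face repo ID of the model.
--model-scope-model-id: ModelScope model ID of the model.
--data-dir: Directory in which downloaded model data is stored. The default is OS-specific.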
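
When the server is launched from a script, the options above can be assembled into an argument list. The following is a sketch only: `build_start_command` is a hypothetical helper, the flag spellings are taken from the usage examples and the option list, and the rule that exactly one model source must be given is an assumption of this sketch rather than documented behaviour.

```python
import shlex


def build_start_command(huggingface_repo_id=None, model_scope_model_id=None,
                        model=None, data_dir=None, host="0.0.0.0", port=80,
                        device=None, debug=False):
    """Assemble an argv list for `vox-box start` from the listed options."""
    sources = [huggingface_repo_id, model_scope_model_id, model]
    # Assumption of this sketch: exactly one model source is given.
    if sum(s is not None for s in sources) != 1:
        raise ValueError("choose exactly one model source")

    argv = ["vox-box", "start"]
    if huggingface_repo_id is not None:
        argv += ["--huggingface-repo-id", huggingface_repo_id]
    if model_scope_model_id is not None:
        argv += ["--model-scope-model-id", model_scope_model_id]
    if model is not None:
        argv += ["--model", str(model)]
    if data_dir is not None:
        argv += ["--data-dir", str(data_dir)]
    argv += ["--host", host, "--port", str(port)]
    if device is not None:
        argv += ["--device", device]
    if debug:
        argv.append("--debug")
    return argv


if __name__ == "__main__":
    cmd = build_start_command(huggingface_repo_id="Systran/faster-whisper-small",
                              data_dir="./cache/data-dir")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run to actually start the server
```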

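Supported Models

Model	Type	Link	Verified Platforms
Faster-whisper-large-v3	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-large-v2	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-large-v1	speech-to-text	Hugging Face, ModelScope
Faster-whisper-medium	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-medium.en	speech-to-text	Hugging Face, ModelScope
Faster-whisper-small	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-small.en	speech-to-text	Hugging Face, ModelScope
Faster-distil-whisper-large-v3	speech-to-text	Hugging Face, ModelScope	macOS ✅
Faster-distil-whisper-large-v2	speech-to-text	Hugging Face, ModelScope	macOS ✅
Faster-distil-whisper-medium.en	speech-to-text	Hugging Face, ModelScope
Faster-whisper-tiny	speech-to-text	Hugging Face, ModelScope
Faster-whisper-tiny.en	speech-to-text	Hugging Face, ModelScope
Paraformer-zh	speech-to-text	Hugging Face, ModelScope
Paraformer-zh-streaming	speech-to-text	Hugging Face, ModelScope	Linux ✅, macOS ✅
Paraformer-en	speech-to-text	Hugging Face, ModelScope
Conformer-en	speech-to-text	Hugging Face, ModelScope
SenseVoiceSmall	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Bark	text-to-speech	Hugging Face
Bark-small	text-to-speech	Hugging Face
CosyVoice-300M-Instruct	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M-SFT	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M-25Hz	text-to-speech	ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅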

Supported APIs

Create speech

Endpoint: POST /v1/audio/speech

Generates audio from the input text. Compatible with the OpenAI audio/speech API.

Example request:

curl http://localhost/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cosyvoice",
    "input": "Hello world",
    "voice": "English Female"
  }' \
  --output speech.mp3

Response: the audio file content.
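
The same speech request can be made from Python using only the standard library. This is a sketch under assumptions: the server is reachable at `http://localhost`, the model and voice values are copied from the curl example, and the helper names `build_speech_request` and `save_speech` are hypothetical.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost", model="cosyvoice",
                         voice="English Female", api_key=None):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Mirrors the Authorization header used in the curl example.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def save_speech(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

With a server running and a text-to-speech model loaded, `save_speech("Hello world")` would write the returned audio to `speech.mp3`.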

Create transcription

Endpoint: POST /v1/audio/transcriptions

Transcribes audio into the input language. Compatible with the OpenAI audio/transcriptions API.

Example request:

curl http://localhost/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-large-v3"
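
The multipart upload in the request above can also be built by hand in Python with the standard library. This is a sketch under assumptions: the base URL `http://localhost` and the helper name `build_transcription_request` are hypothetical, the `file` and `model` field names come from the curl example, and the generic `application/octet-stream` part type is a choice made for this sketch.

```python
import urllib.request
import uuid


def build_transcription_request(filename, audio_bytes, model="whisper-large-v3",
                                base_url="http://localhost", api_key=None):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    # Text parts: the "model" field, then the header of the "file" part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n'
        "\r\n"
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    # Closing delimiter that ends the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers=headers,
        method="POST",
    )
```

To send it, read the audio file in binary mode, pass its bytes to the helper, and hand the resulting request to `urllib.request.urlopen`; the response body is returned as bytes.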

Response: