A text-to-speech and speech-to-text server compatible with the OpenAI API, powered by Whisper, FunASR, Bark, and CosyVoice backends.
You can install the project using pip:
pip install vox-box
# For MacOS, you need to manually install `openfst`, `pynini`, and `wetextprocessing` after installing `vox-box` to make `cosyvoice` work:
brew install openfst
export CPLUS_INCLUDE_PATH=$(brew --prefix openfst)/include
export LIBRARY_PATH=$(brew --prefix openfst)/lib
pip install pynini==2.1.6
pip install wetextprocessing==1.0.4.1

# Linux / MacOS
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 80
# Windows
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir C:\Users\michelia\AppData\Roaming\vox-box --host 0.0.0.0 --port 8082
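Once the server is running, any OpenAI-compatible client can be pointed at it. As a quick sanity check, the sketch below lists the models the server exposes (assuming the Windows example above, i.e. port 8082; the `base_url` and `api_key` values are placeholders to adjust for your deployment):

```python
# Minimal sketch: query the OpenAI-compatible /v1/models endpoint of a local
# vox-box server. Assumes the server is listening on port 8082; adjust
# base_url to match your --host/--port. The api_key value is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8082/v1", api_key="placeholder")

for model in client.models.list().data:
    print(model.id)
```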
| Model | Type | Link | Verified Platforms |
|---|---|---|---|
| Faster-whisper-large-v3 | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, MacOS ✅ |
| Faster-whisper-large-v2 | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, MacOS ✅ |
| Faster-whisper-large-v1 | speech-to-text | Hugging Face, ModelScope | |
| Faster-whisper-medium | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, MacOS ✅ |
| Faster-whisper-medium.en | speech-to-text | Hugging Face, ModelScope | |
| Faster-whisper-small | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, MacOS ✅ |
| Faster-whisper-small.en | speech-to-text | Hugging Face, ModelScope | |
| Faster-distil-whisper-large-v3 | speech-to-text | Hugging Face, ModelScope | MacOS ✅ |
| Faster-distil-whisper-large-v2 | speech-to-text | Hugging Face, ModelScope | MacOS ✅ |
| Faster-distil-whisper-medium.en | speech-to-text | Hugging Face, ModelScope | |
| Faster-whisper-tiny | speech-to-text | Hugging Face, ModelScope | |
| Faster-whisper-tiny.en | speech-to-text | Hugging Face, ModelScope | |
| Paraformer-zh | speech-to-text | Hugging Face, ModelScope | |
| Paraformer-zh-streaming | speech-to-text | Hugging Face, ModelScope | Linux ✅, MacOS ✅ |
| Paraformer-en | speech-to-text | Hugging Face, ModelScope | |
| Conformer-en | speech-to-text | Hugging Face, ModelScope | |
| SenseVoiceSmall | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, MacOS ✅ |
| Bark | text-to-speech | Hugging Face | |
| Bark-small | text-to-speech | Hugging Face | |
| CosyVoice-300M-Instruct | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported), Windows (not supported), macOS ✅ |
| CosyVoice-300M-SFT | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported), Windows (not supported), macOS ✅ |
| CosyVoice-300M | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported), Windows (not supported), macOS ✅ |
| CosyVoice-300M-25Hz | text-to-speech | ModelScope | Linux (ARM not supported), Windows (not supported), macOS ✅ |
Endpoint: POST /v1/audio/speech
Generates audio from the input text. Compatible with the OpenAI audio/speech API.
Example Request:
curl http://localhost/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cosyvoice",
    "input": "Hello world",
    "voice": "English Female"
  }' \
  --output speech.mp3

Response: The audio file content.
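The same request can be issued with the official OpenAI Python client by overriding `base_url`. A minimal sketch (the base URL, port, and API key are assumptions; adjust them to your deployment):

```python
# Sketch: generate speech through vox-box's OpenAI-compatible /v1/audio/speech
# endpoint using the openai Python SDK. base_url and api_key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost/v1", api_key="placeholder")

# Stream the synthesized audio straight to a file.
with client.audio.speech.with_streaming_response.create(
    model="cosyvoice",
    voice="English Female",
    input="Hello world",
) as response:
    response.stream_to_file("speech.mp3")
```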
Endpoint: POST /v1/audio/transcriptions
Transcribes audio into the input language. Compatible with the OpenAI audio/transcription API.
Example Request:
curl http://localhost/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-large-v3"

Response:
{
"text": "Hello world."
}
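Equivalently, with the OpenAI Python client (the base URL, port, and API key are placeholders for your deployment):

```python
# Sketch: transcribe an audio file via vox-box's OpenAI-compatible
# /v1/audio/transcriptions endpoint. base_url and api_key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost/v1", api_key="placeholder")

with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(transcript.text)  # e.g. "Hello world."
```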
Endpoint: GET /v1/models
Returns the currently running models.
Endpoint: GET /v1/models/{model_id}
Returns the running model identified by `{model_id}`.
Endpoint: GET /v1/voices
Returns the supported voices of the currently running model.
Endpoint: GET /health
Returns the health check result of Vox Box.
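A small sketch that exercises these informational endpoints with plain HTTP requests (the port and the JSON response shapes are assumptions; adjust for your deployment):

```python
# Sketch: query vox-box's informational endpoints. The base URL/port are
# placeholders; /v1/models and /v1/voices are assumed to return JSON.
import requests

BASE = "http://localhost:8082"

print(requests.get(f"{BASE}/v1/models").json())  # currently running models
print(requests.get(f"{BASE}/v1/voices").json())  # voices supported by the running model
print(requests.get(f"{BASE}/health").text)       # health check result
```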