Vox Box

v0.0.9

A text-to-speech and speech-to-text server compatible with the OpenAI API, powered by backend support from Whisper, FunASR, Bark, and CosyVoice.

Requirements

Python 3.10 or later
NVIDIA GPU support requires the following NVIDIA libraries to be installed:
- cuBLAS for CUDA 12
- cuDNN 9 for CUDA 12
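
The interpreter requirement can be checked before installing with a short script. This is a minimal sketch: the helper name `meets_python_requirement` is hypothetical and not part of vox-box, and it checks only the Python version, not the NVIDIA libraries.

```python
import sys


def meets_python_requirement(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "ok" if meets_python_requirement() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```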

Installation

You can install the project using pip:

pip install vox-box

# On macOS, you need to manually install `openfst`, `pynini`, and `wetextprocessing` after installing `vox-box` to make `cosyvoice` work:
brew install openfst
export CPLUS_INCLUDE_PATH=$(brew --prefix openfst)/include
export LIBRARY_PATH=$(brew --prefix openfst)/lib
pip install pynini==2.1.6
pip install wetextprocessing==1.0.4.1

Usage

vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 80

# Windows
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir C:\Users\michelia\AppData\Roaming\vox-box --host 0.0.0.0 --port 8082

Options

-d, --debug: Enable debug mode.
--host: Host to bind the server to. Default is 0.0.0.0.
--port: Port to bind the server to. Default is 80.
--model: Model path.
--device: Device to bind to, e.g. cuda:0. Default is cpu.
--huggingface-repo-id: Hugging Face repo ID for the model.
--model-scope-model-id: ModelScope model ID for the model.
--data-dir: Directory to store downloaded model data. Default is OS-specific.
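
When the server is started from a script rather than a shell, the options can be assembled into an argument list first. The sketch below uses only flags that appear in this README. The helper name `build_start_command` is hypothetical, and the script prints the command instead of running it.

```python
import shlex


def build_start_command(repo_id, data_dir, host="0.0.0.0", port=80, debug=False):
    """Assemble a `vox-box start` argument list from options listed in this README."""
    cmd = [
        "vox-box", "start",
        "--huggingface-repo-id", repo_id,
        "--data-dir", data_dir,
        "--host", host,
        "--port", str(port),
    ]
    if debug:
        cmd.append("--debug")
    return cmd


cmd = build_start_command("Systran/faster-whisper-small", "./cache/data-dir", port=8082)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

To launch the server, the list can be passed to `subprocess.run(cmd, check=True)`.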

Supported Models

Model	Type	Link	Verified Platforms
Faster-whisper-large-v3	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-large-v2	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-large-v1	speech-to-text	Hugging Face, ModelScope
Faster-whisper-medium	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-medium.en	speech-to-text	Hugging Face, ModelScope
Faster-whisper-small	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Faster-whisper-small.en	speech-to-text	Hugging Face, ModelScope
Faster-distil-whisper-large-v3	speech-to-text	Hugging Face, ModelScope	macOS ✅
Faster-distil-whisper-large-v2	speech-to-text	Hugging Face, ModelScope	macOS ✅
Faster-distil-whisper-medium.en	speech-to-text	Hugging Face, ModelScope
Faster-whisper-tiny	speech-to-text	Hugging Face, ModelScope
Faster-whisper-tiny.en	speech-to-text	Hugging Face, ModelScope
Paraformer-zh	speech-to-text	Hugging Face, ModelScope
Paraformer-zh-streaming	speech-to-text	Hugging Face, ModelScope	Linux ✅, macOS ✅
Paraformer-en	speech-to-text	Hugging Face, ModelScope
Conformer-en	speech-to-text	Hugging Face, ModelScope
SenseVoiceSmall	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, macOS ✅
Bark	text-to-speech	Hugging Face
Bark-small	text-to-speech	Hugging Face
CosyVoice-300M-Instruct	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M-SFT	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M	text-to-speech	Hugging Face, ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅
CosyVoice-300M-25Hz	text-to-speech	ModelScope	Linux (ARM not supported), Windows (not supported), macOS ✅

Supported APIs

Create Speech

Endpoint: POST /v1/audio/speech

Generates audio from the input text. Compatible with the OpenAI audio/speech API.

Example Request:

curl http://localhost/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cosyvoice",
    "input": "Hello world",
    "voice": "English Female"
  }' \
  --output speech.mp3

Response: The audio file content.
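
The same request can be issued from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_speech_request` and `synthesize` are hypothetical, the base URL assumes a server on port 80 as in the curl example, and the README does not say whether the server validates the bearer token.

```python
import json
import os
import urllib.request


def build_speech_request(text, base_url="http://localhost", api_key=None,
                         model="cosyvoice", voice="English Female"):
    """Build a POST request for /v1/audio/speech without sending it."""
    key = api_key if api_key is not None else os.environ.get("OPENAI_API_KEY", "")
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

With a server running, `synthesize("Hello world")` would write the response body to `speech.mp3`, mirroring the `--output` flag of the curl call.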

Create Transcription

Endpoint: POST /v1/audio/transcriptions

Transcribes audio into the input language. Compatible with the OpenAI audio/transcription API.

Example Request:

curl http://localhost/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-large-v3"
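
For callers without curl, the multipart upload can be built by hand with the Python standard library. This is a sketch under stated assumptions: `encode_multipart` and `build_transcription_request` are hypothetical helpers, the base URL assumes plain HTTP on port 80, and the file part is sent as `application/octet-stream` because the README does not say which content types the server expects.

```python
import os
import uuid
import urllib.request


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    parts.append(file_bytes)
    # The closing delimiter ends the multipart body.
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(audio_path, model="whisper-large-v3",
                                base_url="http://localhost", api_key=None):
    """Build a POST request for /v1/audio/transcriptions without sending it."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart(
        {"model": model}, "file", os.path.basename(audio_path), audio
    )
    key = api_key if api_key is not None else os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {key}", "Content-Type": content_type},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` sends the upload. The body of the reply is whatever the server returns for the transcription.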

Response: