
> [!NOTE]
> All models come from the [snakers4/silero-models](https://github.com/snakers4/silero-models) repository.

| Language | Model | Speakers |
|---|---|---|
| Russian | v4_ru | 5: aidar, baya, kseniya, xenia, eugene |
| Ukrainian | v4_ua | 1: mykyta |
| Uzbek | v4_uz | 1: dilnavoz |
| English | v3_en | 118: en_0, en_1, ..., en_117 |
| Spanish | v3_es | 3: es_0, es_1, es_2 |
| French | v3_fr | 6: fr_0, fr_1, fr_2, fr_3, fr_4, fr_5 |
| German | v3_de | 5: bernd_ungerer, eva_k, friedrich, hokuspokus, karlsson |
| Tatar | v3_tt | 1: dilyara |
| Kalmyk | v3_xal | 2: erdni, delghir |
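
The table can also be read as a lookup from speaker name to model, which can help catch a mistyped `speaker` value before a request is sent. The function below is a hypothetical client-side helper (`speaker_model` is not part of the server), written as a POSIX-compatible shell sketch that simply encodes the rows above; the server's own speaker resolution may differ.

```bash
# Hypothetical helper: print the model that provides a given speaker,
# following the table above. Returns 1 for unknown speakers.
speaker_model() {
  case "$1" in
    aidar|baya|kseniya|xenia|eugene) echo v4_ru ;;
    mykyta) echo v4_ua ;;
    dilnavoz) echo v4_uz ;;
    # en_0 .. en_117
    en_[0-9]|en_[1-9][0-9]|en_10[0-9]|en_11[0-7]) echo v3_en ;;
    es_[0-2]) echo v3_es ;;
    fr_[0-5]) echo v3_fr ;;
    bernd_ungerer|eva_k|friedrich|hokuspokus|karlsson) echo v3_de ;;
    dilyara) echo v3_tt ;;
    erdni|delghir) echo v3_xal ;;
    *) echo "unknown speaker: $1" >&2; return 1 ;;
  esac
}
```
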

> [!IMPORTANT]
> This requires Docker to be installed and the Docker daemon to be running.

Run the prebuilt image:

```bash
docker run --rm -p 8000:8000 twirapp/silero-tts-api-server
```

Or build the image yourself. Clone the repository:

```bash
git clone https://github.com/twirapp/silero-tts-api-server.git && cd silero-tts-api-server
```

Build the Docker image:

```bash
docker build -f docker/Dockerfile -t silero-tts-api-server .
```

Run the container:

```bash
docker run --rm -p 8000:8000 silero-tts-api-server
```

Or use Docker Compose:

```bash
docker-compose -f docker/compose.yml up
```

> [!IMPORTANT]
> The minimum required Python version is 3.9. This project uses `rye` for dependency management and assumes you have it installed.

Clone the repository:

```bash
git clone https://github.com/twirapp/silero-tts-api-server.git && cd silero-tts-api-server
```

Install the dependencies. This automatically creates a virtual environment in the `.venv` directory and installs the required dependencies:

```bash
rye sync
```

Alternatively, without rye, create and activate a virtual environment manually:

```bash
python3 -m venv .venv && source .venv/bin/activate
```

Then install only the required dependencies:

```bash
pip3 install --no-deps -r requirements.lock
```

Download the Silero TTS models:

```bash
bash ./install_models.sh
```

Run the server:

```bash
litestar run
```

> [!NOTE]
> By default, the server listens on `localhost:8000`.

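
When scripting against the server, for example right after starting the container, it can help to wait until it answers. A minimal sketch, assuming `curl` is installed and that `GET /speakers` is cheap enough to serve as a readiness probe; the `wait_for_server` name, the one-second interval, and the default of 30 attempts are illustrative choices, not part of this project.

```bash
# Hypothetical helper: poll GET /speakers until the server responds,
# giving up after a number of attempts. Returns 0 on success, 1 otherwise.
wait_for_server() {
  local base_url="${1:-http://localhost:8000}" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors, not only on connection errors;
    # --max-time keeps a single probe from hanging.
    if curl --fail --silent --max-time 2 --output /dev/null "$base_url/speakers"; then
      return 0
    fi
    # Skip the sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "server at $base_url did not respond after $attempts attempts" >&2
  return 1
}
```

For example, `docker run --rm -d -p 8000:8000 twirapp/silero-tts-api-server && wait_for_server` starts the container in the background and returns once it responds.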

You can view the automatically generated OpenAPI documentation at:

| Provider | URL |
|---|---|
| Swagger | http://localhost:8000/schema/ |
| ReDoc | http://localhost:8000/schema/redoc |
| Stoplight Elements | http://localhost:8000/schema/elements |
| RapiDoc | http://localhost:8000/schema/rapidoc |
| OpenAPI schema (YAML) | http://localhost:8000/schema/openapi.yaml |
| OpenAPI schema (JSON) | http://localhost:8000/schema/openapi.json |

- `GET /generate` - generate audio in WAV format from text. Parameters: `text`, `speaker`, `sample_rate`, `pitch`, `rate`.
- `GET /speakers` - get the list of available speakers.

`sample_rate` can be one of 8000, 24000, or 48000.

`pitch` and `rate` can be set from 0 to 100.
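
A request to `GET /generate` passes its parameters in the query string. The sketch below assumes `curl` is available, that the server is reachable at `http://localhost:8000` (overridable through a hypothetical `BASE_URL` variable), and that `pitch` and `rate` are whole numbers. `validate_params` and `generate_audio` are illustrative helper names that check the documented ranges client-side before calling the endpoint.

```bash
# Assumption: default host and port; override with BASE_URL=... if needed.
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Hypothetical helper: check the documented ranges before sending a request.
validate_params() {
  local sample_rate="$1" pitch="$2" rate="$3" v valid
  case "$sample_rate" in
    8000|24000|48000) ;;
    *) echo "sample_rate must be 8000, 24000 or 48000" >&2; return 1 ;;
  esac
  for v in "$pitch" "$rate"; do
    # Assumption: pitch and rate are whole numbers from 0 to 100.
    case "$v" in
      ''|*[!0-9]*) valid=no ;;
      *) if [ "$v" -le 100 ]; then valid=yes; else valid=no; fi ;;
    esac
    if [ "$valid" = no ]; then
      echo "pitch and rate must be integers from 0 to 100" >&2
      return 1
    fi
  done
}

# Hypothetical helper: request audio and save the WAV response to a file.
generate_audio() {
  local text="$1" speaker="$2" sample_rate="$3" pitch="$4" rate="$5" out="$6"
  validate_params "$sample_rate" "$pitch" "$rate" || return 1
  # --get appends the --data-urlencode pairs to the URL as a query string.
  curl --fail --silent --show-error --get "$BASE_URL/generate" \
    --data-urlencode "text=$text" \
    --data-urlencode "speaker=$speaker" \
    --data-urlencode "sample_rate=$sample_rate" \
    --data-urlencode "pitch=$pitch" \
    --data-urlencode "rate=$rate" \
    --output "$out"
}
```

For example, `generate_audio "Hello, world" en_0 48000 50 50 hello.wav` should write the generated audio to `hello.wav`; the pitch and rate values here are arbitrary, not documented defaults.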

Environment variables:

- `TEXT_LENGTH_LIMIT` - maximum length of the text to be processed. Default: 930 characters.
- `MKL_NUM_THREADS` - number of threads used to generate audio. Default: the number of CPU cores.

This repository is dedicated to twir.app and is designed to meet its requirements. TwirApp generates audio on the CPU, so only CPU inference is supported. If you need support for other devices, such as CUDA or MPS, please open an issue.
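
Assuming `TEXT_LENGTH_LIMIT` and `MKL_NUM_THREADS` are read from the process environment when the server starts, they can be set inline for a local run or passed to the container with `docker run -e`; the commented lines below show both forms with illustrative values. The `check_text_length` function is a hypothetical client-side helper that mirrors the documented 930-character default so over-long text can be rejected before a request is sent. It assumes the server counts characters the same way the shell does, which is not guaranteed.

```bash
# Illustrative values; both variables are assumed to be read at server startup.
#   TEXT_LENGTH_LIMIT=500 MKL_NUM_THREADS=4 litestar run
#   docker run --rm -p 8000:8000 -e TEXT_LENGTH_LIMIT=500 -e MKL_NUM_THREADS=4 twirapp/silero-tts-api-server

# Hypothetical client-side check mirroring the server's limit (default: 930).
check_text_length() {
  local text="$1" limit="${TEXT_LENGTH_LIMIT:-930}"
  # ${#text} counts characters in bash under a UTF-8 locale;
  # other shells, or the C locale, may count bytes instead.
  if [ "${#text}" -gt "$limit" ]; then
    echo "text is ${#text} characters long; the limit is $limit" >&2
    return 1
  fi
}
```
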