xtts api server

0.9.0

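A simple FastAPI server for running XTTSv2.

This project is inspired by silero-api-server and uses XTTSv2.

The server was created for SillyTavern, but you can use it for your own needs.

Feel free to open PRs or use the code for your own purposes.

If your computer is weak, you can use the Google Colab version.

If you are looking for an option for regular XTTS use, see https://github.com/daswer123/xtts-webui

Recently I have had very little time for this project, so I suggest you get acquainted with a similar project.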

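Changelog

You can track all changes on the release page.

TODO

Make it possible to change generation parameters through the generation request and through a separate endpoint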

Installation

Simple installation:

pip install xtts-api-server

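This installs all the necessary dependencies, including a CPU-only build of PyTorch.

I recommend installing the GPU version to improve processing speed (up to 3 times faster).

Windows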

python -m venv venv
venv\Scripts\activate
pip install xtts-api-server
pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118

Linux

sudo apt install -y python3-dev python3-venv portaudio19-dev
python -m venv venv
source venv/bin/activate
pip install xtts-api-server
pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
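
After installing the CUDA build, it can be useful to confirm that PyTorch actually sees the GPU before starting the server. The following is a minimal sketch: `pick_device` is a hypothetical helper name, not part of xtts-api-server. It only relies on `torch.cuda.is_available()` and falls back to `cpu` when PyTorch is not installed at all.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled PyTorch is usable, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string matches the values accepted by the `-d/--device` option (`cpu` or `cuda`). If it returns `cpu` after a GPU install, the CPU-only wheel is probably still the one being imported.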

Manual

# Clone repo
git clone https://github.com/daswer123/xtts-api-server
cd xtts-api-server
# Create virtual env
python -m venv venv
# Activate it (on Windows use: venv\Scripts\activate)
source venv/bin/activate
# Install deps
pip install -r requirements.txt
pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
# Launch server
python -m xtts_api_server

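Use Docker image with Docker Compose

A Dockerfile is provided to build the Docker image, and a docker-compose.yml file is provided to run the server as a service with Docker Compose.

You can pull and run the published image with the following commands: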

mkdir xtts-api-server
cd xtts-api-server
docker run -d daswer123/xtts-api-server

Or build the image yourself:

cd docker
docker compose build

Then you can run the server with the following command:

docker compose up # or with -d to run in background

Starting the server

python -m xtts_api_server runs the server on the default IP and port (localhost:8020).

Use the --deepspeed flag to speed up processing (2-3x faster).

usage: xtts_api_server [-h] [-hs HOST] [-p PORT] [-sf SPEAKER_FOLDER] [-o OUTPUT] [-t TUNNEL_URL] [-ms MODEL_SOURCE] [--listen] [--use-cache] [--lowvram] [--deepspeed] [--streaming-mode] [--stream-play-sync]

Run XTTSv2 within a FastAPI application

options:
  -h, --help            show this help message and exit
  -hs HOST, --host HOST
  -p PORT, --port PORT
  -d DEVICE, --device DEVICE
                        `cpu` or `cuda`; you can specify which video card to use, for example `cuda:0`
  -sf SPEAKER_FOLDER, --speaker-folder
                        The folder where you get the samples for TTS
  -o OUTPUT, --output   Output folder
  -mf MODELS_FOLDERS, --model-folder
                        Folder where models for XTTS will be stored; finetuned models should be stored in this folder
  -t TUNNEL_URL, --tunnel
                        URL of the tunnel used (e.g. ngrok, localtunnel)
  -ms MODEL_SOURCE, --model-source
                        ["api","apiManual","local"]
  -v MODEL_VERSION, --version
                        You can download the official model or your own model. Official versions are listed at https://huggingface.co/coqui/XTTS-v2/tree/main; the model version name is the same as the branch name [v2.0.2, v2.0.3, main], etc. Or you can load your own model: just put it in the models folder
  --listen              Allows the server to be used outside the local computer, similar to -hs 0.0.0.0
  --use-cache           Enables caching of results: your results are saved, and a repeated request returns the saved file instead of generating again
  --lowvram             The model is stored in RAM and moved to VRAM only for processing; the difference in speed is small
  --deepspeed           Speeds up processing several times; automatically downloads the necessary libraries
  --streaming-mode      Enables streaming mode; currently has certain limitations, as described below
  --streaming-mode-improve
                        Enables an improved streaming mode that consumes 2 GB more VRAM and uses a better tokenizer and more context
  --stream-play-sync    Additional flag for streaming mode that plays all audio one at a time without interruption
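
The options above can also be assembled programmatically, for example when the server is launched from another script. The sketch below is only an illustration: `build_command` is a hypothetical helper, and it uses nothing beyond flags listed in the help text above.

```python
import sys
from typing import List, Optional


def build_command(
    host: str = "localhost",
    port: int = 8020,
    device: Optional[str] = None,
    model_source: Optional[str] = None,
    version: Optional[str] = None,
    listen: bool = False,
    deepspeed: bool = False,
    use_cache: bool = False,
    lowvram: bool = False,
) -> List[str]:
    """Assemble an argv list for `python -m xtts_api_server`."""
    cmd = [sys.executable, "-m", "xtts_api_server",
           "--host", host, "--port", str(port)]
    # Options that take a value are added only when one was given.
    if device is not None:
        cmd += ["--device", device]
    if model_source is not None:
        cmd += ["--model-source", model_source]
    if version is not None:
        cmd += ["--version", version]
    # Boolean switches are appended in a fixed order.
    for enabled, flag in (
        (listen, "--listen"),
        (deepspeed, "--deepspeed"),
        (use_cache, "--use-cache"),
        (lowvram, "--lowvram"),
    ):
        if enabled:
            cmd.append(flag)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single shell string avoids quoting problems with folder paths.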

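You can specify the path to a file as the text; the path is then recognized and the file is voiced.

You can load your own model. To do this, create a folder inside models and put the model and its config there. Note that the folder must contain 3 files: config.json, vocab.json and model.pth.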
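
A quick way to check a custom model folder before starting the server is to look for the three required files. This is a minimal sketch; `missing_model_files` is a hypothetical helper and the folder path is whatever you created under models.

```python
from pathlib import Path
from typing import List

# The three files a custom model folder is expected to contain.
REQUIRED_FILES = ("config.json", "vocab.json", "model.pth")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required files that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means the folder has everything listed above; otherwise the list names what still needs to be copied in.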

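If you want your host to listen on all interfaces, use -hs 0.0.0.0 or --listen.

The -t or --tunnel flag is needed so that when you get speakers via GET requests you receive the correct link for listening to the preview. More info here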

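Model source defines the format in which you want to use XTTS:

local loads version 2.0.2 by default, but you can specify the version via the -v flag. The model is saved into the models folder and uses XttsConfig and inference.
apiManual loads version 2.0.2 by default, but you can specify the version via the -v flag. The model is saved into the models folder and uses the tts_to_file function from the TTS API.
api loads the latest version of the model. The -v flag won't work.

All versions of the XTTSv2 model can be found at https://huggingface.co/coqui/XTTS-v2/tree/main; the model version name is the same as the branch name [v2.0.2, v2.0.3, main], etc.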
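
The interaction between the model source and the -v flag described above can be summarized in a few lines. This is only a sketch of the documented rules, not the server's code: `resolve_version` is a hypothetical helper, the default is spelled `v2.0.2` on the assumption that it follows the branch naming, and `"latest"` is a placeholder label for whatever the api source downloads.

```python
from typing import Optional

MODEL_SOURCES = ("api", "apiManual", "local")
DEFAULT_VERSION = "v2.0.2"  # assumed spelling, following the branch names


def resolve_version(model_source: str, version: Optional[str] = None) -> str:
    """Return the model version that would be used for a given source."""
    if model_source not in MODEL_SOURCES:
        raise ValueError(f"unknown model source: {model_source!r}")
    if model_source == "api":
        # The api source always loads the latest model; -v is ignored.
        return "latest"
    # local and apiManual honor -v and fall back to the default.
    return version or DEFAULT_VERSION
```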

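The first time you run or generate, you may need to confirm that you agree to use XTTS.

About streaming mode

Streaming mode lets you get audio and play it back almost immediately. However, it has a number of limitations.

You can see how this mode works here and here

Now, about the limitations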

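Can only be used on a local computer
Audio is played from your PC
The tts_to_file endpoint does not work; only tts_to_audio works, and it returns 1 second of silence.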

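You can specify the version of the XTTS model with the -v flag.

Improved streaming mode is suitable for complex languages such as Chinese, Japanese and Hindi, or if you want the language engine to take more information into account when processing speech.

The --stream-play-sync flag lets you play all messages in queue order, which is useful if you use group chats. In SillyTavern you need to turn off streaming for it to work correctly.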

API Docs

The API docs can be accessed at http://localhost:8020/docs
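
The same information can be read programmatically. FastAPI serves the OpenAPI schema at /openapi.json by default; the sketch below assumes this server keeps that default and is running on localhost:8020. `fetch_schema` and `list_endpoints` are hypothetical helper names.

```python
import json
from typing import Dict, List
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(base_url: str = "http://localhost:8020") -> Dict:
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


def list_endpoints(schema: Dict) -> List[str]:
    """Return "METHOD /path" strings for every operation in the schema."""
    endpoints = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            if method in HTTP_METHODS:  # skip non-operation keys
                endpoints.append(f"{method.upper()} {path}")
    return endpoints
```

With the server running, `list_endpoints(fetch_schema())` prints the available routes, which is a safer way to discover request formats than guessing them.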

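How to add a speaker

By default a speakers folder should appear in the working folder. Put a WAV file with the voice sample there. You can also create a subfolder and put several voice samples in it, which gives more accurate results.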
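
To see what a speakers folder contains under the layout described above, a small scan is enough. This is a sketch under assumptions: `scan_speakers` is a hypothetical helper, and treating the file stem or subfolder name as the speaker name is an assumption about naming, not something the server is confirmed to do.

```python
from pathlib import Path
from typing import Dict, List


def scan_speakers(speaker_dir: str) -> Dict[str, List[str]]:
    """Map each speaker name to the sorted list of its WAV sample files."""
    root = Path(speaker_dir)
    speakers: Dict[str, List[str]] = {}
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".wav":
            # A single WAV file directly in the folder: one-sample speaker.
            speakers[entry.stem] = [entry.name]
        elif entry.is_dir():
            # A subfolder holding several samples of the same voice.
            samples = sorted(p.name for p in entry.glob("*.wav"))
            if samples:
                speakers[entry.name] = samples
    return speakers
```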

Selecting folders

You can change the speaker folder and the output folder via the API.

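Note on creating samples for quality voice cloning

The following is a quote from a post by Reddit user Material1276

Some suggestions on making good samples
Keep them about 7-9 seconds long. Longer isn't necessarily better.
Make sure the audio is downsampled to a mono, 22050Hz, 16-bit WAV file. Otherwise you will slow down processing by a noticeable percentage, and it seems to cause poor-quality results (based on a few tests). 24000Hz is the quality it outputs at anyway!
Using the latest version of Audacity, select your clip and use Tracks > Resample to 22050Hz, then Tracks > Mix > Stereo to Mono. Then use File > Export Audio and save it as a 22050Hz WAV.
If you need to do any audio cleaning, do it before you compress the clip down to the above settings (mono, 22050Hz, 16-bit).
Ensure the clip you use doesn't have background noise or music; for example, lots of movies have quiet music playing while the actors are talking. Bad-quality audio will have hiss that needs cleaning up. The AI will pick this up even if we don't, and will to some degree use it in the simulated voice, so clean audio is key!
Try to make your clip one of nice flowing speech, like the included example files. No big pauses, gaps or other sounds. Preferably pick one where the person you are trying to copy shows a little vocal range. Example files are here
Make sure the clip doesn't start or end with breathy sounds (breathing in or out).
Using AI-generated audio clips may introduce unwanted sounds, since they are already a copy/simulation of a voice, though this would need testing.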
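
The format recommendations above (mono, 22050Hz, 16-bit, roughly 7-9 seconds) can be checked automatically with Python's standard `wave` module. This is a minimal sketch; `check_sample` is a hypothetical helper and the thresholds simply restate the advice quoted above. It cannot judge noise, music or breathing sounds.

```python
import wave
from typing import List


def check_sample(path: str) -> List[str]:
    """Return a list of problems found in a WAV voice sample (empty if none)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != 22050:
        problems.append(f"expected 22050 Hz, found {rate} Hz")
    if sample_width != 2:
        problems.append(f"expected 16-bit, found {sample_width * 8}-bit")
    if not 7.0 <= duration <= 9.0:
        problems.append(f"expected 7-9 s, found {duration:.1f} s")
    return problems
```

Running it over every file in the speakers folder before starting the server gives a quick list of clips that still need resampling or trimming.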