
A simple, fast text-to-speech inference server for Fish Speech 1.5 and below, written in pure Rust.

Features:
Notes on torch.compile cache failures
Yes, compiling from source isn't fun. Official Docker images, Homebrew, Linux packages, and a Python interop layer are all on the roadmap.
If you need support for your platform, feel free to open an issue and I'll get to it!
First, clone this repo into the folder of your choice:
```bash
git clone https://github.com/EndlessReform/fish-speech.rs.git
cd fish-speech.rs
```

Save the Fish Speech checkpoints to `./checkpoints`. I recommend using `huggingface-cli`:
```bash
# If it's not already on system
brew install huggingface-cli
# For Windows or Linux (assumes working Python):
# pip install -U "huggingface_hub[cli]"
# or just skip this and download the folder manually
mkdir -p checkpoints/fish-speech-1.5
# NOTE: the official weights are not compatible
huggingface-cli download jkeisling/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

If you're using an older version of Fish Speech:
- `jkeisling/fish-speech-1.4`: the official weights are not compatible.

Now compile from source. For Apple Silicon GPU support:
```bash
cargo build --release --bin server --features metal
```

For Nvidia:
```bash
cargo build --release --bin server --features cuda
```

For extra performance on Nvidia, you can opt in to Flash Attention support. If you're compiling it for the first time or after a major update, it can take a while: up to 15 minutes and 16 GB of RAM on a decent CPU. You have been warned!
```bash
mkdir ~/.candle
CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle cargo build --release --features flash-attn --bin server
```

This compiles your binary to `./target/release/server`.
Just start the binary! If this is your first time, run:
```bash
# Default voice to get you started
./target/release/server --voice-dir voices-template
```

Options:

- `--port`: defaults to 3000
- `--checkpoint`: directory of the checkpoint folder. Defaults to `checkpoints/fish-speech-1.5`.
- `--voice-dir`: directory of speaker prompts. (More on this below.)
- `--fish-version`: `1.5`, `1.4`, or `1.2`. Defaults to `1.5`.
- `--temp`: temperature for the language model backbone. Default: 0.7.
- `--top_p`: top-p sampling for the language model backbone. Defaults to 0.8; to disable it, set it to 1.

The server supports OGG (streaming) and WAV audio output.
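The `--temp` and `--top_p` flags are the usual language-model sampling knobs. As a rough illustration only (this is not this server's actual implementation), temperature scaling and nucleus (top-p) filtering can be sketched like this:

```python
import math

def apply_temperature(logits, temp=0.7):
    """Scale logits by 1/temp, then softmax. Lower temp sharpens the
    distribution; temp=1.0 leaves it unchanged."""
    scaled = [l / temp for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, top_p=0.8):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; renormalize the survivors to sum to 1."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        total += p
        if total >= top_p:
            break
    return {idx: p / total for idx, p in kept}

# Toy 4-token vocabulary
probs = apply_temperature([2.0, 1.0, 0.5, -1.0], temp=0.7)
filtered = top_p_filter(probs, top_p=0.8)
```

With the defaults above, only the two most likely toy tokens survive the 0.8 nucleus cutoff; raising `--top_p` toward 1 keeps more of the tail.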
You can use any OpenAI-compatible client. Here's an example Python request:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1")

audio = client.audio.speech.create(
    input="Hello world!",
    voice="default",
    response_format="wav",
    model="tts-1",
)

temp_file = "temp.wav"
audio.stream_to_file(temp_file)
```

To clone a voice, you'll need a WAV file and its transcription. Let's say you want to add a speaker `alice`, who says "Hello world" in `fake.wav`.
Make a POST request to the `/v1/audio/encoding` endpoint, with `id` and `prompt` as URL-encoded query parameters. Example with curl:
```bash
curl -X POST "http://localhost:3000/v1/audio/encoding?id=alice&prompt=Hello%20world" \
  -F "file=@fake.wav" \
  --output alice.npy
```

You can check that the voice was added by hitting the `/v1/voices` debug endpoint:
```bash
curl http://localhost:3000/v1/voices
# Returns ['default', 'alice']
```

The encoding request returns a `.npy` file with your input audio as encoded tokens. From then on, the `alice` voice is available for as long as the server runs, and is dropped on shutdown.
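If you're curious what's inside the returned `.npy` file, you can load it with NumPy. A sketch, using a randomly generated stand-in array since the real file's shape and dtype are not documented here (both are illustrative assumptions, not the server's format):

```python
import numpy as np

# Hypothetical stand-in for the server's alice.npy: a grid of codebook
# indices. Shape and dtype are assumptions for illustration only.
fake_tokens = np.random.randint(0, 1024, size=(8, 128), dtype=np.int64)
np.save("alice.npy", fake_tokens)

# Round-trip: load it back the same way you'd inspect the real file
tokens = np.load("alice.npy")
print(tokens.shape, tokens.dtype)
```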
If you want to keep this voice, you can use the returned `.npy` file to add it to the voices loaded at startup: see below.
Note

Yes, this is bad. Persisting encoded voices to disk is a top priority.
Open the `voices-template/index.json` file. It should look like this:
```json
{
  "speakers": {
    "default": "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference."
  }
}
```

The voices directory contains:
- `index.json`, mapping speaker IDs to their prompt transcripts
- one `.npy` prompt file per speaker (e.g. `default.npy`)

Take the `.npy` file you got from runtime encoding and rename it to the speaker ID: for example, if the ID is "alice", rename it to `alice.npy`. Then move `alice.npy` into your voices folder. In `index.json`, add the key:
```json
{
  "speakers": {
    "default": "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference.",
    "alice": "I–I hardly know, sir, just at present–at least I know who I WAS when I got up this morning, but I think I must have been changed several times since then."
  }
}
```

Restart the server, and your new voice should be good to go.
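The edit above can also be scripted. A sketch (the `register_speaker` helper is ours, not part of the server) that adds an entry to `index.json` and warns if the matching `.npy` prompt file is missing:

```python
import json
import tempfile
from pathlib import Path

def register_speaker(voice_dir, speaker_id, transcript):
    """Add a speaker entry to index.json in voice_dir, creating the
    file if needed, and warn if <speaker_id>.npy is missing."""
    index_path = Path(voice_dir) / "index.json"
    if index_path.exists():
        index = json.loads(index_path.read_text())
    else:
        index = {"speakers": {}}
    index["speakers"][speaker_id] = transcript
    index_path.write_text(json.dumps(index, indent=2))

    npy = Path(voice_dir) / f"{speaker_id}.npy"
    if not npy.exists():
        print(f"warning: {npy} not found; the server needs it alongside index.json")
    return index

# Demo against a throwaway directory rather than a real voices folder
with tempfile.TemporaryDirectory() as d:
    index = register_speaker(d, "alice", "I hardly know, sir, just at present.")
```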
For now, we're maintaining compatibility with the official Fish Speech inference CLI scripts. (An inference server and Python bindings are coming soon!)
```bash
# saves to fake.npy by default
cargo run --release --features metal --bin encoder -- -i ./tests/resources/sky.wav
```

For earlier versions, you'll need to manually specify the version and checkpoint:
```bash
cargo run --release --bin encoder -- --input ./tests/resources/sky.wav --output-path fake.npy --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft
```

For Fish 1.5 (the default):
```bash
# Switch to --features cuda for Nvidia GPUs
cargo run --release --features metal --bin llama_generate -- \
  --text "That is not dead which can eternal lie, and with strange aeons even death may die." \
  --prompt-text "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." \
  --prompt-tokens fake.npy
```

For earlier versions, you must explicitly specify the version and checkpoint. For example, for Fish 1.2:
```bash
cargo run --release --features metal --bin llama_generate -- --text "That is not dead which can eternal lie, and with strange aeons even death may die." --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft
```

For extra speed, compile with Flash Attention support.
Warning

Even on a good CPU, the candle-flash-attn dependency can take more than 10 minutes to compile and needs over 16 GB of memory! You have been warned.

That said, as of October 2024 the bottleneck is actually elsewhere (inefficient memory copies and kernel scheduling), so on already-fast hardware like an RTX 4090 this currently has less impact.
```bash
# Cache the Flash Attention build
# Leave your computer, have a cup of tea, go touch grass, etc.
mkdir ~/.candle
CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle cargo build --release --features flash-attn --bin llama_generate

# Then run with flash-attn flag
cargo run --release --features flash-attn --bin llama_generate -- \
  --text "That is not dead which can eternal lie, and with strange aeons even death may die." \
  --prompt-text "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." \
  --prompt-tokens fake.npy
```

For Fish 1.5 (the default):
```bash
# Switch to --features cuda for Nvidia GPUs
cargo run --release --features metal --bin vocoder -- -i out.npy -o fake.wav
```

For earlier models, specify the version. Fish 1.2 example:
```bash
cargo run --release --bin vocoder -- --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft
```

Warning
This codebase is licensed under the original Apache 2.0 license. Feel free to use it as you wish. However, the Fish Speech weights are CC-BY-NC-SA-4.0, for non-commercial use only!

Please support the original authors by using the official API for production.

The model is licensed under the CC-BY-NC-SA-4.0 license. The source code is released under the Apache 2.0 license.
Many thanks to:

- the candle_examples maintainers, for very helpful snippets throughout the codebase

Fish Speech V1.5 is a leading text-to-speech (TTS) model trained on more than 1 million hours of audio data in multiple languages.
Supported languages:

For more information, please refer to the Fish Speech GitHub. A demo is available on Fish Audio.
If you find this repository useful, please consider citing this work:
```bibtex
@misc{fish-speech-v1.4,
  title={Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis},
  author={Shijia Liao and Yuxuan Wang and Tianyu Li and Yifan Cheng and Ruoyi Zhang and Rongzhi Zhou and Yijin Xing},
  year={2024},
  eprint={2411.01156},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2411.01156},
}
```
The model is licensed under the CC-BY-NC-SA-4.0 license.