
A simple, fast text-to-speech inference server for Fish Speech 1.5 and below, written in pure Rust.
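Features:
torch.compile cache failures
Note
Yes, compiling from source isn't fun. An official Docker image, Homebrew, Linux packaging, and a Python interop layer are all on the roadmap.
If you need support for your platform, feel free to open an issue and I'll get on it!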
First, clone this repo into the folder you want:
git clone https://github.com/EndlessReform/fish-speech.rs.git
cd fish-speech.rs
Save the Fish Speech checkpoints to ./checkpoints. I recommend using huggingface-cli:
# If it's not already on system
brew install huggingface-cli
# For Windows or Linux (assumes working Python):
# pip install -U "huggingface_hub[cli]"
# or just skip this and download the folder manually
mkdir -p checkpoints/fish-speech-1.5
# NOTE: the official weights are not compatible
huggingface-cli download jkeisling/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
If you're using an older Fish version:
jkeisling/fish-speech-1.4: the official weights are not compatible.
Now compile from source. For Apple Silicon GPU support:
cargo build --release --bin server --features metal
For Nvidia:
cargo build --release --bin server --features cuda
For extra performance on Nvidia, you can enable Flash Attention. If you're compiling for the first time or after a major update, this may take a while: up to 15 minutes and 16 GB of RAM on a decent CPU. You have been warned!
mkdir ~/.candle
CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle cargo build --release --features flash-attn --bin server
This will compile your binary to ./target/release/server.
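The choice between the metal and cuda build features can be scripted. The following is only a sketch: the helper names pick_feature and build_command are hypothetical, and detecting an Nvidia GPU by looking for nvidia-smi on the PATH is an assumption of this example, not something the project does.

```python
import platform
import shutil


def pick_feature(system=None, machine=None, has_nvidia=None):
    """Guess which cargo feature to build with (hypothetical helper)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if has_nvidia is None:
        # Assumption: an nvidia-smi binary on PATH means an Nvidia GPU is usable.
        has_nvidia = shutil.which("nvidia-smi") is not None
    if system == "Darwin" and machine == "arm64":
        return "metal"  # Apple Silicon
    if has_nvidia:
        return "cuda"
    return None  # no GPU feature


def build_command(feature, binary="server"):
    """Assemble the cargo invocation used in this README as an argument list."""
    cmd = ["cargo", "build", "--release", "--bin", binary]
    if feature:
        cmd += ["--features", feature]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command(pick_feature())))
```

Pass the resulting list to subprocess.run if you want the script to start the build itself.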
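Just start the binary! If you're getting started for the first time, run:
# Default voice to get you started
./target/release/server --voice-dir voices-template
Options:
--port: Defaults to 3000.
--checkpoint: Directory of the checkpoint folder. Defaults to checkpoints/fish-speech-1.5.
--voice-dir: Directory of speaker prompts. (More on this below.)
--fish-version: 1.5, 1.4, or 1.2. Defaults to 1.5.
--temp: Temperature for the language model backbone. Defaults to 0.7.
--top_p: Top-p sampling for the language model backbone. Defaults to 0.8; set it to 1 to turn it off.
This server supports OGG audio (streaming) and WAV audio output.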
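To make the --temp and --top_p options concrete, here is a generic sketch of temperature scaling followed by top-p (nucleus) sampling. It illustrates the general technique only; the function names are hypothetical and this is not the server's actual sampling code.

```python
import math
import random


def softmax(logits, temp):
    """Temperature-scaled softmax: lower temp sharpens the distribution."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(logits, temp=0.7, top_p=0.8):
    """Return (token, prob) pairs for the smallest set whose mass reaches top_p."""
    probs = softmax(logits, temp)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, mass = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        mass += prob
        if mass >= top_p:
            break
    return kept


def sample(logits, temp=0.7, top_p=0.8, rng=random):
    """Draw one token id from the renormalised nucleus."""
    tokens, weights = zip(*nucleus(logits, temp, top_p))
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(sample([2.0, 1.0, 0.0]))
```

With top_p set to 1 every token stays in the candidate set, which is why that value turns the filter off.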
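You can use any OpenAI-compatible client. Here's an example Python request:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    # The client raises an error if no API key is set (argument or OPENAI_API_KEY).
    # Placeholder value; this assumes the local server does not check it.
    api_key="not-needed",
)

audio = client.audio.speech.create(
    input="Hello world!",
    voice="default",
    response_format="wav",
    model="tts-1",
)

temp_file = "temp.wav"
audio.stream_to_file(temp_file)
To clone a voice, you'll need a WAV file and a transcription. Suppose you want to add the speaker alice, who says "Hello world" in fake.wav.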
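Make a POST request to the /v1/audio/encoding endpoint, with the audio file in the form body and id and prompt as URL-encoded query parameters. Example with curl:
curl -X POST "http://localhost:3000/v1/audio/encoding?id=alice&prompt=Hello%20world" \
  -F "[email protected]" \
  --output alice.npy
You can check that the voice was added by hitting the /v1/voices debug endpoint:
curl http://localhost:3000/v1/voices
# Returns ['default', 'alice']
The encoding request returns a .npy file with your input audio as encoded tokens. After this, the alice voice is usable for as long as the server is running, and is deleted on shutdown.
If you want to keep this voice, you can use the returned .npy file to add it to the voices loaded at startup: see below.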
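If you call the encoding endpoint from a script rather than curl, the id and prompt values need the same URL encoding. A minimal sketch using only the Python standard library; the helper name encoding_url is hypothetical:

```python
from urllib.parse import quote, urlencode


def encoding_url(base_url, speaker_id, prompt):
    """Build the /v1/audio/encoding URL with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    query = urlencode({"id": speaker_id, "prompt": prompt}, quote_via=quote)
    return f"{base_url}/v1/audio/encoding?{query}"


print(encoding_url("http://localhost:3000", "alice", "Hello world"))
# http://localhost:3000/v1/audio/encoding?id=alice&prompt=Hello%20world
```

Transcripts with punctuation or non-ASCII characters are percent-encoded the same way, so the helper can be reused for arbitrary prompts.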
Note
Yes, this sucks. Persisting encoded voices to disk is a top priority.
Open the voices-template/index.json file. It should look something like:
{
  "speakers": {
    "default": "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference."
  }
}
The voices directory contains index.json and the encoded speaker prompts (for example, default.npy).
Take the .npy file you got from runtime encoding and rename it to the speaker ID: for example, if the ID is "alice", rename it to alice.npy. Then move alice.npy into your voices folder. In index.json, add the key:
{
  "speakers": {
    "default": "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference.",
    "alice": "I–I hardly know, sir, just at present–at least I know who I WAS when I got up this morning, but I think I must have been changed several times since then."
  }
}
Restart the server and your new voice should be good to go.
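The manual steps for registering a voice (copy the .npy file under the speaker ID, then add the transcript to index.json) can be automated. This is a sketch under the directory layout described in this section; add_voice is a hypothetical helper, not part of the project.

```python
import json
import shutil
from pathlib import Path


def add_voice(voice_dir, speaker_id, transcript, npy_path):
    """Copy an encoded prompt into voice_dir and register it in index.json."""
    voice_dir = Path(voice_dir)
    index_path = voice_dir / "index.json"

    # Load the existing index and add (or overwrite) the speaker entry.
    index = json.loads(index_path.read_text(encoding="utf-8"))
    index.setdefault("speakers", {})[speaker_id] = transcript

    # The encoded prompt must be named after the speaker ID.
    shutil.copyfile(npy_path, voice_dir / f"{speaker_id}.npy")

    # Write the index back; ensure_ascii=False keeps non-ASCII text readable.
    index_path.write_text(
        json.dumps(index, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return index
```

For example, add_voice("voices-template", "alice", "Hello world", "alice.npy") would register the voice encoded earlier. The server still has to be restarted afterwards.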
For now, we're keeping compatibility with the official Fish Speech inference CLI scripts. (Inference server and Python bindings coming soon!)
# saves to fake.npy by default
cargo run --release --features metal --bin encoder -- -i ./tests/resources/sky.wav
For earlier versions, you'll need to specify the version and checkpoint manually:
cargo run --release --bin encoder -- --input ./tests/resources/sky.wav --output-path fake.npy --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft
For Fish 1.5 (default):
# Switch to --features cuda for Nvidia GPUs
cargo run --release --features metal --bin llama_generate -- \
  --text "That is not dead which can eternal lie, and with strange aeons even death may die." \
  --prompt-text "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." \
  --prompt-tokens fake.npy
For earlier versions, you must specify the version and checkpoint explicitly. For example, for Fish 1.2:
cargo run --release --features metal --bin llama_generate -- --text "That is not dead which can eternal lie, and with strange aeons even death may die." --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft
For extra speed, compile with Flash Attention support.
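Warning
The candle-flash-attn dependency can take more than 10 minutes to compile even on a good CPU, and can require more than 16 GB of memory! You have been warned.
Also, as of October 2024, the bottleneck is actually elsewhere (in inefficient memory copies and kernel dispatch), so on already-fast hardware such as an RTX 4090 this currently has less of an impact.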
# Cache the Flash Attention build
# Leave your computer, have a cup of tea, go touch grass, etc.
mkdir ~/.candle
CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle cargo build --release --features flash-attn --bin llama_generate
# Then run with flash-attn flag
cargo run --release --features flash-attn --bin llama_generate -- \
  --text "That is not dead which can eternal lie, and with strange aeons even death may die." \
  --prompt-text "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." \
  --prompt-tokens fake.npy
For Fish 1.5 (default):
# Switch to --features cuda for Nvidia GPUs
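cargo run --release --features metal --bin vocoder -- -i out.npy -o fake.wav
For earlier models, please specify the version. 1.2 example:
cargo run --release --bin vocoder -- --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft
Warning
This codebase is licensed under the original Apache 2.0 license. Feel free to use it however you want. However, the Fish Speech weights are CC-BY-NC-SA-4.0 and for non-commercial use only!
Please support the original authors by using the official API for production.
This model is licensed under the CC-BY-NC-SA-4.0 license. The source code is released under the Apache 2.0 license.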
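Huge thanks to:
The candle_examples maintainers, for providing very helpful code snippets across the codebase
Fish Speech V1.5 is a leading text-to-speech (TTS) model trained on more than 1 million hours of audio data in multiple languages.
Supported languages:
For more information, please refer to the Fish Speech GitHub. A demo is available at Fish Audio.
If you find this repository useful, please consider citing this work: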
@misc{fish-speech-v1.4,
title={Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis},
author={Shijia Liao and Yuxuan Wang and Tianyu Li and Yifan Cheng and Ruoyi Zhang and Rongzhi Zhou and Yijin Xing},
year={2024},
eprint={2411.01156},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2411.01156},
}