fish-speech.rs

A simple, fast text-to-speech inference server for Fish Speech 1.5 and below, written in pure Rust.

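Features:

Simple inference: OpenAI-compatible server with streaming audio and WAV output
Backward compatible: currently the only project that still supports Fish Speech 1.4 and 1.2 SFT
Reliable installation: compiles to a single ~15MB static binary, with no Python environment and no torch.compile cache failures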

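Prerequisites

Hardware: NVIDIA, Apple Silicon, and CPU are supported.
OS: Linux or macOS is strongly recommended. Windows and WSL work but are not yet officially supported: installation is much harder, and performance is slower on the same machine.
System: a working Rust installation to compile from source (see the official docs). I will remove this requirement as soon as possible.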

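Compiling from source

Note

Yes, compiling from source isn't fun. Official Docker images, Homebrew, Linux packaging, and a Python interop layer are all on the roadmap.

If you need support for your platform, feel free to open an issue and I'll get to it!

First, clone this repo to the folder you want: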

git clone https://github.com/EndlessReform/fish-speech.rs.git
cd fish-speech.rs

Save the Fish Speech checkpoints to ./checkpoints. I recommend using huggingface-cli:

# If it's not already on system
brew install huggingface-cli
# For Windows or Linux (assumes working Python):
# pip install -U "huggingface_hub[cli]"
# or just skip this and download the folder manually

mkdir -p checkpoints/fish-speech-1.5
# NOTE: the official weights are not compatible
huggingface-cli download jkeisling/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5

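If you're using an older Fish Speech version:

For Fish 1.4, use jkeisling/fish-speech-1.4: the official weights are not compatible.
For Fish 1.2 SFT, feel free to use the official weights.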
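The version-to-weights mapping above can be sketched in Python. This is only an illustration: `download_command` is a hypothetical helper, and the local directory for 1.4 is an assumption that mirrors the 1.5 layout.

```python
# Sketch: map a Fish Speech version to the huggingface-cli command shown above.
# COMPATIBLE_REPOS and download_command are illustrative names, not part of
# fish-speech.rs.
COMPATIBLE_REPOS = {
    "1.5": "jkeisling/fish-speech-1.5",
    "1.4": "jkeisling/fish-speech-1.4",
}


def download_command(fish_version: str) -> list[str]:
    """Return the huggingface-cli invocation for a compatible checkpoint."""
    if fish_version == "1.2":
        # Fish 1.2 SFT works with the official weights, so there is no
        # converted repository to point at here.
        raise ValueError("Fish 1.2 SFT uses the official weights")
    repo = COMPATIBLE_REPOS[fish_version]
    # Assumption: older checkpoints live next to the 1.5 ones.
    local_dir = f"checkpoints/fish-speech-{fish_version}"
    return ["huggingface-cli", "download", repo, "--local-dir", local_dir]
```

Passing the returned list to `subprocess.run` would perform the download, assuming `huggingface-cli` is on your PATH.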

Now compile from source. For Apple Silicon GPU support:

cargo build --release --bin server --features metal

For NVIDIA:

cargo build --release --bin server --features cuda

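For extra performance on NVIDIA, you can enable Flash Attention. If you're compiling for the first time or after a major update, this may take a while: up to 15 minutes and 16 GB of RAM on a decent CPU. You have been warned!

mkdir ~/.candle
CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle cargo build --release --features flash-attn --bin server

This will compile your binary to ./target/release/server.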
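If you script your builds, the backend choice can be captured in a small helper. The sketch below is an assumption-laden illustration: `server_build` is a hypothetical name, and the CPU-only case assumes that simply omitting `--features` is enough.

```python
import os


def server_build(backend: str) -> tuple[dict[str, str], list[str]]:
    """Return (extra environment variables, cargo command) for a backend."""
    base = ["cargo", "build", "--release", "--bin", "server"]
    if backend == "cpu":
        # Assumption: a CPU-only build just leaves out --features.
        return {}, base
    if backend in ("metal", "cuda"):
        return {}, base + ["--features", backend]
    if backend == "flash-attn":
        # Cache the Flash Attention build in ~/.candle, as in the commands above.
        cache_dir = os.path.join(os.path.expanduser("~"), ".candle")
        env = {"CANDLE_FLASH_ATTN_BUILD_DIR": cache_dir}
        return env, base + ["--features", "flash-attn"]
    raise ValueError(f"unknown backend: {backend}")
```

To run it, merge the returned variables into `os.environ` and pass the command list to `subprocess.run`.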

Running the server

Just start the binary! If you're getting started for the first time, run:

# Default voice to get you started
./target/release/server --voice-dir voices-template

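Options:

--port: Defaults to 3000.
--checkpoint: Directory for the checkpoint folder. Defaults to checkpoints/fish-speech-1.5.
--voice-dir: Directory for speaker prompts. (More on this below.)
--fish-version: 1.5, 1.4, or 1.2. Defaults to 1.5.
--temp: Temperature for the language model backbone. Default: 0.7.
--top_p: Top-p sampling for the language model backbone. Default: 0.8. To turn it off, set it to 1.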
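As a rough sketch, the options above can be assembled into a launch command from Python. `server_command` is a hypothetical helper; its defaults repeat the documented ones, except `voice_dir`, whose default here is an assumption taken from the getting-started command.

```python
def server_command(
    port: int = 3000,
    checkpoint: str = "checkpoints/fish-speech-1.5",
    voice_dir: str = "voices-template",  # assumption: same as the quick-start
    fish_version: str = "1.5",
    temp: float = 0.7,
    top_p: float = 0.8,
) -> list[str]:
    """Build the argument list for ./target/release/server."""
    if fish_version not in ("1.5", "1.4", "1.2"):
        raise ValueError(f"unsupported Fish version: {fish_version}")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]; 1 turns top-p sampling off")
    return [
        "./target/release/server",
        "--port", str(port),
        "--checkpoint", checkpoint,
        "--voice-dir", voice_dir,
        "--fish-version", fish_version,
        "--temp", str(temp),
        "--top_p", str(top_p),
    ]
```

Note that the checkpoint default still points at the 1.5 weights, so for an older version you would pass both `fish_version` and a matching `checkpoint`.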

This server supports OGG audio (streaming) and WAV audio output.

You can use any OpenAI-compatible client. Here's an example Python request:

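from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    # The OpenAI client refuses to start without an API key; this placeholder
    # assumes the local server does not check it.
    api_key="not-needed",
)
audio = client.audio.speech.create(
    input="Hello world!",
    voice="default",
    response_format="wav",
    model="tts-1",
)

# Write the returned WAV audio to disk
temp_file = "temp.wav"
audio.stream_to_file(temp_file)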

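Temporary voice cloning

To clone a voice, you'll need a WAV file and a transcription. Suppose you want to add the speaker alice, who says "Hello world" in the file fake.wav.

Make a POST request to the /v1/audio/encoding endpoint, with:

fake.wav as the file body
id and prompt as URL-encoded query parameters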

Example with curl:

curl -X POST "http://localhost:3000/v1/audio/encoding?id=alice&prompt=Hello%20world" \
  -F "[email protected]" \
  --output alice.npy
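The query string in that request has to be URL-encoded, which is easy to get wrong by hand for longer transcripts. A minimal sketch using only the standard library follows; `encoding_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode


def encoding_url(base_url: str, speaker_id: str, prompt: str) -> str:
    """Build the /v1/audio/encoding URL with URL-encoded id and prompt."""
    # quote_via=quote encodes spaces as %20 (the default would use "+").
    query = urlencode({"id": speaker_id, "prompt": prompt}, quote_via=quote)
    return f"{base_url.rstrip('/')}/v1/audio/encoding?{query}"


if __name__ == "__main__":
    print(encoding_url("http://localhost:3000", "alice", "Hello world"))
```

The result can be passed straight to curl or to any HTTP client that uploads the WAV file.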

You can check that the voice was added by hitting the /v1/voices debug endpoint:

curl http://localhost:3000/v1/voices
# Returns ['default', 'alice']

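The encoding request returns a .npy file with your input audio as encoded tokens. After this, the alice voice will be usable as long as the server is running, and will be deleted on shutdown.

If you'd like to save this voice, you can use the returned .npy file to add it to the voices loaded on startup: see below.

Persisting cloned voices

Note

Yes, this sucks. Persisting encoded voices to disk is a top priority.

Open the voices-template/index.json file. It should look something like: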

{
  "speakers": {
    "default": "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference."
  }
}

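The voices directory contains:

The index, which maps each speaker name to the text spoken in its speaker conditioning audio
Speaker conditioning files, where the file name is the speaker's name and the contents are the encoded speaker conditioning tokens (e.g. default.npy)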
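Since the index and the .npy files have to agree, a quick consistency check can save a confusing startup. The sketch below assumes only the layout described above (an index.json with a "speakers" object, plus one `<name>.npy` per speaker); `missing_voice_files` is a hypothetical helper.

```python
import json
from pathlib import Path


def missing_voice_files(voices_dir: str) -> list[str]:
    """Return speakers listed in index.json that have no matching .npy file."""
    root = Path(voices_dir)
    index = json.loads((root / "index.json").read_text(encoding="utf-8"))
    return sorted(
        name
        for name in index["speakers"]
        if not (root / f"{name}.npy").is_file()
    )
```

An empty list means every speaker in the index has a conditioning file next to it.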

Take the .npy file you got from runtime encoding and rename it to the speaker ID: e.g. if the ID is "alice", rename it to alice.npy. Then move alice.npy to your voices folder. In index.json, add the key:

{
  "speakers": {
    "default": "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference.",
    "alice": "I–I hardly know, sir, just at present–at least I know who I WAS when I got up this morning, but I think I must have been changed several times since then."
  }
}

Restart the server and your new voice should be good to go.
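The manual steps (copy the .npy under the speaker's name, then add the transcript to index.json) can be scripted. This is a minimal sketch under the directory layout described above; `register_voice` is a hypothetical helper, not something fish-speech.rs ships.

```python
import json
import shutil
from pathlib import Path


def register_voice(voices_dir: str, speaker_id: str, npy_path: str, transcript: str) -> None:
    """Copy an encoded voice into the voices folder and add it to index.json."""
    voices = Path(voices_dir)
    index_path = voices / "index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))

    # The file name must match the speaker ID, e.g. alice.npy for "alice".
    shutil.copyfile(npy_path, voices / f"{speaker_id}.npy")

    # Map the speaker to the text spoken in the conditioning audio.
    index.setdefault("speakers", {})[speaker_id] = transcript
    index_path.write_text(
        json.dumps(index, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

For example, `register_voice("voices-template", "alice", "alice.npy", "Hello world")` would persist the voice cloned earlier. Writing the index through `json.dumps` also avoids hand-editing mistakes such as a missing comma between entries.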

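CLI scripts

For now, we're keeping compatibility with the official Fish Speech inference CLI scripts. (Inference server and Python bindings coming soon!)

Generating speaker conditioning tokens

# saves to fake.npy by default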
cargo run --release --features metal --bin encoder -- -i ./tests/resources/sky.wav

For earlier versions, you'll need to specify the version and checkpoint manually:

cargo run --release --bin encoder -- --input ./tests/resources/sky.wav --output-path fake.npy --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft

Generating semantic codebook tokens

For Fish 1.5 (default):

# Switch to --features cuda for Nvidia GPUs
cargo run --release --features metal --bin llama_generate -- \
  --text "That is not dead which can eternal lie, and with strange aeons even death may die." \
  --prompt-text "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." \
  --prompt-tokens fake.npy

For earlier versions, you must specify the version and checkpoint explicitly. For example, for Fish 1.2:

cargo run --release --features metal --bin llama_generate -- --text "That is not dead which can eternal lie, and with strange aeons even death may die." --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft

For additional speed, compile with Flash Attention support.

Warning

The candle-flash-attention dependency can take more than 10 minutes to compile even on a good CPU, and can require more than 16 GB of memory! You have been warned.

Also, as of October 2024, the bottleneck is actually elsewhere (in inefficient memory copies and kernel dispatch), so on already-fast hardware (like an RTX 4090) this currently has less of an impact.

# Cache the Flash Attention build
# Leave your computer, have a cup of tea, go touch grass, etc.
mkdir ~/.candle
CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle cargo build --release --features flash-attn --bin llama_generate

# Then run with flash-attn flag
cargo run --release --features flash-attn --bin llama_generate -- \
  --text "That is not dead which can eternal lie, and with strange aeons even death may die." \
  --prompt-text "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." \
  --prompt-tokens fake.npy

Decoding tokens to WAV

For Fish 1.5 (default):

# Switch to --features cuda for Nvidia GPUs
cargo run --release --features metal --bin vocoder -- -i out.npy -o fake.wav

For earlier models, please specify the version. 1.2 example:

cargo run --release --bin vocoder -- --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft
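Put together, the three CLI steps form a pipeline: encode the reference audio, generate semantic tokens, then decode them to WAV. The sketch below only assembles the commands shown above for the default Fish 1.5 case. `pipeline_commands` is a hypothetical helper, and it assumes that llama_generate writes out.npy to the working directory, which the vocoder example implies but does not state.

```python
def pipeline_commands(
    wav_path: str,
    prompt_text: str,
    text: str,
    feature: str = "metal",  # switch to "cuda" for Nvidia GPUs
) -> list[list[str]]:
    """Return the encoder, llama_generate, and vocoder commands in order."""
    cargo = ["cargo", "run", "--release", "--features", feature, "--bin"]
    # 1. Encode the reference audio into speaker conditioning tokens.
    encode = cargo + ["encoder", "--", "-i", wav_path, "--output-path", "fake.npy"]
    # 2. Generate semantic codebook tokens conditioned on the speaker prompt.
    generate = cargo + [
        "llama_generate", "--",
        "--text", text,
        "--prompt-text", prompt_text,
        "--prompt-tokens", "fake.npy",
    ]
    # 3. Decode the generated tokens (assumed to land in out.npy) to audio.
    decode = cargo + ["vocoder", "--", "-i", "out.npy", "-o", "fake.wav"]
    return [encode, generate, decode]
```

Running each command in order with `subprocess.run(cmd, check=True)` stops the pipeline at the first failing step.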

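License

Warning

This codebase is licensed under the original Apache 2.0 license. Feel free to use it however you want. However, the Fish Speech weights are CC-BY-NC-SA-4.0 and for non-commercial use only!

Please support the original authors by using the official API for production.

The model is released under the CC-BY-NC-SA-4.0 license. The source code is released under the Apache 2.0 license.

Massive thanks also go to:

All the candle_examples maintainers for highly useful code snippets across the codebase
WaveyAI's mel_spec for the STFT implementation

Original README below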

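Fish Speech V1.5

Fish Speech V1.5 is a leading text-to-speech (TTS) model trained on more than 1 million hours of audio data in multiple languages.

Supported languages:

English (en) >300k hours
Chinese (zh) >300k hours
Japanese (ja) >100k hours
German (de) ~20k hours
French (fr) ~20k hours
Spanish (es) ~20k hours
Korean (ko) ~20k hours
Arabic (ar) ~20k hours
Russian (ru) ~20k hours
Dutch (nl) <10k hours
Italian (it) <10k hours
Polish (pl) <10k hours
Portuguese (pt) <10k hours

Please refer to the Fish Speech GitHub for more info. A demo is available at Fish Audio.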

Citation

If you found this repository useful, please consider citing this work:

@misc{fish-speech-v1.4,
      title={Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis},
      author={Shijia Liao and Yuxuan Wang and Tianyu Li and Yifan Cheng and Ruoyi Zhang and Rongzhi Zhou and Yijin Xing},
      year={2024},
      eprint={2411.01156},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2411.01156},
}