Fish-Speech.rs

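A simple, fast text-to-speech inference server for Fish Speech 1.5 and below, written in pure Rust.

Features:

Simple inference: OpenAI-compatible server with streaming audio and WAV output
Backwards compatibility: currently the only project that still supports Fish Speech 1.4 and 1.2 SFT
Reliable installation: compiles to a single ~15MB static binary, with no Python environment and no torch.compile cache failures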

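Prerequisites

Hardware: Nvidia, Apple Silicon, and CPU are supported.
OS: Linux or macOS is strongly recommended. Windows and WSL are not yet officially supported: installation is much harder, and performance is slower on the same machine.
System: a working Rust installation to compile from source (see the official docs). We aim to remove this requirement as soon as possible.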

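Compiling from source

Note

Yes, compiling from source isn't fun. Official Docker images, Homebrew, Linux packaging, and a Python interop layer are on the roadmap.

If you want your platform supported, feel free to raise an issue.

First, clone this repo to the folder you want: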

git clone https://github.com/EndlessReform/fish-speech.rs.git
cd fish-speech.rs

Save the Fish Speech checkpoints to ./checkpoints. Using huggingface-cli is recommended:

# If it's not already on system
brew install huggingface-cli
# For Windows or Linux (assumes working Python):
# pip install -U "huggingface_hub[cli]"
# or just skip this and download the folder manually

mkdir -p checkpoints/fish-speech-1.5
# NOTE: the official weights are not compatible
huggingface-cli download jkeisling/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5

If you're using an older Fish version:

For Fish 1.4, use jkeisling/fish-speech-1.4. The official weights are not compatible.
For Fish 1.2 SFT, feel free to use the official weights.

Now compile from source. For Apple Silicon GPU support:

cargo build --release --bin server --features metal

For Nvidia:

cargo build --release --bin server --features cuda

For extra performance on Nvidia, you can enable Flash Attention. If you're compiling for the first time or after a major update, this may take quite a while: up to 15 minutes and 16GB of RAM on a decent CPU. You have been warned!

mkdir ~/.candle
CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle cargo build --release --features flash-attn --bin server

This compiles the binary to ./target/release/server.

Running the server

Just start the binary! If you're getting started for the first time, run:

# Default voice to get you started
./target/release/server --voice-dir voices-template

Options:

--port: Defaults to 3000.
--checkpoint: Directory of the checkpoint folder. Defaults to checkpoints/fish-speech-1.5.
--voice-dir: Directory for speaker prompts. (More on this below.)
--fish-version: 1.5, 1.4, or 1.2. Defaults to 1.5.
--temp: Temperature for the language model backbone. Default: 0.7.
--top_p: Top-p sampling for the language model backbone. Default: 0.8. Set it to 1 to turn it off.

This server supports OGG audio (streaming) and WAV audio output.

You can use any OpenAI-compatible client. Here's an example Python request:
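
When the server is launched from a script, the options above can be assembled into a command line programmatically. The following is a minimal sketch: the helper name `build_server_command` is hypothetical, it uses only the flags and defaults listed above, and it assumes each flag takes its value as the next argument.

```python
def build_server_command(
    binary="./target/release/server",
    port=3000,
    checkpoint="checkpoints/fish-speech-1.5",
    voice_dir="voices-template",
    fish_version="1.5",
    temp=0.7,
    top_p=0.8,
):
    """Return the argv list for the server (hypothetical helper)."""
    if fish_version not in {"1.5", "1.4", "1.2"}:
        raise ValueError(f"unsupported fish version: {fish_version}")
    return [
        binary,
        "--port", str(port),
        "--checkpoint", checkpoint,
        "--voice-dir", voice_dir,
        "--fish-version", fish_version,
        "--temp", str(temp),
        "--top_p", str(top_p),
    ]


if __name__ == "__main__":
    cmd = build_server_command(port=8080)
    print(" ".join(cmd))
    # To actually start the server (blocks until it exits):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a path contains spaces.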

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    # The OpenAI client refuses to start without an API key;
    # the value here is a placeholder, assumed to be ignored by the local server.
    api_key="not-needed",
)
audio = client.audio.speech.create(
    input="Hello world!",
    voice="default",
    response_format="wav",
    model="tts-1",
)

temp_file = "temp.wav"
audio.stream_to_file(temp_file)
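
For clients without an OpenAI SDK, the same request can be made over plain HTTP. The sketch below builds (but does not send) the POST request that an OpenAI-style client would issue. It assumes the server exposes the standard OpenAI path `/audio/speech` under the base URL; the helper name `build_speech_request` is hypothetical.

```python
import json
import urllib.request


def build_speech_request(text, voice="default", response_format="wav",
                         base_url="http://localhost:3000/v1"):
    """Build the POST request for speech synthesis (hypothetical helper)."""
    payload = {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello world!")
    print(req.get_method(), req.full_url)
    # With a running server, the audio could be fetched and saved like this:
    # with urllib.request.urlopen(req) as resp, open("temp.wav", "wb") as f:
    #     f.write(resp.read())
```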

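Temporary voice cloning

To clone a voice, you'll need a WAV file and a transcription of it. Suppose you want to add a speaker alice, who says "Hello world" in the file fake.wav.

Make a POST request to the /v1/audio/encoding endpoint, with:

the audio file (here fake.wav) in the request body
id and prompt as URL-encoded query parameters

Example with curl: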

# The form field name "file" is an assumption; the source page obscured this argument
curl -X POST "http://localhost:3000/v1/audio/encoding?id=alice&prompt=Hello%20world" \
  -F "file=@fake.wav" \
  --output alice.npy
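
Transcriptions usually contain spaces and punctuation, so the query parameters need to be percent-encoded before they go into the URL. A small sketch using only the Python standard library; the helper name `build_encoding_url` is hypothetical, while the endpoint and parameter names come from the request described above.

```python
from urllib.parse import quote, urlencode


def build_encoding_url(speaker_id, prompt, base_url="http://localhost:3000"):
    """Return the /v1/audio/encoding URL with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 rather than "+"
    params = urlencode({"id": speaker_id, "prompt": prompt}, quote_via=quote)
    return f"{base_url}/v1/audio/encoding?{params}"


print(build_encoding_url("alice", "Hello world"))
# http://localhost:3000/v1/audio/encoding?id=alice&prompt=Hello%20world
```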

You can check that the voice was added by hitting the /v1/voices debug endpoint:

curl http://localhost:3000/v1/voices
# Returns ['default', 'alice']

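The encoding request returns a .npy file containing your input audio as encoded tokens. After that, the alice voice is usable for as long as the server is running; it is deleted on shutdown.

If you want to keep this voice, you can use the returned .npy file to add it to the voices loaded at startup: see below.

Persisting cloned voices

Note

Yes, this sucks. Persisting encoded voices to disk is a top priority.

Open the voices-template/index.json file. It should look something like this: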

{
  "speakers": {
    "default": "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference."
  }
}

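The voices directory contains:

The index, which maps each speaker name to the text of its speaker conditioning audio
The speaker conditioning files, where the file name is the speaker's name and the contents are the speaker conditioning values (e.g. default.npy)

Take the .npy file you got from runtime encoding and rename it to your speaker ID: for example, if the ID is "alice", rename it to alice.npy. Then move alice.npy into your voices folder. In index.json, add the key: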

{
  "speakers": {
    "default": "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference.",
    "alice": "I–I hardly know, sir, just at present–at least I know who I WAS when I got up this morning, but I think I must have been changed several times since then."
  }
}
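
Editing index.json by hand makes it easy to drop a comma between entries. As a rough sketch, the rename, move, and index update could be scripted as follows. The helper name `register_voice` is hypothetical and not part of this project; it assumes the index.json layout shown above and copies the file instead of moving it.

```python
import json
import shutil
from pathlib import Path


def register_voice(npy_path, speaker_id, transcript, voices_dir="voices-template"):
    """Copy encoded tokens into voices_dir and add the speaker to index.json."""
    voices_dir = Path(voices_dir)
    index_path = voices_dir / "index.json"

    # Load the existing index and add (or overwrite) the speaker entry
    index = json.loads(index_path.read_text(encoding="utf-8"))
    index.setdefault("speakers", {})[speaker_id] = transcript

    # The file name must match the speaker ID, e.g. alice.npy
    shutil.copyfile(npy_path, voices_dir / f"{speaker_id}.npy")

    # json.dumps always emits valid JSON, including the separating commas
    index_path.write_text(
        json.dumps(index, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return index
```

For example, `register_voice("alice.npy", "alice", "Hello world")` would place the tokens at voices-template/alice.npy and add the "alice" key to the index.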

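Restart the server and your new voice should be good to go.

CLI scripts

For now, we're keeping compatibility with the official Fish Speech inference CLI scripts. (Inference server and Python bindings coming soon!)

Generate speaker conditioning tokens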

# saves to fake.npy by default
cargo run --release --features metal --bin encoder -- -i ./tests/resources/sky.wav

For earlier versions, you'll need to specify the version and checkpoint manually:

cargo run --release --bin encoder -- --input ./tests/resources/sky.wav --output-path fake.npy --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft

Generate semantic codebook tokens

For Fish 1.5 (default):

# Switch to --features cuda for Nvidia GPUs
cargo run --release --features metal --bin llama_generate -- \
  --text "That is not dead which can eternal lie, and with strange aeons even death may die." \
  --prompt-text "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." \
  --prompt-tokens fake.npy

For earlier versions, you'll have to specify the version and checkpoint explicitly. For example, for Fish 1.2:

cargo run --release --features metal --bin llama_generate -- --text "That is not dead which can eternal lie, and with strange aeons even death may die." --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft

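For additional speed, compile with Flash Attention support.

Warning

The candle-flash-attention dependency can take more than 10 minutes to compile even on a good CPU, and can require more than 16GB of memory! You have been warned.

Also, as of October 2024 the bottleneck is actually elsewhere (in inefficient memory copies and kernel dispatch), so on already fast hardware (like an RTX 4090) this currently has little to no impact.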

# Cache the Flash Attention build
# Leave your computer, have a cup of tea, go touch grass, etc.
mkdir ~/.candle
CANDLE_FLASH_ATTN_BUILD_DIR=$HOME/.candle cargo build --release --features flash-attn --bin llama_generate

# Then run with flash-attn flag
cargo run --release --features flash-attn --bin llama_generate -- \
  --text "That is not dead which can eternal lie, and with strange aeons even death may die." \
  --prompt-text "When I heard the release demo, I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." \
  --prompt-tokens fake.npy

Decode tokens to WAV

For Fish 1.5 (default):

# Switch to --features cuda for Nvidia GPUs
cargo run --release --features metal --bin vocoder -- -i out.npy -o fake.wav

For earlier models, please specify the version. Example for 1.2:

cargo run --release --bin vocoder -- --fish-version 1.2 --checkpoint ./checkpoints/fish-speech-1.2-sft
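
The three CLI steps above (encode the reference audio, generate semantic tokens, decode to WAV) can be chained from a script. The sketch below only assembles the three invocations using the flags shown in this section. The helper name `pipeline_commands` is hypothetical, and it assumes that llama_generate writes its tokens to out.npy, which is implied by the vocoder example but not stated explicitly.

```python
# Common prefix; switch "metal" to "cuda" for Nvidia GPUs
CARGO = ["cargo", "run", "--release", "--features", "metal", "--bin"]


def pipeline_commands(wav, text, prompt_text, out_wav="fake.wav"):
    """Return the encode, generate, and decode invocations, in order."""
    return [
        # 1. Encode the reference audio into speaker conditioning tokens
        CARGO + ["encoder", "--", "-i", wav, "--output-path", "fake.npy"],
        # 2. Generate semantic tokens (assumed to be written to out.npy)
        CARGO + ["llama_generate", "--", "--text", text,
                 "--prompt-text", prompt_text, "--prompt-tokens", "fake.npy"],
        # 3. Decode the generated tokens to a WAV file
        CARGO + ["vocoder", "--", "-i", "out.npy", "-o", out_wav],
    ]


if __name__ == "__main__":
    for cmd in pipeline_commands("./tests/resources/sky.wav",
                                 "Hello world!", "Reference transcript here."):
        print(" ".join(cmd))
        # To execute each step in order:
        # import subprocess; subprocess.run(cmd, check=True)
```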

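License

Warning

This codebase is licensed under the original Apache 2.0 license. Feel free to use it however you want. However, the Fish Speech weights are CC-BY-NC-SA-4.0 and for non-commercial use only!

Please support the original authors by using the official API for production.

This model is licensed under the CC-BY-NC-SA-4.0 license. The source code is released under the Apache 2.0 license.

Massive thanks also go to:

All candle_examples maintainers, for highly useful code snippets across the codebase
WaveyAI's mel_spec, for the STFT implementation

Original README below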

Fish Speech V1.5

Fish Speech V1.5 is a text-to-speech (TTS) model trained on more than 1 million hours of audio data in multiple languages.

Supported languages:

English (en) >300k hours
Chinese (zh) >300k hours
Japanese (ja) >100k hours
German (de) ~20k hours
French (fr) ~20k hours
Spanish (es) ~20k hours
Korean (ko) ~20k hours
Arabic (ar) ~20k hours
Russian (ru) ~20k hours
Dutch (nl) <10k hours
Italian (it) <10k hours
Polish (pl) <10k hours
Portuguese (pt) <10k hours

Please refer to the Fish Speech GitHub for more information. A demo is available at Fish Audio.

Citation

If you found this repository useful, please consider citing this work:

@misc{fish-speech-v1.4,
      title={Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis},
      author={Shijia Liao and Yuxuan Wang and Tianyu Li and Yifan Cheng and Ruoyi Zhang and Rongzhi Zhou and Yijin Xing},
      year={2024},
      eprint={2411.01156},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2411.01156},
}