GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech

Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao | Zhejiang University, Sea AI Lab

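PyTorch implementation of GenerSpeech (NeurIPS'22): a text-to-speech model for high-fidelity zero-shot style transfer.

We provide our implementation and pretrained models in this repository.

Visit our demo page for audio samples.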

News

December 2022: GenerSpeech (NeurIPS 2022) released on GitHub.

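Key Features

Multi-level style transfer for expressive text-to-speech.
Improved model generalization to out-of-distribution (OOD) style references.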

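Quick Start

We provide an example of how to generate high-fidelity samples with GenerSpeech.

To try it on your own dataset, clone this repository onto a local machine with an NVIDIA GPU, CUDA, and cuDNN, then follow the instructions below.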

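Supported Datasets and Pretrained Models

You can use the pretrained models we provide here, along with the data here. Details of each folder are as follows: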

Model	Dataset (16 kHz)	Description
GenerSpeech	LibriTTS, ESD	Acoustic model (config)
HiFi-GAN	LibriTTS, ESD	Neural vocoder
Encoder	/	Emotion encoder

More supported datasets are coming soon.

Dependencies

A suitable conda environment named generspeech can be created and activated with:

conda env create -f environment.yaml
conda activate generspeech

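Multi-GPU

By default, this implementation uses as many GPUs in parallel as torch.cuda.device_count() returns. You can specify which GPUs to use by setting the CUDA_VISIBLE_DEVICES environment variable before running the training module.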
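As a minimal sketch of GPU selection, the helper below builds a child-process environment that restricts the visible devices through CUDA_VISIBLE_DEVICES, the variable set by the commands in this README. The name `gpu_env` is hypothetical and not part of the repository; only the standard library is used.

```python
import os


def gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment restricted to the given GPU ids.

    `gpu_env` is a hypothetical helper, not part of GenerSpeech.
    """
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads this variable at start-up; ids are comma-separated.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


if __name__ == "__main__":
    env = gpu_env([0, 1])
    print(env["CUDA_VISIBLE_DEVICES"])
```

The returned mapping can be passed as `env=` to `subprocess.run`, so that a training process launched this way only sees the listed devices.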

Inference (Zero-Shot TTS)

Here we provide a speech synthesis pipeline using GenerSpeech.

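Prepare GenerSpeech (acoustic model): download the checkpoint and put it in checkpoints/GenerSpeech
Prepare HiFi-GAN (neural vocoder): download the checkpoint and put it in checkpoints/trainset_hifigan
Prepare the emotion encoder: download the checkpoint and put it at checkpoints/Emotion_encoder.pt
Prepare the dataset: download the statistics files and put them in data/binary/training_set
Prepare path/to/reference_audio (16 kHz): by default, GenerSpeech uses ASR + MFA to obtain the text-speech alignment from the reference.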
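The preparation steps above can be checked before launching inference. The following is a sketch only: `missing_assets` and `wav_sample_rate` are hypothetical helpers, not part of the repository, and the sample-rate check assumes the reference is a PCM WAV file readable by Python's standard `wave` module.

```python
import wave
from pathlib import Path

# Paths taken from the preparation steps above, relative to the repo root.
EXPECTED_ASSETS = [
    "checkpoints/GenerSpeech",
    "checkpoints/trainset_hifigan",
    "checkpoints/Emotion_encoder.pt",
    "data/binary/training_set",
]


def missing_assets(repo_root, ref_audio):
    """List expected files/folders (plus the reference audio) that are absent."""
    root = Path(repo_root)
    missing = [p for p in EXPECTED_ASSETS if not (root / p).exists()]
    if not Path(ref_audio).is_file():
        missing.append(str(ref_audio))
    return missing


def wav_sample_rate(path):
    """Read the sample rate of a PCM WAV file using the standard library."""
    with wave.open(str(path), "rb") as f:
        return f.getframerate()


if __name__ == "__main__":
    ref = "assets/0011_001570.wav"  # reference used by the example command
    problems = missing_assets(".", ref)
    if problems:
        print("Missing:", ", ".join(problems))
    elif wav_sample_rate(ref) != 16000:
        print("Reference audio is not 16 kHz")
    else:
        print("All inference assets found")
```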

CUDA_VISIBLE_DEVICES=$GPU python inference/GenerSpeech.py --config modules/GenerSpeech/config/generspeech.yaml --exp_name GenerSpeech --hparams="text='here we go',ref_audio='assets/0011_001570.wav'"

Generated WAV files are saved in infer_out by default.
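When synthesizing several sentences, it can help to build the inference command programmatically. The sketch below assembles the same arguments as the command above; `format_hparams` and `inference_command` are hypothetical helpers, and the quoting is an assumption that only holds for values containing no single quotes or commas.

```python
import shlex


def format_hparams(overrides):
    """Join overrides into the comma-separated key='value' form shown above."""
    return ",".join(f"{key}='{value}'" for key, value in overrides.items())


def inference_command(text, ref_audio):
    """Build the argument list for inference/GenerSpeech.py (not executed here)."""
    hparams = format_hparams({"text": text, "ref_audio": ref_audio})
    return [
        "python", "inference/GenerSpeech.py",
        "--config", "modules/GenerSpeech/config/generspeech.yaml",
        "--exp_name", "GenerSpeech",
        f"--hparams={hparams}",
    ]


if __name__ == "__main__":
    cmd = inference_command("here we go", "assets/0011_001570.wav")
    # shlex.join produces a shell-safe string for display or logging.
    print(shlex.join(cmd))
```

The list form can be handed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues around the `--hparams` value.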

Train Your Own Model

Data Preparation and Configuration

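Set raw_data_dir, processed_data_dir, and binary_data_dir in the config file, then download the dataset to raw_data_dir.
Check preprocess_cls in the config file. The dataset structure needs to follow the processor named by preprocess_cls, or you can rewrite the processor to match your dataset. We provide a LibriTTS processor as an example in modules/GenerSpeech/config/generspeech.yaml
Download the global emotion encoder to emotion_encoder_path. For more details, please refer to this branch.
Preprocess the dataset: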

# Preprocess step: unify the file structure.
python data_gen/tts/bin/preprocess.py --config $path/to/config
# Align step: MFA alignment.
python data_gen/tts/bin/train_mfa_align.py --config $path/to/config
# Binarization step: Binarize data for fast IO.
CUDA_VISIBLE_DEVICES=$GPU python data_gen/tts/bin/binarize.py --config $path/to/config
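The configuration and preprocessing steps above can be sketched as a small driver. This is an illustration under assumptions: the config is taken to be already loaded into a flat dict (the real config loader may resolve values differently), and `missing_config_keys` and `preprocess_commands` are hypothetical names, not repository functions.

```python
# Config entries named in the data preparation steps above.
REQUIRED_KEYS = [
    "raw_data_dir",
    "processed_data_dir",
    "binary_data_dir",
    "preprocess_cls",
    "emotion_encoder_path",
]

# Scripts in the order they must run.
PREPROCESS_SCRIPTS = [
    "data_gen/tts/bin/preprocess.py",       # unify the file structure
    "data_gen/tts/bin/train_mfa_align.py",  # MFA alignment
    "data_gen/tts/bin/binarize.py",         # binarize data for fast IO
]


def missing_config_keys(config):
    """Return required keys that are absent or empty in a flat config dict."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def preprocess_commands(config_path):
    """Build one argument list per preprocessing step, in execution order."""
    return [["python", script, "--config", config_path]
            for script in PREPROCESS_SCRIPTS]


if __name__ == "__main__":
    example_config = {"raw_data_dir": "data/raw/example"}  # hypothetical value
    print("Missing keys:", missing_config_keys(example_config))
    for cmd in preprocess_commands("path/to/config"):
        print(" ".join(cmd))
```

Running each list with `subprocess.run(cmd, check=True)` stops at the first failing step, so alignment never starts on an incomplete preprocessing pass.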

You can also build the dataset via NATSpeech, which shares a common MFA data-processing procedure. We also provide our processed dataset (16 kHz LibriTTS + ESD).

Training GenerSpeech

CUDA_VISIBLE_DEVICES=$GPU python tasks/run.py --config modules/GenerSpeech/config/generspeech.yaml --exp_name GenerSpeech --reset

Inference Using GenerSpeech

CUDA_VISIBLE_DEVICES=$GPU python tasks/run.py --config modules/GenerSpeech/config/generspeech.yaml --exp_name GenerSpeech --infer
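The training and inference commands differ only in their final flag. As a sketch, the hypothetical helper `run_command` below (not part of the repository) builds either variant from a mode name, using only the flags shown in the two commands above.

```python
DEFAULT_CONFIG = "modules/GenerSpeech/config/generspeech.yaml"

# Final flag used by each command in this README.
MODE_FLAGS = {"train": "--reset", "infer": "--infer"}


def run_command(mode, exp_name="GenerSpeech", config=DEFAULT_CONFIG):
    """Build the tasks/run.py argument list for training or inference."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"mode must be one of {sorted(MODE_FLAGS)}")
    return [
        "python", "tasks/run.py",
        "--config", config,
        "--exp_name", exp_name,
        MODE_FLAGS[mode],
    ]


if __name__ == "__main__":
    print(" ".join(run_command("train")))
    print(" ".join(run_command("infer")))
```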

Acknowledgements

This implementation uses parts of the code from the following GitHub repositories: FastDiff and NATSpeech, as noted in our code.

Citations

If you find this code useful in your research, please cite our work:

@inproceedings{huanggenerspeech,
  title={GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech},
  author={Huang, Rongjie and Ren, Yi and Liu, Jinglin and Cui, Chenye and Zhao, Zhou},
  booktitle={Advances in Neural Information Processing Systems}
}