A TTS algorithm project for study purposes; inference is fairly slow, but diffusion is the major trend

Grad-TTS-CFM Framework
Download the vocoder model bigvgan_base_24khz_100band from NVIDIA/BigVGAN
Put g_05000000 at ./bigvgan_pretrain/g_05000000
Download the BERT prosody model prosody_model from Executedone/Chinese-FastSpeech2
Rename best_model.pt to prosody_model.pt and put it at ./bert/prosody_model.pt
Download the TTS model grad_tts.pt from the release page
Put grad_tts.pt in the current directory, or anywhere you like
Install the dependencies
pip install -r requirements.txt
cd ./grad/monotonic_align
python setup.py build_ext --inplace
cd -
Inference test
python inference.py --file test.txt --checkpoint grad_tts.pt --timesteps 10 --temperature 1.015
The generated audio is written to the folder ./inference_out
The larger timesteps is, the better the result and the longer inference takes; when it is set to 0, diffusion is skipped and the mel spectrogram produced by the FrameEncoder is output directly
temperature controls how much noise is added during diffusion inference; the best value has to be found by experimentation
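To illustrate how the two flags interact, here is a minimal numpy sketch of a Grad-TTS-style reverse-diffusion sampler. The function name, the Euler discretization, and the linear beta schedule are illustrative assumptions, not this repo's actual code; note how temperature divides the starting noise and timesteps=0 bypasses diffusion entirely.

```python
import numpy as np

def sample_mel(mu, score_fn, timesteps=10, temperature=1.015,
               beta_min=0.05, beta_max=20.0):
    """Illustrative reverse-diffusion sampler (assumed, not the repo's code).

    mu: encoder-predicted mel mean, score_fn(x, t): score estimate.
    """
    if timesteps == 0:
        return mu                       # skip diffusion: return encoder mel
    # Terminal noise around mu; temperature scales the noise down
    x = mu + np.random.randn(*mu.shape) / temperature
    h = 1.0 / timesteps                 # more timesteps -> smaller Euler steps
    for i in range(timesteps):
        t = 1.0 - i * h
        beta_t = beta_min + t * (beta_max - beta_min)
        # One Euler step of the reverse (probability-flow) ODE
        x = x - 0.5 * beta_t * h * (mu - x - score_fn(x, t))
    return x
```

More steps integrate the same ODE more finely, which is why quality rises with timesteps at the cost of runtime.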
Download the Databaker dataset from the official link: https://www.data-baker.com/data/index/TNtts/
Put Waves at ./data/Waves
Put 000001-010000.txt at ./data/000001-010000.txt
Resample to 24 kHz, since the BigVGAN 24K model is used
python tools/preprocess_a.py -w ./data/Waves/ -o ./data/wavs -s 24000
Extract mel spectrograms. If you replace the vocoder, note that the mel parameters are hard-coded.
python tools/preprocess_m.py --wav data/wavs/ --out data/mels/
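Since the mel parameters are hard-coded, anyone swapping vocoders needs values consistent with the checkpoint. A sketch of such a configuration: the 24000 Hz and 100-band figures follow from the model name bigvgan_base_24khz_100band, but the FFT/hop/window values below are illustrative assumptions, not read from this repo's code.

```python
# Mel settings that must match the vocoder checkpoint.
# sampling_rate and num_mels follow from the name bigvgan_base_24khz_100band;
# n_fft / hop_size / win_size are ASSUMED typical values, not this repo's.
MEL_CONFIG = {
    "sampling_rate": 24000,  # from "24khz" in the model name
    "num_mels": 100,         # from "100band" in the model name
    "n_fft": 1024,           # illustrative
    "hop_size": 256,         # illustrative
    "win_size": 1024,        # illustrative
}
```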
Extract the BERT prosody vectors, and generate the training index files train.txt and valid.txt at the same time
python tools/preprocess_b.py
The outputs include data/berts/ and data/files
Note: the printed messages come from removing erhua (rhotic endings); this project is an algorithm demo, not intended for production
Additional notes
The original annotation is
000001 卡尔普#2陪外孙#1玩滑梯#4。
ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1
000002 假语村言#2别再#1拥抱我#4。
jia2 yu3 cun1 yan2 bie2 zai4 yong1 bao4 wo3

It needs to be re-annotated: BERT needs the Chinese characters 卡尔普陪外孙玩滑梯。 (punctuation included), while TTS needs the initials and finals sil k a2 ^ er2 p u3 p ei2 ^ uai4 s uen1 ^ uan2 h ua2 t i1 sp sil
000001 卡尔普陪外孙玩滑梯。
ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1
sil k a2 ^ er2 p u3 p ei2 ^ uai4 s uen1 ^ uan2 h ua2 t i1 sp sil
000002 假语村言别再拥抱我。
jia2 yu3 cun1 yan2 bie2 zai4 yong1 bao4 wo3
sil j ia2 ^ v3 c uen1 ^ ian2 b ie2 z ai4 ^ iong1 b ao4 ^ uo3 sp sil

The training annotation is
./data/wavs/000001.wav|./data/mels/000001.pt|./data/berts/000001.npy|sil k a2 ^ er2 p u3 p ei2 ^ uai4 s uen1 ^ uan2 h ua2 t i1 sp sil
./data/wavs/000002.wav|./data/mels/000002.pt|./data/berts/000002.npy|sil j ia2 ^ v3 c uen1 ^ ian2 b ie2 z ai4 ^ iong1 b ao4 ^ uo3 sp sil
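Each index line above is four pipe-separated fields: wav path, mel path, BERT-vector path, and the phoneme string. A minimal parsing sketch (the helper name is ours, not the repo's):

```python
def parse_index_line(line):
    """Split one train.txt line into (wav, mel, bert, phoneme list)."""
    wav_path, mel_path, bert_path, phone_str = line.strip().split("|")
    return wav_path, mel_path, bert_path, phone_str.split()

wav, mel, bert, phones = parse_index_line(
    "./data/wavs/000001.wav|./data/mels/000001.pt|./data/berts/000001.npy|"
    "sil k a2 ^ er2 p u3 p ei2 ^ uai4 s uen1 ^ uan2 h ua2 t i1 sp sil"
)
```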
This sentence will cause an error:
002365 这图#2难不成#2是#1P过的#4?
zhe4 tu2 nan2 bu4 cheng2 shi4 P IY1 guo4 de5
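One way to catch such lines before training is to check every token against a tone-numbered pinyin pattern and skip mismatches (here the English letter "P" is read as "P IY1", which is not pinyin). This regex check is our illustrative sketch, not the repo's actual filter:

```python
import re

# A tone-numbered pinyin syllable: lowercase letters plus a tone digit 1-5
PINYIN_TOKEN = re.compile(r"^[a-z]+[1-5]$")

def is_clean_pinyin(line):
    """True if every token looks like tone-numbered pinyin (e.g. zhe4)."""
    return all(PINYIN_TOKEN.match(tok) for tok in line.split())
```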
Debug the dataset
python tools/preprocess_d.py
Start training
python train.py
Resume training
python train.py -p logs/new_exp/grad_tts_***.pt
python inference.py --file test.txt --checkpoint ./logs/new_exp/grad_tts_***.pt --timesteps 20 --temperature 1.15

https://github.com/huawei-noah/Speech-Backbones/blob/main/Grad-TTS
https://github.com/shivammehta25/Matcha-TTS
https://github.com/thuhcsi/LightGrad
https://github.com/Executedone/Chinese-FastSpeech2
https://github.com/PlayVoice/vits_chinese
https://github.com/NVIDIA/BigVGAN
Official implementation of the Grad-TTS model based on Diffusion Probabilistic Modelling. For all details, check out the paper accepted to ICML 2021.
Authors : Vadim Popov*, Ivan Vovk*, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov.
*Equal contribution.
A demo page with a voiced abstract is available.
Recently, denoising diffusion probabilistic models and generative score matching have shown high potential in modelling complex data distributions, while stochastic calculus has provided a unified point of view on these techniques, allowing for flexible inference schemes. In this paper we introduce Grad-TTS, a novel text-to-speech model with a score-based decoder producing mel-spectrograms by gradually transforming noise predicted by the encoder and aligned with text input by means of Monotonic Alignment Search. The framework of stochastic differential equations helps us generalize conventional diffusion probabilistic models to the case of reconstructing data from noise with different parameters, and allows us to make this reconstruction flexible by explicitly controlling the trade-off between sound quality and inference speed. Subjective human evaluation shows that Grad-TTS is competitive with state-of-the-art text-to-speech approaches in terms of Mean Opinion Score.
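In symbols, the diffusion described above can be written as a forward SDE pushing data toward the encoder-predicted Gaussian, with generation following the reverse-time probability-flow ODE (notation as in the Grad-TTS paper):

```latex
% Forward SDE: X_0 (mel data) is gradually pushed toward N(mu, I)
dX_t = \tfrac{1}{2}\,\beta_t\,(\mu - X_t)\,dt + \sqrt{\beta_t}\,dW_t
% Reverse-time generation (probability-flow ODE), using the learned score
dX_t = \tfrac{1}{2}\,\beta_t\,\bigl(\mu - X_t - \nabla \log p_t(X_t)\bigr)\,dt
```

Discretizing the reverse ODE with more steps is exactly what the timesteps flag controls.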

project link: https://github.com/NVIDIA/BigVGAN
Download the pretrained model bigvgan_base_24khz_100band
python bigvgan/inference.py --input_wavs_dir bigvgan_debug --output_dir bigvgan_out
python bigvgan/train.py --config bigvgan_pretrain/config.json
HiFi-GAN (for generator and multi-period discriminator)
Snake (for periodic activation)
Alias-free-torch (for anti-aliasing)
Julius (for low-pass filter)
UnivNet (for multi-resolution discriminator)
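Of the components above, the Snake activation is the simplest to illustrate; a numpy sketch of its standard form, x + sin²(αx)/α, which adds a learnable periodic bump on top of the identity:

```python
import numpy as np

def snake(x, alpha=1.0):
    """Snake periodic activation: x + sin(alpha * x)**2 / alpha."""
    return x + np.sin(alpha * x) ** 2 / alpha
```

The sin² term is always non-negative, so Snake behaves like the identity plus a periodic perturbation, which is what lets BigVGAN model periodic audio structure.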