Robust_Fine_Grained_Prosody_Control

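Robust and fine-grained prosody control of end-to-end speech synthesis (with WaveGlow)

Unofficial implementation of "Robust and fine-grained prosody control of end-to-end speech synthesis".

This implementation uses the LibriTTS dataset.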

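Notes

dev branch: Tacotron2 with multi-speaker support (speaker embedding). The speaker information is consumed only by the decoder module, while the attention module does not see any of it (as the authors intended).
text_side branch: implementation of the text-side prosody control model.
Speech-side prosody control and prosody normalization are not implemented in the current version, but you can add them on top of the branches above.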
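
The routing described for the dev branch (speaker embedding reaches the decoder input but not the attention module) can be illustrated with a toy sketch. This is not the repository's code: `attention_context` and `decoder_step` are hypothetical names, plain Python lists stand in for tensors, and simple dot-product attention is swapped in for Tacotron2's location-sensitive attention to keep the example short.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_context(query: Vector, memory: List[Vector]) -> Vector:
    """Dot-product attention over text encoder outputs only.

    Simplification: Tacotron2 uses location-sensitive attention. The point
    here is only that no speaker input enters this function.
    """
    scores = [sum(q * m for q, m in zip(query, frame)) for frame in memory]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(memory[0])
    return [sum(w * frame[i] for w, frame in zip(weights, memory)) for i in range(dim)]


def decoder_step(
    query: Vector, memory: List[Vector], prev_frame: Vector, speaker: Vector
) -> Tuple[Vector, Vector]:
    """Return (attention context, decoder input) for one step.

    The speaker embedding is concatenated onto the decoder input only;
    it is deliberately not passed to attention_context.
    """
    context = attention_context(query, memory)
    decoder_input = prev_frame + context + speaker  # list concatenation
    return context, decoder_input
```

With two different speaker vectors and otherwise identical inputs, the attention context is unchanged while the decoder input differs, which is the property the note describes.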

Prerequisites

NVIDIA GPU + CUDA cuDNN

Setup

Download and extract the LibriTTS dataset
Clone this repo: git clone https://github.com/keonlee9420/Robust_Fine_Grained_Prosody_Control.git
CD into this repo: cd Robust_Fine_Grained_Prosody_Control
Initialize the submodule: git submodule init; git submodule update
Update the .wav paths: sed -i -- 's,/home/keon/speech-datasets/LibriTTS_preprocessed/train-clean-100/,your_libritts_dataset_folder/,g' filelists/*.txt
- Alternatively, set load_mel_from_disk=True in hparams.py and update the mel-spectrogram paths
Install PyTorch 1.0
Install Apex
Install the Python requirements or build the Docker image
- Install the Python requirements: pip install -r requirements.txt
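
The path update in the setup steps can also be done from Python, for example on systems where `sed -i` is unavailable. The following is a minimal sketch and not part of the repository: `rewrite_prefix` and `rewrite_filelists` are hypothetical helper names, and the only assumption is that each filelist is a UTF-8 text file containing the old path prefix.

```python
from pathlib import Path


def rewrite_prefix(line: str, old_prefix: str, new_prefix: str) -> str:
    """Replace every occurrence of old_prefix in one line (like sed's g flag)."""
    return line.replace(old_prefix, new_prefix)


def rewrite_filelists(filelist_dir: str, old_prefix: str, new_prefix: str) -> int:
    """Rewrite all *.txt files in filelist_dir in place.

    Returns the number of lines that were changed.
    """
    changed = 0
    for path in sorted(Path(filelist_dir).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        updated = [rewrite_prefix(line, old_prefix, new_prefix) for line in lines]
        changed += sum(1 for before, after in zip(lines, updated) if before != after)
        path.write_text("".join(updated), encoding="utf-8")
    return changed
```

Calling `rewrite_filelists("filelists", "/home/keon/speech-datasets/LibriTTS_preprocessed/train-clean-100/", "your_libritts_dataset_folder/")` mirrors the sed command in the setup steps. A return value of 0 suggests the old prefix did not match anything.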

Training

python train.py --output_directory=outdir --log_directory=logdir
(Optional) tensorboard --logdir=outdir/logdir

Training using a pre-trained model

(TBD)

Multi-GPU (distributed) and automatic mixed precision training

Not supported by the current implementation.

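Inference

Single sample: python inference.py -c checkpoint/path -r reference_audio/wav/path -t "synthesize text"
Multiple samples: python inference_all.py -c checkpoint/path -r reference_audios/dir/path

N.b. When performing mel-spectrogram to audio synthesis, make sure Tacotron 2 and the mel decoder were trained on the same mel-spectrogram representation.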
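
One way to catch a mismatch between the two mel-spectrogram representations before synthesis is to compare the relevant settings of both models. The sketch below is an illustration only: `MEL_KEYS` and `mel_mismatches` are hypothetical names, and the key names are assumptions modelled on common Tacotron 2 style hyperparameters, so check them against hparams.py and your mel decoder's configuration.

```python
from typing import Any, Dict, Tuple

# Assumed names of the settings that define the mel-spectrogram representation.
MEL_KEYS = (
    "sampling_rate",
    "filter_length",
    "hop_length",
    "win_length",
    "n_mel_channels",
    "mel_fmin",
    "mel_fmax",
)


def mel_mismatches(
    acoustic_cfg: Dict[str, Any], decoder_cfg: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (acoustic_value, decoder_value)} for every differing key.

    A key missing from one config is reported with None on that side.
    """
    mismatches = {}
    for key in MEL_KEYS:
        a, d = acoustic_cfg.get(key), decoder_cfg.get(key)
        if a != d:
            mismatches[key] = (a, d)
    return mismatches


# Illustrative values only, not taken from this repository.
tacotron_cfg = {
    "sampling_rate": 22050, "filter_length": 1024, "hop_length": 256,
    "win_length": 1024, "n_mel_channels": 80, "mel_fmin": 0.0, "mel_fmax": 8000.0,
}
decoder_cfg = dict(tacotron_cfg, hop_length=200)

for key, (a, d) in mel_mismatches(tacotron_cfg, decoder_cfg).items():
    print(f"{key}: Tacotron 2 uses {a}, mel decoder uses {d}")
```

An empty result means the listed settings agree. It does not prove the two models were trained on identical features, since normalization or clipping choices are not covered by these keys.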

Citation

 @misc{lee2021robust_fine_grained_prosody_control,
  author = {Lee, Keon},
  title = {Robust_Fine_Grained_Prosody_Control},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/Robust_Fine_Grained_Prosody_Control}}
}