FastPitchFormant - PyTorch Implementation

PyTorch implementation of FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis.

Quickstart

Dependencies

You can install the Python dependencies with

 pip3 install -r requirements.txt

Inference

You have to download the pretrained models and put them in output/ckpt/LJSpeech/.

For English single-speaker TTS, run

 python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step 600000 --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

The generated utterances will be put in output/result/.

Batch Inference

Batch inference is also supported. Try

 python3 synthesize.py --source preprocessed_data/LJSpeech/val.txt --restore_step 600000 --mode batch -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

to synthesize all utterances in preprocessed_data/LJSpeech/val.txt.
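
The single and batch invocations differ only in `--text` versus `--source` and in the `--mode` value. As a minimal sketch, a small Python helper can assemble either argument list for use with `subprocess.run`. The function name `build_synthesize_cmd` is hypothetical and not part of the repository; only flags that appear in this README are used.

```python
from typing import List, Optional


def build_synthesize_cmd(
    restore_step: int,
    text: Optional[str] = None,
    source: Optional[str] = None,
    dataset: str = "LJSpeech",
) -> List[str]:
    """Assemble the synthesize.py argument list shown in this README.

    Hypothetical helper: pass exactly one of `text` (single mode)
    or `source` (batch mode).
    """
    if (text is None) == (source is None):
        raise ValueError("pass exactly one of text or source")

    cmd = ["python3", "synthesize.py"]
    if text is not None:
        cmd += ["--text", text, "--restore_step", str(restore_step), "--mode", "single"]
    else:
        cmd += ["--source", source, "--restore_step", str(restore_step), "--mode", "batch"]

    # The three config files live side by side under config/<dataset>/.
    config_dir = f"config/{dataset}"
    cmd += [
        "-p", f"{config_dir}/preprocess.yaml",
        "-m", f"{config_dir}/model.yaml",
        "-t", f"{config_dir}/train.yaml",
    ]
    return cmd
```

From the repository root, this could be run as `subprocess.run(build_synthesize_cmd(600000, text="YOUR_DESIRED_TEXT"), check=True)`.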

Controllability

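The pitch and speaking rate of the synthesized utterances can be controlled by specifying the desired pitch/energy/duration ratios. For example, one can increase the speaking rate by 20% and decrease the pitch by 20% with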

 python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step 600000 --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml --duration_control 0.8 --pitch_control 0.8
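
To make the effect of the two ratios concrete, here is a minimal sketch that assumes each control value acts as a plain multiplicative factor on the predicted per-phoneme durations (in frames) and on the predicted pitch values. This mirrors common FastSpeech-style implementations but is an assumption, not a description of this repository's exact code. `apply_controls` is a hypothetical name and the input numbers are illustrative.

```python
from typing import List, Tuple


def apply_controls(
    durations: List[float],
    pitch: List[float],
    duration_control: float = 1.0,
    pitch_control: float = 1.0,
) -> Tuple[List[int], List[float]]:
    """Scale durations and pitch by the given ratios (assumed multiplicative)."""
    # Durations are frame counts, so round after scaling and clamp at zero.
    scaled_durations = [max(0, round(d * duration_control)) for d in durations]
    scaled_pitch = [p * pitch_control for p in pitch]
    return scaled_durations, scaled_pitch


durations, pitch = apply_controls([10, 5, 20], [200.0, 180.0, 220.0], 0.8, 0.8)
print(durations)  # [8, 4, 16]
print(pitch)      # [160.0, 144.0, 176.0]
```

Under this assumption, a ratio below 1.0 shortens every phoneme (faster speech) and lowers every pitch value, while 1.0 leaves the prediction unchanged.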

Training

Datasets

The supported datasets are

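LJSpeech: a single-speaker English dataset consisting of 13,100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.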

Preprocessing

First, run

 python3 prepare_align.py config/LJSpeech/preprocess.yaml

for some preparations.

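As described in the paper, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Alignments for the LJSpeech dataset are provided. You have to unzip the files in preprocessed_data/LJSpeech/TextGrid/.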
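
As a minimal sketch of the unzip step, the standard-library `zipfile` module can extract an alignment archive into the expected directory. The helper name `extract_alignments` and the archive file name `LJSpeech.zip` used in the usage note are assumptions; use whatever name the downloaded file has, and check that the extracted layout matches what `preprocess.py` expects.

```python
import zipfile
from pathlib import Path


def extract_alignments(
    archive_path: str,
    target_dir: str = "preprocessed_data/LJSpeech/TextGrid",
) -> int:
    """Extract an alignment archive and return how many .TextGrid files exist afterwards."""
    target = Path(target_dir)
    # Create the target directory (and parents) if it is missing.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Count recursively, since the archive may contain a top-level folder.
    return sum(1 for _ in target.rglob("*.TextGrid"))
```

For example, `extract_alignments("LJSpeech.zip")` run from the repository root would return the number of alignment files found, which gives a quick sanity check before running the preprocessing script.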

After that, run the preprocessing script with

 python3 preprocess.py config/LJSpeech/preprocess.yaml

Alternatively, you can align the corpus yourself. Download the official MFA package and run

 ./montreal-forced-aligner/bin/mfa_align raw_data/LJSpeech/ lexicon/librispeech-lexicon.txt english preprocessed_data/LJSpeech

or

 ./montreal-forced-aligner/bin/mfa_train_and_align raw_data/LJSpeech/ lexicon/librispeech-lexicon.txt preprocessed_data/LJSpeech

to align the corpus, and then run the preprocessing script.

 python3 preprocess.py config/LJSpeech/preprocess.yaml

Training

Train your model with

 python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

TensorBoard

Use

 tensorboard --logdir output/log/LJSpeech

to serve TensorBoard on your localhost. The loss curves, synthesized mel-spectrograms, and audio samples are shown.

Implementation Issues

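The current implementation and the pretrained model use normalized pitch values. In my experiments, pitch controllability is not as dynamic as with the pitch shift proposed in the paper. If you need a wider pitch range, as described in the paper, set normalization to False in ./config/LJSpeech/preprocess.yaml.
Note that the paper trains the model for up to 1000k steps, whereas the current implementation provides a model pretrained for 600k steps.
HiFi-GAN is used instead of VocGAN for vocoding.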
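
To illustrate what normalized pitch values look like, the sketch below applies z-score normalization (subtract the mean, divide by the standard deviation) to a few F0 values. That the repository uses exactly this form of normalization is an assumption; `normalize_pitch` is a hypothetical name and the numbers are made up for illustration.

```python
from statistics import mean, pstdev
from typing import List


def normalize_pitch(f0: List[float]) -> List[float]:
    """Z-score normalize pitch values: zero mean, unit (population) variance."""
    mu = mean(f0)
    sigma = pstdev(f0)
    if sigma == 0:
        # A constant contour carries no variation, so map it to all zeros.
        return [0.0 for _ in f0]
    return [(value - mu) / sigma for value in f0]


normalized = normalize_pitch([100.0, 200.0, 300.0])
print([round(v, 4) for v in normalized])  # [-1.2247, 0.0, 1.2247]
```

After normalization the values are centered on zero rather than expressed in Hz, so a control ratio applied to them no longer corresponds directly to a change in frequency.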

Citation

 @misc{lee2021fastpitchformant,
  author = {Lee, Keon},
  title = {FastPitchFormant},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/FastPitchFormant}}
}