CDFSE_FastSpeech2
1.0.0
This repository contains the code accompanying the paper "Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis", implemented on top of ming024/FastSpeech2 (many thanks!).
Update (2022-06-15): This work has been accepted to Interspeech 2022.
Install the dependencies:
pip3 install -r requirements.txt
Please refer to ming024/FastSpeech2 for more details.
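As a quick sanity check after installing, the short snippet below (an illustration only, not part of the repository) verifies that PyTorch, the core dependency inherited from ming024/FastSpeech2, imports correctly and reports whether a GPU is visible:

# check_env.py (illustrative, not part of this repo): verify that PyTorch
# imports and report whether CUDA is available for training.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())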
For example, to preprocess AISHELL3, first run
python3 prepare_align.py config/AISHELL3/preprocess.yaml
Then download the TextGrid files or use MFA to align the corpus yourself, and put the TextGrid files into [preprocessed_data_path], e.g., preprocessed_data/AISHELL3/TextGrid/.
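Before running the preprocessing script, it can help to confirm that the alignments actually landed under the expected directory. The snippet below is only an illustrative check; the path is the AISHELL3 example from above and should be adjusted to whatever your preprocess.yaml specifies:

# check_textgrids.py (illustrative): count TextGrid files under the example
# path used above; adjust the path to match your preprocess.yaml.
from pathlib import Path

tg_root = Path("preprocessed_data/AISHELL3/TextGrid")
tg_files = list(tg_root.rglob("*.TextGrid"))
print(f"Found {len(tg_files)} TextGrid files under {tg_root}")
if not tg_files:
    print("No alignments found, check the TextGrid path in preprocess.yaml")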
Finally, run the preprocessing script:
python3 preprocess.py config/AISHELL3/preprocess.yaml

Train the model with
python3 train.py -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml
Note: if you find that the PhnCls (phoneme classification) loss does not seem to converge or decrease noticeably, try manually adjusting the symbols order in text/symbols.
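The note above concerns how the phoneme inventory is ordered, since each phoneme's integer ID (and therefore the PhnCls classification target) is just its index in the symbols list. The sketch below is only a rough illustration of a ming024-style text/symbols module, with placeholder pinyin entries; the actual file in this repository may look different:

# Rough illustration of a ming024-style text/symbols module (placeholder
# entries, not this repo's actual file). "Adjusting the symbols order"
# means changing how these groups are concatenated, which changes the
# integer ID assigned to every phoneme.
_pad = "_"
_special = "-"
_punctuation = "!'(),.:;? "
_silences = ["@sp", "@spn", "@sil"]
_pinyin = ["@" + s for s in ["a1", "ai1", "an1"]]  # placeholder subset

symbols = [_pad] + list(_special) + list(_punctuation) + _pinyin + _silences

# The classification target for a phoneme is simply its index in this list:
symbol_to_id = {s: i for i, s in enumerate(symbols)}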
(Optional) Use TensorBoard:
tensorboard --logdir output/log/AISHELL3

Batch inference:
python3 synthesize.py --source synbatch_chinese.txt --restore_step 250000 --mode batch -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml

Single inference:
# For Mandarin
python3 synthesize.py --text "清华大学人机语音交互实验室,聚焦人工智能场景下的智能语音交互技术研究。 " --ref [REF_SPEECH_PATH.wav] --restore_step 250000 --mode single -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml
# For English
python3 synthesize.py --text " Human Computer Speech Interaction Lab at Tsinghua University, targets artificial intelligence technologies for smart voice user interface. " --ref [REF_SPEECH_PATH.wav] --restore_step 250000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml
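To synthesize several sentences with different reference utterances in one go, single-mode inference can be wrapped in a small driver script. The sketch below simply shells out to synthesize.py with the same flags used in the Mandarin example above; the sentence list and reference wav paths are placeholders:

# run_single_batch.py (illustrative): loop single-mode synthesis over
# several (text, reference wav) pairs using the CLI shown above.
import subprocess

PAIRS = [
    ("今天天气真好。", "ref_wavs/speaker_a.wav"),  # placeholder inputs
    ("欢迎使用语音合成系统。", "ref_wavs/speaker_b.wav"),  # placeholder inputs
]

for text, ref_wav in PAIRS:
    subprocess.run(
        [
            "python3", "synthesize.py",
            "--text", text,
            "--ref", ref_wav,
            "--restore_step", "250000",
            "--mode", "single",
            "-p", "config/AISHELL3/preprocess.yaml",
            "-m", "config/AISHELL3/model.yaml",
            "-t", "config/AISHELL3/train.yaml",
        ],
        check=True,  # stop if any synthesis run fails
    )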
Citation
@misc{zhou2022content,
  title={Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis},
  author={Zhou, Yixuan and Song, Changhe and Li, Xiang and Zhang, Luwen and Wu, Zhiyong and Bian, Yanyao and Su, Dan and Meng, Helen},
  year={2022},
  eprint={2204.00990},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}