CDFSE_FastSpeech2
1.0.0
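This repository contains the code accompanying the paper "Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis". The implementation is based on ming024/FastSpeech2 (many thanks!).
Update 2022-06-15: This work has been accepted to Interspeech 2022.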
pip3 install -r requirements.txt
Please refer to ming024/FastSpeech2 for more details.
For example,
python3 prepare_align.py config/AISHELL3/preprocess.yaml
Then download the TextGrid files, or use MFA to align the corpus, and put the TextGrid files in [preprocessed_data_path], for example preprocessed_data/AISHELL3/TextGrid/.
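Before running the preprocessing script, it can help to confirm that the TextGrid files actually landed where expected. The following is a minimal sketch, not part of this repository: the helper name find_textgrids is hypothetical, and it assumes the alignment files use the .TextGrid extension and may sit in subfolders of the TextGrid directory.

```python
from pathlib import Path


def find_textgrids(textgrid_dir):
    """Return a sorted list of .TextGrid files found anywhere under textgrid_dir.

    Hypothetical helper for illustration; returns an empty list if the
    directory does not exist.
    """
    root = Path(textgrid_dir)
    if not root.is_dir():
        return []
    # rglob searches the directory and all of its subfolders.
    return sorted(p for p in root.rglob("*.TextGrid") if p.is_file())


if __name__ == "__main__":
    # Example path taken from the instructions above; adjust to your setup.
    files = find_textgrids("preprocessed_data/AISHELL3/TextGrid")
    print(f"found {len(files)} TextGrid files")
```

If the count is zero, the files are probably in a different folder than the one the preprocessing config points to.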
Finally, run the preprocessing script:
python3 preprocess.py config/AISHELL3/preprocess.yaml
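Train the model:
python3 train.py -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml
Note: If the PhnCls loss does not appear to be decreasing, or decreases only marginally, try manually adjusting the symbol dictionaries under text/symbols.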
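One rough way to judge whether a loss is "trending down" is to compare its average over an early window of steps with its average over a recent window. The sketch below is only an illustration under stated assumptions: the function name is_trending_down, the window size, and the 1% threshold are all hypothetical, the loss values are assumed to be non-negative, and how you collect them (for example from your own training logs) is left open.

```python
def is_trending_down(losses, window=100, min_rel_drop=0.01):
    """Return True if the mean of the last `window` losses is at least
    `min_rel_drop` (relative) below the mean of the first `window` losses.

    Hypothetical helper; assumes non-negative loss values.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    first_mean = sum(losses[:window]) / window
    last_mean = sum(losses[-window:]) / window
    return last_mean < first_mean * (1.0 - min_rel_drop)


if __name__ == "__main__":
    # Synthetic values for demonstration only, not real training output.
    flat = [1.0] * 200
    falling = [1.0] * 100 + [0.5] * 100
    print(is_trending_down(flat), is_trending_down(falling))
```

A check like this is coarse and sensitive to noise, so it complements rather than replaces looking at the curve itself.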
(Optional) Use TensorBoard:
tensorboard --logdir output/log/AISHELL3
Batch
python3 synthesize.py --source synbatch_chinese.txt --restore_step 250000 --mode batch -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml
Single
# For Mandarin
python3 synthesize.py --text "清华大学人机语音交互实验室,聚焦人工智能场景下的智能语音交互技术研究。 " --ref [REF_SPEECH_PATH.wav] --restore_step 250000 --mode single -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml
# For English
python3 synthesize.py --text "Human Computer Speech Interaction Lab at Tsinghua University, targets artificial intelligence technologies for smart voice user interface." --ref [REF_SPEECH_PATH.wav] --restore_step 250000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml
@misc{zhou2022content,
title={Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis},
author={Zhou, Yixuan and Song, Changhe and Li, Xiang and Zhang, Luwen and Wu, Zhiyong and Bian, Yanyao and Su, Dan and Meng, Helen},
year={2022},
eprint={2204.00990},
archivePrefix={arXiv},
primaryClass={eess.AS}
}