Cantonese/Chinese text-to-speech based on statistical parametric speech synthesis, using the Merlin toolkit
This project is influenced by MTTS
Python: 3.6
System: Linux (tested on Ubuntu 16.04)
sudo apt-get install libatlas3-base
Run bash tools/install_mtts.sh
Or download the required files yourself
Run Demo
bash run_demo.sh
python src/mtts.py txtfile wav_directory_path output_directory_path
Paths may be absolute or relative. This generates HTS labels. If you have your own acoustic model trained with Montreal-Forced-Aligner, add -a your_acoustic_model.zip; otherwise, the thchs30.zip acoustic model is used by default.
txtfile example
A_01 这是一段文本
A_02 这是第二段文本
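Each txtfile line holds an utterance ID followed by the text to synthesize, and the ID appears to match a wav file name in wav_directory (A_01 pairs with A_01.wav). The sketch below shows one way to read that format before running the tool. The helper names parse_txt_lines and read_txtfile are hypothetical, not part of this project, and the sketch assumes UTF-8 input with whitespace between the ID and the text.

```python
def parse_txt_lines(lines):
    """Parse 'utt_id text' lines into an {utt_id: text} dict (hypothetical helper).

    Assumes the first whitespace-separated token is the utterance ID and
    the rest of the line is the text to synthesize; blank lines are skipped.
    """
    utterances = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'utt_id text', got {raw!r}")
        utt_id, text = parts
        if utt_id in utterances:
            raise ValueError(f"line {lineno}: duplicate utterance ID {utt_id!r}")
        utterances[utt_id] = text
    return utterances


def read_txtfile(path):
    """Read a txtfile (UTF-8 assumed) and parse it with parse_txt_lines."""
    with open(path, encoding="utf-8") as f:
        return parse_txt_lines(f)
```

Rejecting duplicate IDs early is a design choice of this sketch: two lines with the same ID could not both map to one wav file.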
wav_directory example (sampling rate should be 16 kHz or higher)
A_01.wav
A_02.wav
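Before running mtts.py, it can help to confirm that every utterance ID has a matching wav file and that each file meets the sampling-rate requirement. This is a minimal pre-flight sketch, not part of the project: check_wav_dir is a hypothetical name, it treats exactly 16 kHz as acceptable (an assumption), and it uses the standard-library wave module, which only reads PCM WAV files.

```python
import wave
from pathlib import Path

MIN_SAMPLE_RATE = 16000  # Hz; assumed minimum from the sampling-rate note above


def check_wav_dir(wav_dir, utt_ids, min_rate=MIN_SAMPLE_RATE):
    """Hypothetical check: each utterance ID needs <id>.wav with rate >= min_rate.

    Returns a list of human-readable problems (empty if everything looks fine).
    """
    problems = []
    wav_dir = Path(wav_dir)
    for utt_id in utt_ids:
        wav_path = wav_dir / f"{utt_id}.wav"
        if not wav_path.is_file():
            problems.append(f"{utt_id}: missing {wav_path.name}")
            continue
        # wave.open only understands PCM WAV; other encodings raise wave.Error
        with wave.open(str(wav_path), "rb") as w:
            rate = w.getframerate()
        if rate < min_rate:
            problems.append(f"{utt_id}: sampling rate {rate} Hz is below {min_rate} Hz")
    return problems
```

Usage: pass the keys of the parsed txtfile as utt_ids, e.g. check_wav_dir("wav_directory", ["A_01", "A_02"]), and fix any reported problems before generating labels.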
python src/mandarin_frontend.py txtfile output_directory_path
# assumes src/ is on the Python path (e.g. run from inside src/)
from mandarin_frontend import txt2label
result = txt2label('向香港特别行政区同胞澳门和台湾同胞海外侨胞')
for line in result:
    print(line)
See the source code for more information, but pay attention to the format of the alignment file (sfs file): each line is end_time phone_type, not start_time phone_type (this differs from Speech Ocean's data).
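Because the sfs file stores only end times, each segment's start time is the previous segment's end time. The sketch below illustrates that relationship in both directions. It assumes the first segment starts at 0, times increase monotonically, and times stay in whatever units the file uses; the function names are hypothetical, and the start-time input format is only an assumed shape for data like Speech Ocean's, which needs the total duration to close the last segment.

```python
def sfs_to_intervals(lines, start_time=0.0):
    """Convert 'end_time phone_type' lines into (start, end, phone_type) tuples.

    Each segment starts where the previous one ended; the first segment
    starts at start_time (assumed 0). Times keep the file's own units.
    """
    intervals = []
    prev_end = start_time
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        end_str, phone = line.split(maxsplit=1)
        end = float(end_str)
        if end < prev_end:
            raise ValueError(f"end time {end} precedes previous end {prev_end}")
        intervals.append((prev_end, end, phone))
        prev_end = end
    return intervals


def start_times_to_sfs(pairs, total_duration):
    """Rewrite (start_time, phone_type) pairs as 'end_time phone_type' lines.

    A segment ends where the next one starts; the last segment ends at
    total_duration, which a start-time-only format cannot provide by itself.
    """
    lines = []
    for i, (_start, phone) in enumerate(pairs):
        end = pairs[i + 1][0] if i + 1 < len(pairs) else total_duration
        lines.append(f"{end} {phone}")
    return lines
```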
This project uses Montreal-Forced-Aligner for forced alignment. To get a better alignment, train an alignment model on your own data; see mfa: align-using-only-the-dataset
You can generate HTS labels without prosody marks. We assume that a segmented word is smaller than a prosodic word (this is adjusted in the code).
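The assumption that a segmented word is smaller than a prosodic word means every word lies inside a single prosodic word, or equivalently that every prosodic-word boundary is also a word boundary. The sketch below only checks that property for two segmentations of the same text; it is an illustration with hypothetical function names, not the adjustment logic the project's code actually performs.

```python
def boundaries(segments):
    """Character offsets where segments end, excluding the end of the text."""
    offsets, pos = set(), 0
    for seg in segments:
        pos += len(seg)
        offsets.add(pos)
    offsets.discard(pos)  # the end of the text is not an internal boundary
    return offsets


def words_nest_in_prosodic_words(words, prosodic_words):
    """True if every segmented word lies inside a single prosodic word.

    Both segmentations must cover exactly the same text.
    """
    if "".join(words) != "".join(prosodic_words):
        raise ValueError("segmentations cover different text")
    return boundaries(prosodic_words) <= boundaries(words)
```

For example, the words 向 / 香港 / 特别 / 行政区 / 同胞 nest inside the prosodic words 向香港 / 特别行政区 / 同胞, while a prosodic boundary falling between 香 and 港 would violate the assumption.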