Mandarin/Chinese Text to Speech based on statistical parametric speech synthesis using merlin toolkit
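This is only a demo of a Mandarin speech synthesis frontend. It lacks components such as text normalization and prosody prediction. Text-to-pinyin conversion uses pypinyin and word segmentation uses jieba, and neither reaches commercial-grade accuracy. The phone set and question set used by this project have not been fully tested yet.
See also other speech synthesis projects: end-to-end synthesis is a promising direction, and its naturalness is better than Merlin's.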
Documentation: a rough draft written in Mandarin.
When this project started there was no open-source Mandarin speech synthesis dataset available online, so it uses the thchs30 dataset to demonstrate speech synthesis.
UPDATE
Open-source Mandarin speech synthesis data is now available from Data Baker (標貝); thanks to the company.
Data download: https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar
Data description: http://www.data-baker.com/open_source.html
Listen to https://jackiexiao.github.io/MTTS/
Python: python3.6
System: Linux (tested on Ubuntu 16.04)
pip install jieba pypinyin
sudo apt-get install libatlas3-base
Run bash tools/install_mtts.sh
Or download the files yourself
Run Demo
bash run_demo.sh
python src/mtts.py txtfile wav_directory_path output_directory_path (absolute or relative paths)
This produces HTS labels. If you have your own acoustic model trained with Montreal-Forced-Aligner, add -a your_acoustic_model.zip; otherwise the project uses the thchs30.zip acoustic model by default.
txtfile example
A_01 这是一段文本
A_02 这是第二段文本
wav_directory example (the sampling rate should be higher than 16 kHz)
A_01.wav
A_02.wav
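Before running the alignment it can help to confirm that every utterance ID in the text file has a matching wav file with a suitable sampling rate. The following is a minimal sketch, not part of this project: the helper names `read_utterance_ids` and `check_inputs` are hypothetical, the 16000 Hz threshold is an assumed reading of the note above, and the standard-library `wave` module only reads PCM wav files.

```python
import os
import wave


def read_utterance_ids(txtfile):
    """Return utterance IDs from a text file with lines like 'A_01 <text>'."""
    ids = []
    with open(txtfile, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # The ID is everything before the first whitespace.
            ids.append(line.split(maxsplit=1)[0])
    return ids


def check_inputs(txtfile, wav_dir, min_rate=16000):
    """Return a list of problems; an empty list means nothing was found.

    min_rate is an assumed threshold, adjust it to your data.
    """
    problems = []
    for utt_id in read_utterance_ids(txtfile):
        wav_path = os.path.join(wav_dir, utt_id + ".wav")
        if not os.path.isfile(wav_path):
            problems.append(f"{utt_id}: missing {wav_path}")
            continue
        with wave.open(wav_path, "rb") as w:
            rate = w.getframerate()
        if rate < min_rate:
            problems.append(
                f"{utt_id}: sampling rate {rate} Hz is below {min_rate} Hz"
            )
    return problems
```

Calling `check_inputs("txtfile", "wav_directory_path")` returns one message per missing or low-rate file, so an empty list suggests the inputs are consistent.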
python src/mandarin_frontend.py txtfile output_directory_path
Or, from Python:
from mandarin_frontend import txt2label
result = txt2label('向香港特别行政区同胞澳门和台湾同胞海外侨胞')
for line in result:
    print(line)
# with prosody marks and an alignment file (sfs file)
# result = txt2label('向#1香港#2特别#1行政区#1同胞#4澳门#2和#1台湾#1同胞#4海外#1侨胞',
#                    sfsfile='example_file/example.sfs')
See the source code for more information, but pay attention to the alignment file (sfs file): each line has the format end_time phone_type, not start_time phone_type (which differs from Speech Ocean's data).
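Because each alignment line carries only an end time, the start of a phone has to be taken from the end of the previous one. The sketch below illustrates that idea. It is not the project's own parser: the function name `read_sfs` is hypothetical, and it assumes whitespace-separated fields, numeric times in whatever unit the file uses, and a first phone that starts at time 0.

```python
def read_sfs(lines):
    """Turn 'end_time phone' lines into (start, end, phone) tuples."""
    intervals = []
    start = 0.0  # assumption: the first phone starts at time 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        end = float(fields[0])
        phone = fields[1]
        intervals.append((start, end, phone))
        # The next phone starts where this one ends.
        start = end
    return intervals
```

For a file on disk, `read_sfs(open(path, encoding="utf-8"))` would work the same way, since the function only iterates over lines.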
This project uses Montreal-Forced-Aligner for forced alignment. If you want a better alignment, train an alignment model on your own data; see mfa: align-using-only-the-dataset.
The default acoustic model is misc/thchs30.zip, and the dictionary we use is mandarin_mtts.lexicon. If you use a larger dataset than thchs30, you may get better alignment.
You can generate HTS labels without prosody marks. We assume that a word segment is smaller than a prosodic word (this is adjusted in the code).
"#0","#1", "#2","#3" and "#4" are the prosody labeling symbols.
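To make the marking scheme concrete, the sketch below splits a marked sentence into segments paired with the prosody level that follows each one. It is an illustration only, not the project's parser: the names `split_prosody` and `strip_prosody` are hypothetical, and it assumes each mark is written as `#` plus a single digit from 0 to 4 placed directly after the segment it closes.

```python
import re

# A prosody mark is '#' followed by one digit in 0-4 (assumed format).
_MARK = re.compile(r"#([0-4])")


def split_prosody(text):
    """Return (segment, level) pairs; level is None if no mark follows."""
    # With one capture group, split() alternates: text, level, text, level, ...
    parts = _MARK.split(text)
    pairs = []
    for i in range(0, len(parts), 2):
        segment = parts[i]
        if not segment:
            continue  # nothing between two marks, or the text ended with a mark
        level = int(parts[i + 1]) if i + 1 < len(parts) else None
        pairs.append((segment, level))
    return pairs


def strip_prosody(text):
    """Remove all prosody marks, leaving the plain sentence."""
    return _MARK.sub("", text)
```

Applied to the example sentence used earlier, `strip_prosody` recovers the unmarked text, while `split_prosody` keeps the boundary level attached to each segment.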