Chinese-FastSpeech2
Built on the Biaobei open dataset of standard Mandarin female-voice recordings, this project improves on the FastSpeech2 model from the original paper by introducing a prosody representation and a prosody prediction module, making the synthesized Chinese speech more vivid and natural in rhythm.
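The idea of injecting prosody at the input stage can be sketched as follows. This is a minimal, framework-free illustration: the embedding sizes, the label inventory, and the elementwise addition of prosody and phoneme embeddings are assumptions for clarity, not the repository's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_phonemes, n_prosody_labels, d_model = 60, 5, 256  # assumed sizes

# Lookup tables: one embedding per phoneme, one per prosody label
# (e.g. prosodic-word / phrase / intonation-phrase boundary levels).
phoneme_emb = rng.normal(size=(n_phonemes, d_model))
prosody_emb = rng.normal(size=(n_prosody_labels, d_model))

def encode_input(phoneme_ids, prosody_ids):
    """Sum phoneme and prosody embeddings to form the encoder input."""
    return phoneme_emb[phoneme_ids] + prosody_emb[prosody_ids]

x = encode_input([3, 7, 12], [0, 2, 1])  # shape (3, 256)
```

At inference time the prosody labels would come from the prosody prediction model rather than from annotations.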
20230402 Update
1. Added the prosody model training code, in the `BertProsody` directory.
2. Added the preprocessing code for prosody-model training (for the Biaobei data; the code has not been fully cleaned up, this is an initial release), in `preprocessor/biaobei.py`.
Sample
See the generated audio in the `samples` directory.
Model File
The main structure of this project is FastSpeech2 + HiFi-GAN. In addition, a prosody vector for the Chinese text is introduced at the input stage. Three models are therefore required: fastspeech_model, hifigan_model, and prosody_model (net disk link, extraction code: qgpi). After downloading, place the model files into the following directories:
- 8000.pth.tar ---> output/ckpt/biaobei/
- generator_universal.pth.tar ---> hifigan/
- best_model.pt ---> transformer/prosody_model/
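The placement above can be done from the repository root as follows; the `if` guards simply skip any checkpoint that has not been downloaded yet:

```shell
# Create the target directories and move the downloaded checkpoints into place.
mkdir -p output/ckpt/biaobei hifigan transformer/prosody_model
if [ -f 8000.pth.tar ]; then mv 8000.pth.tar output/ckpt/biaobei/; fi
if [ -f generator_universal.pth.tar ]; then mv generator_universal.pth.tar hifigan/; fi
if [ -f best_model.pt ]; then mv best_model.pt transformer/prosody_model/; fi
```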
Predict
Two prediction methods are provided: 1) `python synthesize_all.py`; 2) an HTTP interface.
- The first method is interactive. After running `python synthesize_all.py` on the command line, enter the text to be converted; when it finishes, a `tmp.wav` file is generated in the current working directory.
- The second method is an API call. Run `tts_server.py`, which starts the text-to-speech interface. For how to call this interface, refer to `TestServer.py`; the generated audio file (`tmp.wav`) is likewise saved in the current working directory.
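A minimal client for the HTTP interface might look like the sketch below. The URL, port, endpoint path, and JSON field name are assumptions; check `TestServer.py` in the repository for the actual request format.

```python
import json
from urllib import request

def build_request(text, url="http://127.0.0.1:8000/tts"):
    """Build a POST request carrying the text to synthesize.
    The endpoint path and the "text" field are hypothetical."""
    data = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return request.Request(url, data=data,
                           headers={"Content-Type": "application/json"})

def synthesize(text):
    """Send the request; the server saves tmp.wav in its working directory."""
    with request.urlopen(build_request(text)) as resp:
        return resp.read()

# Example (requires tts_server.py to be running):
#   synthesize("欢迎使用语音合成")
```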
Train
- Since this project is based on the FastSpeech2 project, refer to that project for a more detailed training procedure if you want to train on your own data.
- This project makes some optimizations to the original method; for details, see the blog post: Chinese speech synthesis based on FastSpeech2 optimization.
This project is a personal-interest attempt at speech synthesis. Criticism, corrections, and discussion are all welcome!