Acoustic-FastSpeech2 (Custom)
Reasonable performance through transfer learning, enabling personalization with small amounts of data
Provides an API for real-time fine-tuning on Korean datasets
Model code modified to support fine-tuning
Easy preprocessing, training, and synthesis through shell scripts
Provides a dedicated Docker image
Match the file names of the FastSpeech2 and HiFi-GAN pre-trained checkpoints and place each one in its model's checkpoint folder.
(FastSpeech2: trained for 30,000 steps / HiFi-GAN: Jungil Kong's official pretrained Universal model)
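For example, the placement might look like the sketch below; the directory and file names are illustrative assumptions, not confirmed paths from this repository.
# Illustrative checkpoint placement (all paths and file names are assumptions)
FastSpeech2/ckpt/30000.pth.tar        # FastSpeech2 checkpoint at 30,000 steps
HiFi-GAN/ckpt/generator_universal     # official pretrained Universal generator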
For training and synthesis, pull and run the Docker image, which contains all dependent packages.
docker pull hws0120/e2e_speech_synthesis
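After pulling, a container can be started along the following lines; the GPU flag, volume mount, and container name are assumptions added for illustration.
# Start an interactive container (GPU access and mount path are assumptions)
docker run -it --gpus all \
    -v "$(pwd)":/workspace \
    --name e2e_tts \
    hws0120/e2e_speech_synthesis /bin/bash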
During the run_FS2_preprocessing.sh step, attach to the Docker container, activate the conda environment with the command below, and install the Python package jamo.
conda activate aligner
pip install jamo
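A quick sanity check that jamo installed correctly inside the aligner environment; the sample word is illustrative.
python -c "from jamo import h2j, j2hcj; print(j2hcj(h2j('한국어')))"
# expected output: ㅎㅏㄴㄱㅜㄱㅇㅓ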
Exit the virtual environment before running run_FS2_train.sh or synthesis:
conda activate base
If all of the above requirements are met, run the shell script to extract the MFA (Montreal Forced Aligner) alignments.
sh run_FS2_preprocessing.sh
# Enter the dataset name
[Dataset_Name](ex. HW)
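The preprocessing step assumes the dataset of paired audio and transcript files is already in place; the layout below is an illustrative assumption, and the actual directory and file naming may differ.
# Assumed input layout (illustrative only)
dataset/HW/
    0001.wav  0001.txt
    0002.wav  0002.txt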
Once the TextGrids have been created successfully, exit the virtual environment and run the training script.
sh run_FS2_train.sh
# Enter the dataset name
[Dataset_Name](ex. HW)
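Training progress can be checked by listing the checkpoint directory; the path and file naming below are assumptions, not confirmed outputs of this script.
# Checkpoints are assumed to accumulate under a ckpt directory
ls ckpt/HW
# e.g. 5000.pth.tar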
When FastSpeech2 has finished 5,000 steps of training, run the HiFi-GAN training script.
sh run_HiFi-GAN_train.sh
# Enter the dataset name
[Dataset_Name](ex. HW)
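HiFi-GAN checkpoints can be verified the same way; the g_/do_ file prefixes follow the official HiFi-GAN repository's convention, and the directory path is an assumption.
ls ckpt/HW
# e.g. g_00005000  do_00005000  (generator / discriminator+optimizer states)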
Once the trained model is ready in the ckpt folder, run the synthesis script.
sh run_FS2_synthesize.sh
# Enter the dataset name
[Dataset_Name](ex. HW)
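The synthesized audio should appear as .wav files; the output path below is an assumption for illustration.
ls output/HW
# e.g. result_0.wav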
A separate container is created for each training and synthesis run, following the process shown above.
If you already have a suitable HiFi-GAN checkpoint, HiFi-GAN training can be skipped.
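In that case, place the existing checkpoint where the synthesis script expects it; the source file name and destination path here are assumptions.
# Reuse a pretrained generator instead of training HiFi-GAN (paths are assumptions)
cp generator_universal ckpt/HW/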