FastSpeech2-PyTorch-Korean-Multi-Speaker

This project implements Korean multi-speaker TTS by combining FastSpeech2 with the HiFi-GAN vocoder.

This project develops the TTS component of the 'visible personalized AI speaker' project. Instead of the voices of 'Siri', 'Bixby', or 'Ari', the speaker talks in the voice of someone close to you (e.g., spouse, son, daughter, or parents).
To allow AI speakers to be produced quickly, the project adopts the non-autoregressive FastSpeech2 and the GAN-based vocoder HiFi-GAN instead of the higher-performing Tacotron2 and WaveGlow, taking both quality and production speed into account.
The code is based on the FastSpeech2 implementation by DLLAB, which supports the Korean KSS dataset.

FastSpeech2 as the acoustic model and HiFi-GAN as the vocoder, for fast synthesis and high quality
Transfer learning to personalize the voice with a small amount of data ( ~~+ Zero-Shot Cloning~~ Side Project )
Speaker embedding so that a multi-speaker model can be pre-trained
An end-to-end pipeline so that the whole training process runs on Korean datasets
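
As a rough illustration of the speaker-embedding idea, the sketch below looks up one vector per speaker and adds it to every time step of the encoder output. It is a minimal sketch, not the repository's code: the class names `SpeakerIntegrator` and `SpeakerConditioning`, the hidden size of 256, and the choice of addition (rather than, say, concatenation) are all assumptions made for illustration. The speaker count of 58 follows the 30 male and 28 female speakers used for pre-training.

```python
import torch
import torch.nn as nn


class SpeakerIntegrator(nn.Module):
    """Adds one speaker vector to every time step of the encoder output."""

    def forward(self, encoder_output, speaker_vector):
        # encoder_output: (batch, time, hidden); speaker_vector: (batch, hidden)
        # unsqueeze(1) broadcasts the speaker vector along the time axis.
        return encoder_output + speaker_vector.unsqueeze(1)


class SpeakerConditioning(nn.Module):
    """Hypothetical wrapper: embedding lookup followed by integration."""

    def __init__(self, n_speakers, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(n_speakers, hidden_dim)
        self.integrator = SpeakerIntegrator()

    def forward(self, encoder_output, speaker_ids):
        # speaker_ids: (batch,) integer IDs assigned during preprocessing
        speaker_vector = self.embedding(speaker_ids)
        return self.integrator(encoder_output, speaker_vector)


# Assumed sizes: 58 speakers, hidden size 256, 2 utterances of 7 frames.
conditioning = SpeakerConditioning(n_speakers=58, hidden_dim=256)
encoder_output = torch.zeros(2, 7, 256)
conditioned = conditioning(encoder_output, torch.tensor([0, 57]))
```

For fine-tuning on a new speaker, one common option is to reuse or extend this embedding table; whether the repository does so is not stated here.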

Pre-training uses the free conversation speech dataset from AiHub.
- Trained on data from 30 male and 28 female speakers, selected for quality, with 1 hour 30 minutes of audio on average
- Each speaker is assigned a unique numeric ID during preprocessing
For fine-tuning, a new speaker records sentences from the KSS script, and performance is evaluated at 100, 300, and 600 sentences.
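
The ID assignment can be pictured as a simple name-to-integer table saved as JSON. The sketch below is only an assumption about how such a table might look: the function names and the file layout are hypothetical, and the actual format of the repository's speaker_info.json may differ.

```python
import json
from pathlib import Path


def build_speaker_table(speaker_names):
    """Map each distinct speaker name to an integer ID, in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(set(speaker_names)))}


def save_speaker_table(table, path):
    """Write the table as UTF-8 JSON so Korean names stay readable."""
    text = json.dumps(table, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_speaker_table(path):
    """Read the table back; IDs must match the ones used in pre-training."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical speaker folder names; duplicates collapse to one ID.
table = build_speaker_table(["speaker_b", "speaker_a", "speaker_b"])
```

Sorting before numbering keeps the IDs stable across runs, which matters because the embedding rows learned in pre-training are indexed by these IDs.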

The following was added to the original code.

Speaker embedding implementation (Korean multi-speaker FastSpeech2)
- Added an embedding layer to the model
- Implemented the code that combines the embedding with the encoder output (Embedding, Speaker Integrator)
- Implemented the get_speakers() function, which loads and stores the embedding information
data_preprocessing.py: end-to-end data preprocessing that covers all of the preprocessing steps described below
Handling of unstable synthesis for long sentences
- Text is split at special characters (sentence units), synthesized piece by piece, and the results are joined
G2PK source code imported and applied only to numbers and English
- The existing G2PK package is modified and bundled with the source, so no pip installation is needed, and it converts only numbers and English into Korean
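
The long-sentence handling above can be sketched as follows: split the input at sentence-ending punctuation, synthesize each piece, and concatenate the audio. This is a minimal sketch under assumptions: the set of split characters (`.`, `!`, `?`) is a guess, and `synthesize_fn` is a hypothetical stand-in for the real FastSpeech2 plus HiFi-GAN inference call, here returning a plain list of samples.

```python
import re


def split_on_punctuation(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", text)
    return [piece.strip() for piece in pieces if piece.strip()]


def synthesize_long_text(text, synthesize_fn):
    """Synthesize each sentence separately and join the waveforms."""
    audio = []
    for sentence in split_on_punctuation(text):
        # synthesize_fn is assumed to return a sequence of audio samples.
        audio.extend(synthesize_fn(sentence))
    return audio


def fake_synthesize(sentence):
    """Stand-in synthesizer: one silent sample per character."""
    return [0.0] * len(sentence)


sentences = split_on_punctuation("안녕하세요. 반갑습니다! 잘 지내요?")
waveform = synthesize_long_text("안녕하세요. 반갑습니다!", fake_synthesize)
```

Because each piece is short, the model stays within the sentence lengths it saw during training; a short silence between pieces could be added if the joins sound abrupt.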

Save the wav directory and the json or transcript file in the DataSet/(data name) directory.
Train the Kaldi-based Montreal Forced Aligner on the audio data to produce the TextGrid files.
```
# create lab files, train MFA, split lab files
python data_preprocessing.py
```
Save the generator trained with HiFi-GAN in the Vocoder/Pretrained_models directory; it is used for evaluation during training.

Write the transcript by hand in the required format, or create it using the functions in data_preprocessing.py as a reference.
Place the generated transcript and the data directory in DataSet, then run data_preprocessing.py.
When the MFA step finishes, a textgrid.zip file is created in the top-level directory.
Run preprocess.py and check the preprocessed folder.

Set the batch size and the path to the HiFi-GAN generator in hparam.py, then start training.
```
 python train.py
```
To resume a training run that is already in progress, add --restore_step.
```
 python train.py --restore_step [step]
```
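
A flag like `--restore_step` is typically read with argparse and used to pick the checkpoint to load. The sketch below shows that pattern only; it is not the repository's train.py, and the helper names and the checkpoint file naming scheme are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the training flags; argv=None reads from the command line."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument(
        "--restore_step",
        type=int,
        default=0,
        help="checkpoint step to resume from; 0 starts from scratch",
    )
    return parser.parse_args(argv)


def checkpoint_name(step):
    """Hypothetical naming scheme for saved checkpoints."""
    return f"checkpoint_{step}.pth.tar"


def describe_run(args):
    """Report whether training resumes or starts fresh."""
    if args.restore_step > 0:
        return f"resume from {checkpoint_name(args.restore_step)}"
    return "start from step 0"


# Equivalent to: python train.py --restore_step 30000
plan = describe_run(parse_args(["--restore_step", "30000"]))
```

In a real training script, the branch that resumes would load the model and optimizer state from the named file before continuing the loop.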

If you pre-trained a multi-speaker model, keep the speaker_info.json file that is generated automatically during pre-training.
Put speaker_info.json in the top-level directory.
Then run training in the same way as in the training step above:
```
 python train.py --restore_step [pre-train step]
```

This is the flow of the TTS training and synthesis pipeline designed for the service.

Transfer_learning_pipeline

The pipeline consists of four main containers.
1. Database container that holds the data paths and user information
2. Data preprocessing container that creates transcripts, simplifies file names, extracts TextGrid files with MFA, and preprocesses data for the model
3. Training container for pre-training
4. Training container for fine-tuning on new data
In actual service, only three of these containers run.
