Acoustic-FastSpeech2 (Custom)
Reasonable performance through transfer learning, enabling personalization with small amounts of data
Provides an API for real-time fine-tuning on Korean datasets
Model code modified to support fine-tuning
Easy preprocessing, training, and synthesis through shell scripts
Provides a dedicated Docker image
Match the file names of the FastSpeech2 and HiFi-GAN pre-trained checkpoints and place each one in its model's checkpoint folder.
(FastSpeech2: trained for 30,000 steps / HiFi-GAN: Jungil Kong's official pretrained Universal model)
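For example, the placement might look like the sketch below; the directory and file names are illustrative assumptions, not confirmed paths from this repository.
# Illustrative checkpoint placement (all paths and file names are assumptions)
FastSpeech2/ckpt/30000.pth.tar        # FastSpeech2 checkpoint at 30,000 steps
HiFi-GAN/ckpt/generator_universal     # official pretrained Universal generator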
For training and synthesis, pull and run the Docker image, which contains all dependent packages.
docker pull hws0120/e2e_speech_synthesis
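After pulling, a container can be started along the following lines; the GPU flag, volume mount, and container name are assumptions added for illustration.
# Start an interactive container (GPU access and mount path are assumptions)
docker run -it --gpus all \
    -v "$(pwd)":/workspace \
    --name e2e_tts \
    hws0120/e2e_speech_synthesis /bin/bash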
During the run_FS2_preprocessing.sh step, attach to the Docker container, activate the conda environment with the command below, and install the Python package jamo.
conda activate aligner
pip install jamo
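A quick sanity check that jamo installed correctly inside the aligner environment; the sample word is illustrative.
python -c "from jamo import h2j, j2hcj; print(j2hcj(h2j('한국어')))"
# expected output: ㅎㅏㄴㄱㅜㄱㅇㅓ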
Exit the virtual environment before running run_FS2_train.sh or synthesis:
conda activate base
If all of the above requirements are met, run the shell script to extract the MFA (Montreal Forced Aligner) alignments.
sh run_FS2_preprocessing.sh
# Enter the dataset name
[Dataset_Name](ex. HW)
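The preprocessing step assumes the dataset of paired audio and transcript files is already in place; the layout below is an illustrative assumption, and the actual directory and file naming may differ.
# Assumed input layout (illustrative only)
dataset/HW/
    0001.wav  0001.txt
    0002.wav  0002.txt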
Once the TextGrids have been created successfully, exit the virtual environment and run the training script.
sh run_FS2_train.sh
# Enter the dataset name
[Dataset_Name](ex. HW)
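Training progress can be checked by listing the checkpoint directory; the path and file naming below are assumptions, not confirmed outputs of this script.
# Checkpoints are assumed to accumulate under a ckpt directory
ls ckpt/HW
# e.g. 5000.pth.tar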
When FastSpeech2 has finished 5,000 steps of training, run the HiFi-GAN training script.
sh run_HiFi-GAN_train.sh
# Enter the dataset name
[Dataset_Name](ex. HW)
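HiFi-GAN checkpoints can be verified the same way; the g_/do_ file prefixes follow the official HiFi-GAN repository's convention, and the directory path is an assumption.
ls ckpt/HW
# e.g. g_00005000  do_00005000  (generator / discriminator+optimizer states)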
Once the trained model is ready in the ckpt folder, run the synthesis script.
sh run_FS2_synthesize.sh
# Enter the dataset name
[Dataset_Name](ex. HW)
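The synthesized audio should appear as .wav files; the output path below is an assumption for illustration.
ls output/HW
# e.g. result_0.wav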
A separate container is created for each training and synthesis run, following the process shown above.
If you already have a suitable HiFi-GAN checkpoint, HiFi-GAN training can be skipped.
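In that case, place the existing checkpoint where the synthesis script expects it; the source file name and destination path here are assumptions.
# Reuse a pretrained generator instead of training HiFi-GAN (paths are assumptions)
cp generator_universal ckpt/HW/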