Deepaudio-tts is a framework for training neural-network-based text-to-speech (TTS) models. It includes, or will include, popular neural network architectures for TTS and vocoder models.
To make features such as mixed-precision training, multi-node training, and TPU training easy to use, this framework is built on PyTorch Lightning and Hydra. It is still in development.
$ export PYTHONPATH="${PYTHONPATH}:/dir/of/this/project/"
$ python -m deepaudio.tts.cli.train experiment=tacotron2 datamodule.train_metadata=/your/path/to/train_metadata datamodule.dev_metadata=/your/path/to/dev_metadata
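The `datamodule.train_metadata=...` arguments above are Hydra-style dotted overrides: each `a.b=value` string updates one leaf of the nested experiment config. As a rough stdlib-only sketch of that mechanic (the `datamodule` keys are taken from the command above; the helper name and behavior are illustrative, not Hydra's actual implementation, and values are kept as plain strings):

```python
# Sketch of how dotted overrides like "datamodule.train_metadata=/path"
# update a nested config dict. Hydra itself does much more (type
# conversion, validation, config composition); this only shows the idea.

def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' strings to a nested dict."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)   # split key path from value
        *parents, leaf = dotted_key.split(".")   # walk the key path
        node = config
        for key in parents:
            node = node.setdefault(key, {})      # create nested dicts as needed
        node[leaf] = value                       # values stay strings here
    return config

config = {"datamodule": {"train_metadata": None, "dev_metadata": None}}
apply_overrides(config, [
    "datamodule.train_metadata=/data/train.csv",
    "datamodule.dev_metadata=/data/dev.csv",
])
```

In the real CLI, Hydra composes these overrides with the `experiment=tacotron2` config group before training starts.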
This is a personal project, so I do not have enough GPU resources to run many experiments. I appreciate any kind of feedback or contribution. Please feel free to open a pull request for small items such as bug fixes or experiment results. If you have any questions, please open an issue.
This project borrows a lot of code from ESPnet and PaddleSpeech.