ETOS TTS aims to build a neural text-to-speech (TTS) system that can transform text to speech in voices sampled in the wild. It is a PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model.
sudo apt install libsndfile1
You can use pip to install the other requirements.
pip3 install -r requirements.txt
You can use the pretrained model under models/may22 and run the TTS web server:
python server.py -c server_conf.json
Then go to http://127.0.0.1:8000 and enjoy.
Currently TTS provides data loaders for
To run your own training, define a config.json file (simple template below) and pass it to train.py:
train.py --config_path config.json
To use a specific set of GPUs:
CUDA_VISIBLE_DEVICES="0,1,4" train.py --config_path config.json
Each run creates an experiment folder named with the corresponding date and time, under the folder you set in config.json. If there is no checkpoint under that folder yet, it is removed when you press Ctrl+C.
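The folder behaviour described above can be sketched as follows. This is a minimal illustration, not the code in train.py: the function names, the timestamp format, and the ".pth.tar" checkpoint extension are all assumptions made for the example.

```python
import datetime
import os
import shutil
import tempfile


def create_experiment_folder(root, now=None):
    """Create root/<timestamp> and return its path.

    The timestamp format is an assumption for this sketch.
    """
    now = now or datetime.datetime.now()
    path = os.path.join(root, now.strftime("%Y-%m-%d_%H-%M-%S"))
    os.makedirs(path, exist_ok=True)
    return path


def remove_if_no_checkpoint(path, checkpoint_ext=".pth.tar"):
    """Delete the folder if it holds no checkpoint file.

    Returns True when the folder was deleted. The extension is hypothetical.
    """
    has_checkpoint = any(f.endswith(checkpoint_ext) for f in os.listdir(path))
    if has_checkpoint:
        return False
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    # Demo in a temporary directory so nothing real is touched.
    run_dir = create_experiment_folder(tempfile.mkdtemp())
    try:
        raise KeyboardInterrupt  # stands in for pressing Ctrl+C during training
    except KeyboardInterrupt:
        removed = remove_if_no_checkpoint(run_dir)
        print("removed empty run folder:", removed)
```

Wrapping the training loop in a try/except for KeyboardInterrupt, as in the demo, keeps aborted runs from cluttering the output folder while leaving any run that already saved a checkpoint intact.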
You can also follow training in TensorBoard, which shows a couple of useful training logs, if you point --logdir to the experiment folder.
Example config.json:
{
    "num_mels": 80,
    "num_freq": 1025,
    "sample_rate": 22050,
    "frame_length_ms": 50,
    "frame_shift_ms": 12.5,
    "preemphasis": 0.97,
    "min_level_db": -100,
    "ref_level_db": 20,
    "embedding_size": 256,
    "text_cleaner": "english_cleaners",
    "epochs": 200,
    "lr": 0.002,
    "warmup_steps": 4000,
    "batch_size": 32,
    "eval_batch_size": 32,
    "r": 5,
    "mk": 0.0, // guided attention loss weight; not used if 0
    "priority_freq": true, // freq range emphasis
    "griffin_lim_iters": 60,
    "power": 1.2,
    "dataset": "TWEB",
    "meta_file_train": "transcript_train.txt",
    "meta_file_val": "transcript_val.txt",
    "data_path": "/data/shared/BibleSpeech/",
    "min_seq_len": 0,
    "num_loader_workers": 8,
    "checkpoint": true, // whether to save a checkpoint every save_step steps
    "save_step": 200,
    "output_path": "/path/to/my_experiment"
}
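Because the template carries // comments, Python's json module will not parse it directly. The sketch below shows one way to read such a file and to turn the millisecond settings into sample counts. The helper names are hypothetical, and the formulas (n_fft = (num_freq - 1) * 2, window and hop lengths as ms / 1000 * sample_rate) follow a convention common in Tacotron implementations; they are assumed here, not taken from this project's code.

```python
import json
import re

# Matches either a JSON string literal or a // comment running to end of line.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*')


def strip_comments(text):
    """Drop // comments while leaving string literals (e.g. URLs) untouched."""
    return _TOKEN.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else "", text
    )


def load_config(text):
    """Parse config text that may contain // comments and a trailing comma."""
    cleaned = strip_comments(text)
    # Naive trailing-comma removal; would also touch ",}" inside a string.
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)


def derived_stft_params(c):
    """Return (n_fft, win_length, hop_length) in samples (assumed convention)."""
    n_fft = (c["num_freq"] - 1) * 2
    win_length = int(c["frame_length_ms"] / 1000 * c["sample_rate"])
    hop_length = int(c["frame_shift_ms"] / 1000 * c["sample_rate"])
    return n_fft, win_length, hop_length


# A shortened copy of the template, used only for illustration.
SAMPLE = """{
    "num_freq": 1025,
    "sample_rate": 22050,
    "frame_length_ms": 50,
    "frame_shift_ms": 12.5,
    "mk": 0.0, // guided attention loss weight
    "output_path": "/path/to/my_experiment",
}"""

if __name__ == "__main__":
    cfg = load_config(SAMPLE)
    print(derived_stft_params(cfg))
```

With the values in the template, this convention gives an FFT size of 2048, a window of 1102 samples (50 ms at 22050 Hz) and a hop of 275 samples (12.5 ms).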