An adaptation of Reformer: The Efficient Transformer for the text-to-speech task.
This project contains:
We aimed to create a significantly more efficient version of a state-of-the-art text-to-speech model by replacing its transformer architecture with the optimizations proposed in the more recent Reformer paper. We set out to use it to generate a believable deepfake of Donald Trump, based on a custom dataset of his speeches created specifically for this purpose.
Unfortunately, we weren't able to produce results matching the ones from the Transformer TTS paper, after experimenting with more than 100 hyperparameter combinations over 2 months. We believe that model size is a significant factor here, and that to train transformers for TTS one really needs to reduce overfitting to allow a long, steady training process (~1 week of training on an RTX 2080Ti).
Also, having access to the original implementation of Transformer TTS would have helped greatly.
While the Reformer didn't meet our expectations, our SqueezeWave implementation matches the performance of the original one, though without FP16 support.
We also include a CLI for running training and inference (see the usage section), and all data necessary to reproduce our experiments (see the development section).
The project is undergoing a significant refactor; this version is left here for compatibility with our previous experiments and will be moved in the near future.
This project is a normal Python package and can be installed using pip, as long as you have Python 3.8 or greater. Go to the releases page to find installation instructions for the latest release.
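For illustration, installing from a downloaded release artifact could look like the sketch below; the exact file name and any additional steps are given on the releases page, so the wheel name here is only a placeholder.

```
# placeholder file name - use the artifact attached to the latest release
pip install reformer_tts-<version>-py3-none-any.whl
```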
After installation, you can see available commands by running:
```
python -m reformer_tts.cli --help
```

All commands are executed using the CLI, for example:

```
python -m reformer_tts.cli train-vocoder
```

Most parameters (in particular, all training hyperparameters) are specified via the `--config` argument to the CLI (it goes before the command you want to run), e.g.:

```
python -m reformer_tts.cli -c /path/to/your/config.yml train-vocoder
```

Default values can be found in `reformer_tts.config.Config` (and its fields).
Thanks to the conda-forge community, we can install all packages (including necessary binaries, like ffmpeg) using one command:

```
conda env create -f environment.yml
```

Alternatively, you can set the environment up manually. Make sure you are running Python>=3.8:

```
which python
python --version
```

then install the dependencies:

```
pip install -r requirements.txt
```

and ensure you have ffmpeg>=3.4,<4.0 installed (installation instructions).
For training, ensure you have CUDA and GPU drivers installed (for details, see the instructions on the PyTorch website).
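A quick sanity check of the environment (whichever way it was created) can be done with standard tools; the `torch` import below assumes PyTorch was installed from the requirements, and the CUDA check is only meaningful on a GPU machine:

```
python --version   # should report Python 3.8 or greater
ffmpeg -version    # should report an ffmpeg 3.4-3.x build
python -c "import torch; print(torch.cuda.is_available())"   # True if CUDA is usable
```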
To download the data:

1. Set your Google Cloud credentials:

   ```
   export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service-account-credentials.json
   ```

   NOTE: if you only need read access (for reproduction), you don't need to perform step 1.

2. Pull the data:

   ```
   dvc pull
   ```

To verify that everything is set up correctly, you can run the project tests:
```
python -m pytest --pyargs reformer_tts
```

All tests should work on CPU and GPU, and may take up to a minute to complete. Remember to pass `--pyargs reformer_tts` to pytest, otherwise it will search the data directories for tests.
- Python>=3.8
- dependencies are listed in `requirements.txt` as well as in `environment.yml`
- the CLI entry point is `reformer_tts/cli.py`; run `python reformer_tts/cli.py --help` for a detailed reference
- configuration is organized in dataclass structures:
  - each module has its own `config.py`, where the parameters and default values are defined - for example, dataset config parameters are specified in `reformer_tts.dataset.config`
  - the `reformer_tts.config.Config` class contains all submodules' config settings
  - this way, the default values are set close to the place where they are used, and any config value can be overridden wherever you want
To change runtime configuration:

1. Generate a config file with the default values:

   ```
   python reformer_tts/cli.py save-config -o config/custom.yml
   ```

   or manually copy one of the existing configuration files in the `config/` directory.

2. Edit the generated file.
3. Run commands with the `-c` option, i.e.:

   ```
   python reformer_tts/cli.py -c config/custom.yml [COMMAND]
   ```
To add configuration for a new module:

- add a `config.py` in your module (see the `dataset` and `squeezewave` modules)
- add it to the `reformer_tts.config` main config class

We use DVC for defining data processing pipelines.
The remote is set up on Google Cloud Storage; for details, run `dvc config list`.
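As a sketch of how such a pipeline is typically exercised (these are standard DVC commands, not commands specific to this project; the actual stage targets live in the repository's DVC files):

```
dvc config list          # inspect the remote configuration
dvc pull                 # fetch the data tracked by DVC
dvc repro <stage-file>   # re-run a pipeline stage; the target depends on the repo's DVC files
```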
Nodes prepared for running: nodes with a local `/scidatalg` directory (see the list in the note below).

Before running:

1. Log into the chosen node:

   ```
   srun --qos=gsn --partition=common --nodelist=<name_of_chosen_node> --pty /bin/bash
   ```

2. In `/scidatalg/reformer-tts/reformer-tts/`, make sure the repository is pulled and on the proper branch.

To run training:

1. Edit `jobs/train_entropy.sbatch` - fill in the node name and the training command.
2. Submit the job:

   ```
   sbatch your/job/script/location.sbatch
   ```

Pro Tip: run `watch -n 1 squeue -u your_username` to check whether your job is already running.
Pro Tip 2: You can watch updates to the log by running `tail -f file.log` or `less --follow-name +F file.log`.
To pull from DVC, use `jobs/entropy_dvc_pull.sbatch`.
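For example, submitted like any other job from the repository directory:

```
sbatch jobs/entropy_dvc_pull.sbatch
```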
Since the `/scidatasm` directory does not sync while we want to train, we have to set up training on each node separately, by hand. To set up the environment on a new node, follow these instructions:

Note: only nodes with `/scidatalg` are supported by these scripts. These nodes are: asusgpu4, asusgpu3, asusgpu2, asusgpu1, arnold, sylvester.
1. Log into the chosen node:

   ```
   srun --qos=gsn --partition=common --nodelist=<name_of_chosen_node> --pty /bin/bash
   ```

2. Put your Google Cloud service account credentials in `${HOME}/gcp-cred.json` (using your favourite editor).
3. Copy `scripts/setup_entropy_node.sh` to a new file in your home dir (again using the editor).