FlashSpeech
1.0.0
Implementation of FlashSpeech. For full details, see our paper, accepted to ACM MM 2024: FlashSpeech: Efficient Zero-Shot Speech Synthesis.
bash env.sh
accelerate was replaced with lightning because I encountered similar issues (related issue). Training with lightning is also faster.
Modify ns2dataset.py based on your data.
bash egs/tts/NaturalSpeech2/run_train.sh
Important Notes:
Choose Configuration:
Use the ***_s1 or ***_s2 configuration files based on the training stage.
Modify Model Codec:
In models/tts/naturalspeech2/flashspeech.py, update the codec to your own.
Set self.latent_norm to normalize the codec latent by its standard deviation. (This step is crucial for training the consistency model.)
Stage 2 Setup:
In models/tts/naturalspeech2/flashspeech_trainer_stage2.py, set the initial weights obtained from Stage 1 training.
Stage 3 Development:
Further organize the project structure and complete the remaining code.
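For the codec note above, one way to obtain a value for self.latent_norm is to measure the standard deviation of your codec's latents over a sample of training data. The sketch below is only an illustration in plain Python. The helper names estimate_latent_std and normalize are hypothetical, the latents are stand-in lists of floats rather than real codec outputs, and it assumes the value is applied as a divisor. Check flashspeech.py for how self.latent_norm is actually used before adopting it.

```python
import math


def estimate_latent_std(latent_batches):
    """Population standard deviation over every value in an iterable of
    flat float lists (hypothetical helper, not part of the repository)."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for batch in latent_batches:
        for value in batch:
            count += 1
            total += value
            total_sq += value * value
    if count == 0:
        raise ValueError("no latent values provided")
    mean = total / count
    # Clamp at zero to guard against tiny negative values from rounding.
    variance = max(total_sq / count - mean * mean, 0.0)
    return math.sqrt(variance)


def normalize(latent, latent_norm):
    """Divide each latent value by the estimated standard deviation.
    Assumption: latent_norm acts as a divisor."""
    return [value / latent_norm for value in latent]


# Stand-in data; in practice these would be latents from your own codec.
sample_latents = [[2.0, -2.0, 4.0], [-4.0, 0.0, 0.0]]
latent_norm = estimate_latent_std(sample_latents)
normalized = [normalize(batch, latent_norm) for batch in sample_latents]
```

After dividing by the estimate, the spread of the sample latents is close to 1, which is the property the note describes as crucial for training the consistency model.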
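For the Stage 2 setup note above, the initial weights come from a Stage 1 checkpoint. The sketch below uses plain dictionaries in place of tensors to illustrate one common chore when doing this: stripping a wrapper prefix from checkpoint keys and checking which parameters line up with the Stage 2 model. The prefix "model.", the key names, and the helper select_init_weights are all hypothetical. The actual loading logic lives in flashspeech_trainer_stage2.py.

```python
def select_init_weights(checkpoint_state, model_keys, prefix="model."):
    """Match checkpoint entries against the parameter names of a model.

    Hypothetical helper: returns (matched, missing, unexpected), where
    matched maps model parameter names to checkpoint values.
    """
    stripped = {}
    for key, value in checkpoint_state.items():
        # Drop the wrapper prefix if present; keep other keys unchanged.
        name = key[len(prefix):] if key.startswith(prefix) else key
        stripped[name] = value
    matched = {k: v for k, v in stripped.items() if k in model_keys}
    missing = sorted(set(model_keys) - set(matched))
    unexpected = sorted(set(stripped) - set(model_keys))
    return matched, missing, unexpected


# Stand-in Stage 1 checkpoint contents and Stage 2 parameter names.
stage1_state = {
    "model.encoder.weight": [0.1, 0.2],
    "model.decoder.bias": [0.0],
    "optimizer.step": 1000,
}
stage2_keys = {"encoder.weight", "decoder.bias", "consistency_head.weight"}

matched, missing, unexpected = select_init_weights(stage1_state, stage2_keys)
```

Inspecting missing and unexpected before training is a cheap way to catch a wrong checkpoint path or a renamed module, since parameters that silently fail to load would otherwise start Stage 2 from random initialization.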
Special thanks to Amphion, as our codebase is primarily borrowed from Amphion.
Thank you for using FlashSpeech!