Starting from VITS, MB-iSTFT-VITS improves synthesis speed by replacing the computationally heavy part of the decoder with multi-band generation and an inverse short-time Fourier transform (iSTFT).
Based on this well-designed framework, this repository aims to further improve sound quality and inference speed with AutoVocoder.
This repo is based on MB-iSTFT-VITS, and the expected modifications and enhancements are:
1. Replace the iSTFTNet-based decoder with an AutoVocoder-based decoder.
2. In the iSTFT operation, construct the complex spectrogram from real/imaginary components instead of magnitude/phase, and add a time-domain reconstruction loss (see the sketch after this list).
3. Revise the posterior encoder to accept the four complex components instead of the linear spectrogram.
The FFT/hop/window sizes are (1024, 256, 1024), and no upsampling modules are used (the multi-band strategy is maintained). With modification 3, by providing phase information to the latents, we test whether the prior can reliably approximate these latents.

Disclaimer: this repo is built for testing purposes, and performance is not guaranteed. Contributions are welcome.
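To make modification 2 concrete, here is a minimal sketch, not code from this repository: the tensor shapes, the `istft_from_real_imag` helper, and the choice of an L1 loss are illustrative assumptions. It shows a waveform reconstructed from predicted real/imaginary spectrogram components with `torch.istft` under the (1024, 256, 1024) FFT/hop/window setting, followed by a time-domain reconstruction loss.

```python
import torch
import torch.nn.functional as F

# FFT/hop/window sizes used in this repo
n_fft, hop_length, win_length = 1024, 256, 1024
window = torch.hann_window(win_length)

def istft_from_real_imag(real, imag):
    """real, imag: [batch, n_fft // 2 + 1, frames] predicted by the decoder."""
    spec = torch.complex(real, imag)  # complex spectrogram, no phase estimation needed
    return torch.istft(spec, n_fft=n_fft, hop_length=hop_length,
                       win_length=win_length, window=window)

# Hypothetical decoder outputs and target waveform, for illustration only
frames = 100
real = torch.randn(1, n_fft // 2 + 1, frames)
imag = torch.randn(1, n_fft // 2 + 1, frames)
y_hat = istft_from_real_imag(real, imag)
y = torch.randn_like(y_hat)

# Time-domain reconstruction loss (L1 here; the actual loss may differ)
loss_recon = F.l1_loss(y_hat, y)
```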

```sh
# Install espeak (used for text preprocessing)
apt-get install espeak

# Create a link to the LJ Speech dataset folder
ln -s /path/to/LJSpeech-1.1/wavs DUMMY1
```

```sh
# Cython-version Monotonic Alignment Search
cd monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace
```

In the case of MB-iSTFT-VITS training, run the following script:
```sh
python train_latest.py -c configs/ljs_mb_istft_vits.json -m ljs_mb_istft_vits
```
After training, you can check inference audio using inference.ipynb.
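The notebook in the upstream VITS / MB-iSTFT-VITS codebases follows roughly the pattern below. Treat this as a hedged sketch, not the notebook's exact contents: the module names (`utils`, `commons`, `models.SynthesizerTrn`, `text.text_to_sequence`), the checkpoint path, and the `G_xxx.pth` placeholder are assumptions about this repo.

```python
# Minimal VITS-style inference sketch; adjust names/paths to match this repository.
import torch
import commons
import utils
from models import SynthesizerTrn
from text import text_to_sequence
from text.symbols import symbols

hps = utils.get_hparams_from_file("./configs/ljs_mb_istft_vits.json")

net_g = SynthesizerTrn(
    len(symbols),
    hps.data.filter_length // 2 + 1,
    hps.train.segment_size // hps.data.hop_length,
    **hps.model)
_ = net_g.eval()
# G_xxx.pth is a placeholder for a trained generator checkpoint
_ = utils.load_checkpoint("./logs/ljs_mb_istft_vits/G_xxx.pth", net_g, None)

# Convert input text to a symbol-id sequence
text_norm = text_to_sequence("Hello world.", hps.data.text_cleaners)
if hps.data.add_blank:
    text_norm = commons.intersperse(text_norm, 0)
stn_tst = torch.LongTensor(text_norm)

with torch.no_grad():
    x_tst = stn_tst.unsqueeze(0)
    x_tst_lengths = torch.LongTensor([stn_tst.size(0)])
    audio = net_g.infer(x_tst, x_tst_lengths, noise_scale=0.667,
                        noise_scale_w=0.8, length_scale=1.0)[0][0, 0].cpu().numpy()
```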