# BERT-MB-iSTFT-VITS

v1.0.0
## Requirements

- 16 GB RAM
- 12 GB VRAM

PyTorch install command:

```sh
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
```

CUDA 11.7 install: https://developer.nvidia.com/cuda-11-7-0-download-archive
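As a quick sanity check after installing, the following sketch confirms that the CUDA build of PyTorch can see the GPU:

```python
# Quick sanity check that the CUDA-enabled PyTorch build sees the GPU.
import torch

print(torch.__version__)          # expect 1.13.1+cu117
print(torch.cuda.is_available())  # True if the CUDA 11.7 setup is working
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```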
## Setup

```sh
conda create -n vits python=3.8
conda activate vits
git clone https://github.com/project-elnino/BERT-MB-iSTFT-VITS.git
cd BERT-MB-iSTFT-VITS
pip install -r requirements.txt
```

## Prepare datasets

Each line of the metadata file has the form:

```
path/to/audio_001.wav|<speaker_name>|<language_code>|<text_001>
```

For example:

```
../kss2/1/1_0000.wav|KR-default|KR|그는 괜찮은 척하려고 애쓰는 것 같았다.
```

(The Korean text reads "He seemed to be trying to pretend he was okay.")
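Before preprocessing, it can help to verify the metadata file. A minimal validation sketch; the function name and checks are illustrative, not part of the repo:

```python
# Illustrative check that every metadata line has the expected
# <wav_path>|<speaker_name>|<language_code>|<text> structure.
from pathlib import Path

def check_metadata(path: str) -> None:
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for lineno, line in enumerate(lines, start=1):
        fields = line.split("|")
        if len(fields) != 4:
            print(f"line {lineno}: expected 4 fields, got {len(fields)}")
            continue
        wav_path = fields[0]
        if not Path(wav_path).is_file():
            print(f"line {lineno}: missing audio file {wav_path}")

check_metadata("./metadata.list")
```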
## Preprocess

```sh
python preprocess.py --metadata ./metadata.list --config_path ./configs/config.json
```

If your speech files are not mono PCM-16 .wav files, resample them first.
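A minimal resampling sketch, assuming `librosa` and `soundfile` are installed and a 22050 Hz target rate (match the sampling rate in your config):

```python
# Convert an arbitrary .wav file to mono, 16-bit PCM at the target rate.
import librosa
import soundfile as sf

def to_mono_pcm16(in_path: str, out_path: str, target_sr: int = 22050) -> None:
    # librosa.load resamples to target_sr and downmixes to mono float32
    audio, _ = librosa.load(in_path, sr=target_sr, mono=True)
    # subtype="PCM_16" writes 16-bit integer samples
    sf.write(out_path, audio, target_sr, subtype="PCM_16")

to_mono_pcm16("path/to/audio_001.wav", "path/to/audio_001_pcm16.wav")
```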
| Model | How to set up json file in configs | Sample of json file configuration |
|---|---|---|
| iSTFT-VITS | `"istft_vits": true, "upsample_rates": [8,8]` | ljs_istft_vits.json |
| MB-iSTFT-VITS | `"subbands": 4, "mb_istft_vits": true, "upsample_rates": [4,4]` | ljs_mb_istft_vits.json |
| MS-iSTFT-VITS | `"subbands": 4, "ms_istft_vits": true, "upsample_rates": [4,4]` | ljs_ms_istft_vits.json |
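For example, switching an existing config to the MB-iSTFT-VITS variant amounts to setting the keys above. A sketch of doing so programmatically, assuming (as in the sample configs) that these keys live under the `"model"` section:

```python
import json

# Load an existing config and switch on the multi-band iSTFT decoder.
with open("configs/config.json") as f:
    config = json.load(f)

config["model"]["subbands"] = 4           # number of sub-bands
config["model"]["mb_istft_vits"] = True   # select the MB-iSTFT-VITS variant
config["model"]["upsample_rates"] = [4, 4]

with open("configs/config_mb.json", "w") as f:
    json.dump(config, f, indent=2)
```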
## Training

Set `training_files` and `validation_files` in the config to the paths of the preprocessed manifest files, then run:

```sh
python train.py -c <config> -m <folder>
```

Training resumes automatically from the latest checkpoint.
## Inference

Check `inference.py`:

```sh
python inference.py -m ./models/kss/G_64000.pth
```

### Server Inference

Start the server:

```sh
python inference_server.py -m ./models/kss/G_64000.pth
```

Then send a synthesis request:

```sh
curl -X POST -H "Content-Type: application/json" \
  -d '{"text": "잠시 통화 괜찮으시면 전화를 끊지 말아주세요."}' \
  http://localhost:5000/synthesize
```

(The Korean text reads "If you're free for a brief call, please don't hang up.")
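The same request from Python, as a minimal client sketch; it assumes the server responds with raw audio bytes, so adjust the handling if it returns JSON instead:

```python
import requests

# POST the text to the /synthesize endpoint of the local server.
resp = requests.post(
    "http://localhost:5000/synthesize",
    json={"text": "잠시 통화 괜찮으시면 전화를 끊지 말아주세요."},
)
resp.raise_for_status()

# Assumption: the response body is the synthesized audio (e.g. WAV bytes).
with open("output.wav", "wb") as f:
    f.write(resp.content)
```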