| Task | Notebook |
|---|---|
| Whisper_Vits_Japanese (built-in Ella dataset) | |
This project uses OpenAI's Whisper as the data processor for VITS. By modifying Whisper's transcribe.py, it generates a matching SRT file for each audio file (this part is based on a PR that has since been deleted and can no longer be found, so the original author unfortunately cannot be credited). The limitation that Whisper only reads a few audio files at a time is also lifted, so it can traverse every audio file in a folder. Because Whisper can output SRT, long audio becomes usable as input: users no longer need to cut their audio into pieces or transcribe long recordings by hand. We rely on Whisper directly for speech recognition and data preparation, automatically slice the audio into short clips, automatically generate the transcript files, and then feed everything into the VITS training process. Since long dry (unaccompanied) vocal recordings are relatively easy to obtain, the barrier to entry for VITS is lowered considerably once again.
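The modified transcribe.py came from a PR that has since been deleted, so it cannot be shown here; the idea can be sketched with the standard whisper Python API (the model size, language and folder name below are placeholders, not necessarily what the notebook uses):

```python
import whisper
from pathlib import Path

def srt_timestamp(seconds):
    # seconds (float) -> "HH:MM:SS,mmm"
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

model = whisper.load_model("medium")                 # model size is a placeholder
for wav in sorted(Path("audio").glob("*.wav")):      # traverse every audio file in the folder
    result = model.transcribe(str(wav), language="ja")
    with open(wav.with_suffix(".srt"), "w", encoding="utf-8") as f:
        for i, seg in enumerate(result["segments"], start=1):
            f.write(f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
                    f"{seg['text'].strip()}\n\n")
```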
The processing flow is roughly as follows. The SRT file recognized by Whisper is processed by auto.py; this step is adapted from tobiasrordorf/SRT-to-CSV-and-audio-split: Split long audio files based on subtitle-info in SRT File (Transcript saved in CSV) (github.com). The audio file is first converted to 22050 Hz, 16-bit, and then the timestamps and recognized transcript of the SRT file with the same name are converted into a CSV file. The CSV file records the start and end time of each audio segment, together with the corresponding transcript and audio file path. The AudioSegment package (pydub) is then used to split the long audio at those start and end times, producing audio files suffixed in slice order, such as A_0.wav, A_1.wav, and so on. All sliced audio is stored in the slice_audio folder, and the "path|text" txt file required by VITS is generated under the filelists folder. From there, the data flows directly into the VITS part.
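To make that concrete, here is a minimal sketch of the SRT → CSV → slice step. This is not the actual auto.py; the function name, CSV layout and filelist location are illustrative:

```python
import csv
import re
from pathlib import Path
from pydub import AudioSegment

TS = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+) --> (\d+):(\d+):(\d+)[,.](\d+)")

def to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def slice_from_srt(wav_path, srt_path, out_dir="slice_audio", filelist="filelists/filelist.txt"):
    # Convert to 22050 Hz / 16-bit before slicing, as the pipeline expects
    audio = AudioSegment.from_file(wav_path).set_frame_rate(22050).set_sample_width(2)
    Path(out_dir).mkdir(exist_ok=True)
    Path(filelist).parent.mkdir(exist_ok=True)
    rows = []
    for i, block in enumerate(Path(srt_path).read_text(encoding="utf-8").strip().split("\n\n")):
        lines = block.splitlines()
        m = TS.search(lines[1])                # line 0 = subtitle index, line 1 = timestamps
        text = " ".join(lines[2:]).strip()     # remaining lines = recognized transcript
        start, end = to_ms(*m.groups()[:4]), to_ms(*m.groups()[4:])
        clip = f"{out_dir}/{Path(wav_path).stem}_{i}.wav"
        audio[start:end].export(clip, format="wav")
        rows.append((start, end, text, clip))
    # CSV with start/end times, transcript and clip path
    with open(f"{Path(wav_path).stem}.csv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    # "path|text" filelist consumed by VITS
    with open(filelist, "a", encoding="utf-8") as f:
        f.writelines(f"{p}|{t}\n" for _, _, t, p in rows)
```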
The VITS cleaners and symbols I use come from CjangCjengh/vits: VITS implementation of Japanese, Chinese, Korean and Sanskrit (github.com), in its earliest form from when the project first appeared. That repository has since added more cleaners and symbols, but I am a nostalgic person and miss the time when everyone was first arriving at VITS, so I still use the original version. VITS has two main preprocessing steps, monotonic align and preprocess.py, after which train.py can be started. I put the whole pipeline into whisper-vits-japanese.ipynb, so it only needs to be run cell by cell. The only thing the user has to change is the path to my audio zip, which should be replaced with their own; nothing else needs modification. Finally, I added cells that save the model and processed files to a cloud drive and restore the most recent checkpoint from it at the start of the next training run.
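As a rough illustration of that save/restore idea, assuming the notebook runs on Colab with Google Drive as the cloud drive (the backup folder and log directory below are made-up names; the notebook's own cells are the authoritative version):

```python
import shutil
from pathlib import Path
from google.colab import drive

drive.mount("/content/drive")
backup = Path("/content/drive/MyDrive/vits_backup")          # placeholder backup folder
logs = Path("/content/whisper-vits-japanese/logs/ljs_base")  # placeholder log directory
backup.mkdir(parents=True, exist_ok=True)

# Save: copy the newest generator/discriminator checkpoints to Drive
for pattern in ("G_*.pth", "D_*.pth"):
    ckpts = sorted(logs.glob(pattern), key=lambda p: int(p.stem.split("_")[1]))
    if ckpts:
        shutil.copy(ckpts[-1], backup / ckpts[-1].name)

# Restore: copy them back before the next run, so training resumes from the latest checkpoint
for ckpt in backup.glob("*.pth"):
    shutil.copy(ckpt, logs / ckpt.name)
```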
For multi-speaker training, just name each audio file speakerId_XXXX.wav and upload it to the audio folder, then follow the same general steps. Once the audio processing is done, run auto_ms.py; the txt file is generated automatically in the format path|speakerId|text.
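The naming convention matters because the speaker ID is read back out of the file name. A small illustration of how one such path|speakerId|text line can be formed (auto_ms.py itself may differ in detail; the clip name and transcript here are made up):

```python
from pathlib import Path

def filelist_line(clip_path, transcript):
    # The leading number in "speakerId_XXXX_<slice>.wav" is the speaker ID
    speaker_id = Path(clip_path).name.split("_")[0]
    return f"{clip_path}|{speaker_id}|{transcript}"

print(filelist_line("slice_audio/3_0007_0.wav", "ごめんね優衣"))
# -> slice_audio/3_0007_0.wav|3|ごめんね優衣
```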
Note: if you use auto_ms.py to generate the txt file, you must change the command in the Alignment and Text Conversion step to the following (for multi-speaker training, text_index is 2 rather than 1):
```sh
# Alignment and Text Conversion (multi-speaker: text_index is 2, not 1)
python preprocess.py --text_index 2 --text_cleaners japanese_cleaners --filelists /content/whisper-vits-japanese/filelists/train_filelist.txt /content/whisper-vits-japanese/filelists/val_filelist.txt

# Multi-speaker training
python train_ms.py -c configs/ms.json -m ms
```
After training, multi-speaker inference can be run like this (the imports and the get_text helper follow the VITS inference notebook):

```python
import torch
import IPython.display as ipd
import commons
import utils
from models import SynthesizerTrn
from text import text_to_sequence
from text.symbols import symbols

def get_text(text, hps):  # raw text -> tensor of symbol IDs via the configured cleaners
    text_norm = text_to_sequence(text, hps.data.text_cleaners)
    if hps.data.add_blank:
        text_norm = commons.intersperse(text_norm, 0)
    return torch.LongTensor(text_norm)

hps = utils.get_hparams_from_file("./configs/ms.json")
net_g = SynthesizerTrn(
    len(symbols),
    hps.data.filter_length // 2 + 1,
    hps.train.segment_size // hps.data.hop_length,
    n_speakers=hps.data.n_speakers,
    **hps.model).cuda()
_ = net_g.eval()
_ = utils.load_checkpoint("/root/autodl-tmp/logs/ms/G_29000.pth", net_g, None)
stn_tst = get_text("ごめんね優衣", hps)
with torch.no_grad():
    x_tst = stn_tst.cuda().unsqueeze(0)
    x_tst_lengths = torch.LongTensor([stn_tst.size(0)]).cuda()
    sid = torch.LongTensor([11]).cuda()  # 11 is the speakerId; with 12 n_speakers the IDs run from 0 to 11
    audio = net_g.infer(x_tst, x_tst_lengths, sid=sid, noise_scale=.667, noise_scale_w=0.8, length_scale=1)[0][0, 0].data.cpu().float().numpy()
ipd.display(ipd.Audio(audio, rate=hps.data.sampling_rate, normalize=False))
```