Automatic Dubbing with Voice Cloning and Speech Recognition
Made possible thanks to OpenVoice, MeloTTS, Faster Whisper, VoiceFixer, python-audio-separator and FFmpeg.
PRs are welcome; this is mostly a proof of concept, and there is plenty of room for improvement.
## Installation

Install FFmpeg, FFprobe, and FFplay on your system and make sure they are on your PATH. Builds are available from the official FFmpeg website.
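Before running the pipeline, it can be worth verifying that all three FFmpeg executables are actually discoverable. The snippet below is a small standalone check (not part of this repository) using Python's standard-library `shutil.which`:

```python
# Standalone sanity check (not part of Pollyduble): confirm the FFmpeg
# tools are discoverable on PATH before running the pipeline.
import shutil


def missing_tools(names):
    """Return the subset of executable names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


print(missing_tools(["ffmpeg", "ffprobe", "ffplay"]))  # [] means all were found
```

If the printed list is non-empty, install the missing tools or add their directory to PATH before continuing.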
Make a new directory and clone this repository:

```sh
git clone https://github.com/igerman00/Pollyduble
cd Pollyduble
```

Create and activate a conda environment:

```sh
conda create -n dubbing python=3.9
conda activate dubbing
```

Clone OpenVoice:

```sh
git clone https://github.com/myshell-ai/OpenVoice
```

Make sure the OpenVoice repository is in the same directory as this repository; it should be named "OpenVoice".
Install OpenVoice and MeloTTS:

```sh
cd OpenVoice
pip install -e .
pip install git+https://github.com/myshell-ai/MeloTTS.git
python -m unidic download
```

Install torch with GPU support (the `--index-url` parameter should be optional if you don't need GPU support):

```sh
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
cd ..  # Go back to the root directory of the repo
```
Install the remaining requirements:

```sh
pip install -r requirements-win-cu118.txt
```

## Usage

The example below assumes the input video is in the same directory as the demo.py script and is named video.mp4:

```sh
python demo.py -i video.mp4 -s -m
```

The output will be stored in the Pollyduble/output directory by default. It will contain various files, including the dubbed video, the separated audio, the dubbed audio, and the voice sample. Mostly, it should be one-click.
Options include:

- `-i` or `--input` to specify the input video file
- `-o` or `--output` to specify the output directory (default is `Pollyduble/output`)
- `-v` or `--voice` to specify a custom sample for voice cloning; if not specified, one will be created from the first 15 seconds of the video
- `-s` or `--separate` to enable audio separation, i.e. extracting the background music and the speech from the video separately
- `-m` or `--mux` to enable muxing the separated audio back into the video with the dubbed speech
- `-f` or `--fix` to enable voice fixing, i.e. improving the quality of the dubbed speech (experimental, and it often doesn't sound that good)
- `--help` to display the help message
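If you want to drive the script from your own Python code, the flags can be assembled into an argument list. The helper below is a hypothetical sketch (`build_demo_command` is not part of this repository) mapping keyword options to the flags documented above:

```python
# Hypothetical helper (not part of Pollyduble) that assembles the demo.py
# command line from the documented options.
def build_demo_command(input_video, output_dir=None, voice=None,
                       separate=False, mux=False, fix=False):
    cmd = ["python", "demo.py", "-i", input_video]
    if output_dir:
        cmd += ["-o", output_dir]   # custom output directory
    if voice:
        cmd += ["-v", voice]        # custom voice-cloning sample
    if separate:
        cmd.append("-s")            # separate background music from speech
    if mux:
        cmd.append("-m")            # mux separated audio back in
    if fix:
        cmd.append("-f")            # experimental voice fixing
    return cmd


# Reproduces the usage example above: python demo.py -i video.mp4 -s -m
print(build_demo_command("video.mp4", separate=True, mux=True))
# → ['python', 'demo.py', '-i', 'video.mp4', '-s', '-m']
```

The resulting list could then be passed to `subprocess.run(cmd)`.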
## License

This project is licensed under the MIT License - see the LICENSE file for details.