Clone a voice and output speech in another language with the original voice.
Python 3.7 is REQUIRED, due to the version of TensorFlow used in this project.
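To confirm the interpreter you are about to use matches this requirement, a small check can be run first (a sketch, not part of the project):

import sys

# CogNative requires Python 3.7 because of its TensorFlow dependency.
major, minor = sys.version_info[:2]
assert (major, minor) == (3, 7), f"Python 3.7 required, found {sys.version.split()[0]}"
print("Python version OK")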
Create the virtual environment:
python3 -m venv pyvenv
Activate virtual environment:
Windows: ./pyvenv/Scripts/activate
MacOS/Linux: source pyvenv/bin/activate
Deactivating the virtual environment:
deactivate
Note: Your Python virtual environment may cause issues when running the UI.
Once ffmpeg is downloaded, extract the folder and add <ffmpeg folder path>/bin to your PATH.
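To confirm ffmpeg is actually reachable after updating PATH, a quick check like the following can help (a sketch, not part of the project):

import shutil

# shutil.which returns None when "ffmpeg" cannot be found on the current PATH.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    raise RuntimeError("ffmpeg not found on PATH; add <ffmpeg folder path>/bin to PATH")
print(f"ffmpeg found at {ffmpeg_path}")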
pip3 install -r requirements.txt
Once downloaded, add the models (*.pt) to CogNative/CogNative/models/RTVC/saved_models/default
The taco_pretrained folder (including the folder itself) needs to be downloaded and added to CogNative/CogNative/models/RTVCSwedish/synthesizer/saved_models/swedish
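A quick way to verify that the model files ended up in the expected locations is to list them from the repository root; a sketch using the paths given above:

from pathlib import Path

# Paths as described above, relative to the repository root.
rtvc_dir = Path("CogNative/CogNative/models/RTVC/saved_models/default")
swedish_dir = Path("CogNative/CogNative/models/RTVCSwedish/synthesizer/saved_models/swedish/taco_pretrained")

# The RTVC models are *.pt files placed directly in saved_models/default.
print("RTVC *.pt files:", sorted(p.name for p in rtvc_dir.glob("*.pt")))

# The Swedish synthesizer is the taco_pretrained folder inside saved_models/swedish.
print("taco_pretrained present:", swedish_dir.is_dir())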
Place a credentials.json in the top-level directory. There is currently a file named credentials.json.template; your credentials.json should match the key/value pairs shown there.
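To sanity-check that your credentials.json defines the same keys as the template (this assumes the template itself is valid JSON with placeholder values), a small sketch:

import json

# Compare top-level key names between the template and your credentials file.
with open("credentials.json.template") as f:
    template_keys = set(json.load(f))
with open("credentials.json") as f:
    credential_keys = set(json.load(f))

missing = template_keys - credential_keys
if missing:
    raise KeyError(f"credentials.json is missing keys: {sorted(missing)}")
print("credentials.json matches the template's keys")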
Start from the CogNative root directory.

To launch the GUI, run python -m CogNative.testUI.UI
Any required flag that is not specified on the command line will trigger a prompt that must be answered before the program continues. Examples follow.
python -m CogNative.main -help

CogNative CLI Flags:
-sampleAudio <PATH>: audio file of voice to clone
-synType <text, audio>: synthesis mode either given input text or by transcribing audio file
[-dialogueAudio] <PATH>: for audio synType, audio file of dialogue to speak
[-dialogueText] <PATH>: for text synType, text string of dialogue to speak
-out <PATH>: output audio file path
-useExistingEmbed <y/yes/n/no>: Uses saved embedding of previously used voice samples if enabled and present.
python -m CogNative.main -sampleAudio CogNative/examples/MatthewM66.wav -synType text -dialogueText "The turbo-encabulator has now reached a high level of development, and it's being successfully used in the operation of novertrunnions." -out cmdExampleText.wav -useExistingEmbed y

Loaded encoder "english_encoder.pt" trained to step 1564501
Synthesizer using device: cuda
Building Wave-RNN
Trainable Parameters: 4.481M
Loading model weights at CogNative\models\RTVC\saved_models\default\vocoder.pt
Synthesizing...
Clone output to cmdExampleText.wav
python -m CogNative.main -sampleAudio CogNative\examples\MatthewM66.wav -synType audio -dialogueAudio CogNative\examples\BillMaher22.wav -out cmdExampleAudio.wav -useExistingEmbed n

Loaded encoder "english_encoder.pt" trained to step 1564501
Synthesizer using device: cuda
Building Wave-RNN
Trainable Parameters: 4.481M
Loading model weights at CogNative\models\RTVC\saved_models\default\vocoder.pt
Loading requested file...
Synthesizing...
Clone output to cmdExampleAudio.wav
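The same flags can also be driven from another Python script, for example to batch several samples. A minimal sketch using subprocess and the flags documented above (the paths and dialogue text are placeholders):

import subprocess
import sys

# Invoke the CogNative CLI with the same flags as the examples above.
# sys.executable ensures the CLI runs under the current interpreter/virtualenv.
subprocess.run(
    [
        sys.executable, "-m", "CogNative.main",
        "-sampleAudio", "CogNative/examples/MatthewM66.wav",
        "-synType", "text",
        "-dialogueText", "Hello from a batch script.",
        "-out", "batchExampleText.wav",
        "-useExistingEmbed", "y",
    ],
    check=True,  # raise if the CLI exits with a non-zero status
)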
This script translates audio from a supported language to English. To use the AutoTranslate script on Windows, drag and drop an audio file onto the script, or place a SHORTCUT to the script in %AppData%\Microsoft\Windows\SendTo and use the "Send To" context menu function on the audio file to be translated. In both cases a new .wav file, named with the original filename followed by "_<destination language>", will be placed in the same folder. On other platforms the same CLI flags apply, but the details of context menu integration will vary with the packages installed.
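For reference, both the drag-and-drop and "Send To" flows deliver the audio file's path as the script's first command-line argument, and the output name is built from it. A sketch of just that naming logic (destination language hard-coded to English here to match the description above; this is not the project's actual script):

import sys
from pathlib import Path

# Windows passes the dropped/sent file's path as the first argument.
source = Path(sys.argv[1])
destination_language = "english"

# The translated audio is written next to the original as
# <original filename>_<destination language>.wav
output = source.with_name(f"{source.stem}_{destination_language}.wav")
print(f"Translated audio would be written to: {output}")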
git branch yourname-feature-name

This style guide keeps styling consistent throughout the project. To style your code, please use the Black Python formatter.
Single file: black <python-file-name>
All files: black .
This GitHub repository serves as the foundation of our voice cloning module.
Real-Time-Voice-Cloning
See license here.
The Swedish synthesizer was trained using this GitHub repository.
Real-Time-Voice-Cloning Swedish