bark voice cloning HuBERT quantizer

Bark voice cloning

Please read

This code works on Python 3.10. I have not tested it on other versions, and some older versions will have issues.
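
Since only Python 3.10 is reported as tested, a project that embeds this code may want to warn early when it runs on another version. The following is a minimal sketch of such a check; `check_python_version` and `TESTED_VERSION` are hypothetical names for illustration and are not part of this repository.

```python
import sys

# The only version the README reports as tested.
TESTED_VERSION = (3, 10)


def check_python_version(version_info=sys.version_info):
    """Return None on the tested version, otherwise a warning message."""
    current = (version_info[0], version_info[1])
    if current == TESTED_VERSION:
        return None
    return (
        f"Running on untested Python {current[0]}.{current[1]}; "
        f"this code was only tested on {TESTED_VERSION[0]}.{TESTED_VERSION[1]}."
    )


if __name__ == "__main__":
    message = check_python_version()
    if message is not None:
        print(message, file=sys.stderr)
```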

Voice cloning with Bark in high quality?

It's possible now.

examples_biden_example.mov

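How do I clone a voice?

For developers:

Code examples on the Hugging Face model page

For everyone:

audio-webui with Bark and voice cloning
Online Hugging Face voice cloning space
Interactive Python notebook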

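My cloned voices aren't very convincing. Why are other people's cloned voices better than mine?

Make sure these things are NOT in your voice input (in no particular order):

Noise (you can run a noise remover first)
Music (unless you want music in the background)
A cut-off at the end (this causes the model to try to continue the generation)
Under 1 second of training data (I personally suggest around 10 seconds for good potential, but I have had great results with 5 seconds as well)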
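
The length guidance above can be checked before cloning. This is a rough sketch using only the standard library `wave` module, so it assumes the input is an uncompressed WAV file. The function names and the three labels are made up for illustration, and the 5-second boundary is just one reading of the advice above.

```python
import wave

MIN_SECONDS = 1.0      # below this, the README says the input is unsuitable
GOOD_SECONDS = 5.0     # the author reports great results from about here


def clip_duration_seconds(path):
    """Duration of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def assess_duration(seconds):
    """Map a duration onto a coarse label (labels are illustrative)."""
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds < GOOD_SECONDS:
        return "short"
    return "ok"
```

For example, `assess_duration(clip_duration_seconds("voice.wav"))` would label a hypothetical `voice.wav` before it is passed on to the cloning code.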

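What makes for good prompt audio? (in no particular order)

Clearly spoken
No weird background noises
Only one speaker
Audio that ends after a sentence ends
A regular/common voice (these usually have more success; complex voices can still be cloned, but not as well)
Around 10 seconds of data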
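
One of these points, audio that ends after a sentence ends, can be roughly screened for by checking whether the clip finishes in near-silence. The sketch below is a heuristic and not part of this project: it assumes 16-bit PCM WAV input, and the 0.2-second tail and the 0.1 threshold are arbitrary illustrative values that will need tuning for real recordings.

```python
import array
import sys
import wave


def tail_peak_ratio(path, tail_seconds=0.2):
    """Peak level of the clip's last `tail_seconds` relative to its overall peak."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        samples = array.array("h", wav_file.readframes(wav_file.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        raise ValueError("empty clip")
    overall_peak = max(abs(s) for s in samples)
    if overall_peak == 0:
        return 0.0  # a fully silent clip has a silent tail
    tail_len = max(1, int(tail_seconds * rate) * channels)
    tail_peak = max(abs(s) for s in samples[-tail_len:])
    return tail_peak / overall_peak


def probably_ends_cleanly(path, threshold=0.1):
    """True when the tail is much quieter than the loudest part of the clip."""
    return tail_peak_ratio(path) < threshold
```

A quiet tail does not prove the sentence is complete, so a clip flagged as clean still deserves a listen.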

Pretrained models

Official

Name	HuBERT Model	Quantizer Version	Epoch	Language	Dataset
quantifier_hubert_base_ls960.pth	HuBERT Base	0	3	ENG	GitMylo/bark-semantic-training
quantifier_hubert_base_ls960_14.pth	HuBERT Base	0	14	ENG	GitMylo/bark-semantic-training
quantifier_V1_hubert_base_ls960_23.pth	HuBERT Base	1	23	ENG	GitMylo/bark-semantic-training

Community

Author	Name	HuBERT Model	Quantizer Version	Epoch	Language	Dataset
HobisPL	polish-HuBERT-quantizer_8_epoch.pth	HuBERT Base	1	8	POL	Hobis/bark-polish-semantic-wav-training
C0untFloyd	german-HuBERT-quantizer_14_epoch.pth	HuBERT Base	1	14	GER	CountFloyd/bark-german-semantic-wav-training
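
When a project needs to choose among these checkpoints, the tables can be mirrored as a small in-code registry. The sketch below is only an illustration: the `Quantizer` class and `pick_quantizer` function are hypothetical, the file names follow the tables as best they can be read and should be verified against the hosting repository, and the rule "prefer the higher quantizer version, then the higher epoch count" is an assumption rather than a recommendation from the author.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantizer:
    name: str       # checkpoint file name
    version: int    # quantizer version column
    epochs: int     # epoch column
    language: str   # language code as written in the tables


QUANTIZERS = [
    Quantizer("quantifier_hubert_base_ls960.pth", 0, 3, "ENG"),
    Quantizer("quantifier_hubert_base_ls960_14.pth", 0, 14, "ENG"),
    Quantizer("quantifier_V1_hubert_base_ls960_23.pth", 1, 23, "ENG"),
    Quantizer("polish-HuBERT-quantizer_8_epoch.pth", 1, 8, "POL"),
    Quantizer("german-HuBERT-quantizer_14_epoch.pth", 1, 14, "GER"),
]


def pick_quantizer(language):
    """Pick a checkpoint for a language code (assumed rule: newest version, most epochs)."""
    candidates = [q for q in QUANTIZERS if q.language == language.upper()]
    if not candidates:
        raise LookupError(f"no pretrained quantizer listed for {language!r}")
    return max(candidates, key=lambda q: (q.version, q.epochs))
```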

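For developers: implementing voice cloning in your Bark projects

Simply copy the files from this directory into your project.
The HuBERT manager contains methods to download HuBERT and the custom quantizer model.
The notebook contains code that runs on either CUDA or CPU, rather than on CPU only.
Loading CustomHubert should be straightforward: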

from hubert.pre_kmeans_hubert import CustomHubert
import torchaudio

# Load the HuBERT model,
# checkpoint_path should work fine with data/models/hubert/hubert.pt for the default config
hubert_model = CustomHubert(checkpoint_path='path/to/checkpoint')

# Run the model to extract semantic features from an audio file, where wav is your audio file
wav, sr = torchaudio.load('path/to/wav')  # This is where you load your wav, with soundfile or torchaudio for example

if wav.shape[0] == 2:  # Stereo to mono if needed
    wav = wav.mean(0, keepdim=True)

semantic_vectors = hubert_model.forward(wav, input_sample_hz=sr)
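
The notebook is described as running on CUDA or CPU. A minimal sketch of how that choice might be made is shown here, using PyTorch's `torch.cuda.is_available()`. `pick_device` is a hypothetical helper, not a function from this repository, and it falls back to the CPU when PyTorch is not installed.

```python
def pick_device(prefer_cuda=True):
    """Return "cuda" when requested and available, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to place on a GPU.
        return "cpu"
    if prefer_cuda and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Assuming CustomHubert behaves like a regular `torch.nn.Module`, the returned string could then be passed to `.to(device)` on both the model and the input tensor.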

Loading and running the custom kmeans

import torch
from hubert.customtokenizer import CustomTokenizer

# Load the CustomTokenizer model from a checkpoint
# With default config, you can use the pretrained model from huggingface
# With the default setup from HuBERTManager, this will be in data/models/hubert/tokenizer.pth
tokenizer = CustomTokenizer.load_from_checkpoint('data/models/hubert/tokenizer.pth')  # Automatically uses the right layers

# Process the semantic vectors from the previous HuBERT run (This works in batches, so you can send the entire HuBERT output)
semantic_tokens = tokenizer.get_token(semantic_vectors)

# Congratulations! You now have semantic tokens which can be used inside of a speaker prompt file.
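
The last comment mentions a speaker prompt file without showing one. As a sketch under stated assumptions: Bark reads history prompts from `.npz` archives with the keys `semantic_prompt`, `coarse_prompt`, and `fine_prompt`, where the coarse prompt is the first two codebooks of the fine prompt. The acoustic codes (`fine_codes`) are assumed to come from a separate codec step that this section does not cover, and `save_speaker_prompt` is a hypothetical helper.

```python
import numpy as np

N_COARSE_CODEBOOKS = 2  # assumption: coarse prompt = first two codebooks


def save_speaker_prompt(path, semantic_tokens, fine_codes):
    """Write semantic tokens and codec codes into one .npz speaker prompt.

    semantic_tokens: 1-D sequence of integer token ids
    fine_codes: integer array of shape (n_codebooks, n_frames)
    """
    semantic = np.asarray(semantic_tokens, dtype=np.int64).reshape(-1)
    fine = np.asarray(fine_codes, dtype=np.int64)
    if fine.ndim != 2 or fine.shape[0] < N_COARSE_CODEBOOKS:
        raise ValueError("fine_codes must have shape (n_codebooks >= 2, n_frames)")
    coarse = fine[:N_COARSE_CODEBOOKS, :]
    np.savez(
        path,
        semantic_prompt=semantic,
        coarse_prompt=coarse,
        fine_prompt=fine,
    )
```

If `semantic_tokens` is a PyTorch tensor, converting it first with `.cpu().numpy()` is likely needed, especially when it lives on a CUDA device.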