e2 tts mlx
v0.0.6

通過MLX框架,實施E2-TT,令人尷尬的完全非自動迴旋零射擊TT。
E2 TTS是一種非自動回形的,零射擊的文本對語音系統,通過使用在掩蓋的音頻填充任務上訓練的流量匹配MEL Spectrogram生成器,簡化了典型的TTS管道,而無需幀級對齊信息。
該實現基於Pytorch中的Lucidrains實現,該實現與論文不同,因為它使用多式變壓器用於文本和音頻,並且進行條件完成了每個變壓器塊。
pip install mlx-e2-tts import mlx . core as mx
from e2_tts_mlx . model import E2TTS
from e2_tts_mlx . trainer import E2Trainer
from e2_tts_mlx . data import load_libritts_r
e2tts = E2TTS (
tokenizer = "char-utf8" , # or "phoneme_en" / callable
cond_drop_prob = 0.25 ,
frac_lengths_mask = ( 0.7 , 0.9 ),
transformer = dict (
dim = 1024 ,
depth = 24 ,
heads = 16 ,
text_depth = 12 ,
text_heads = 8 ,
max_seq_len = 4096 ,
dropout = 0.1
)
)
mx . eval ( e2tts . parameters ())
batch_size = 32
dataset = load_libritts_r ( split = "dev-clean" ) # or any audio/caption dataset
trainer = E2Trainer ( model = e2tts , num_warmup_steps = 20_000 )
trainer . train (
train_dataset = dataset ,
learning_rate = 7.5e-5 ,
batch_size = batch_size ,
total_steps = 1_000_000
)...經過大量培訓...
generated_audio = e2tts . sample (
cond = cond , # reference mel spectrogram for voice matching
text = text , # caption for generation
duration = duration , # from a trained DurationPredictor or otherwise
steps = 32 ,
cfg_strength = 1.0 , # if trained for cfg
use_vocos = True # set to False to get mel spectrograms instead of audio
)有關單個設備培訓的示例,請參見train_example.py 。
Lucidrains在Pytorch中的原始實現。
@inproceedings { Eskimez2024E2TE ,
title = { E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS } ,
author = { Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda } ,
year = { 2024 } ,
url = { https://api.semanticscholar.org/CorpusID:270738197 }
} @article { Burtsev2021MultiStreamT ,
title = { Multi-Stream Transformers } ,
author = { Mikhail S. Burtsev and Anna Rumshisky } ,
journal = { ArXiv } ,
year = { 2021 } ,
volume = { abs/2107.10342 } ,
url = { https://api.semanticscholar.org/CorpusID:236171087 }
}該存儲庫中的代碼按照許可證文件中的MIT許可發布。