e2 tts mlx ดาวน์โหลด - e2 tts mlx ซอร์สโค้ดดาวน์โหลดดาวน์โหลด

e2 tts mlx

โค้ดแหล่งที่มา AI

v0.0.6

ดาวน์โหลด

E2 TTS Diagram

E2 TTS - MLX

การใช้งานของ E2-TTS ง่ายอย่างไม่น่าเชื่ออย่างเต็มที่ไม่ได้เป็นศูนย์ TTS ที่ไม่ได้เป็นศูนย์ด้วยเฟรมเวิร์ก MLX

E2 TTS เป็นระบบที่ไม่ได้เป็นแบบไม่มีการพูดแบบไม่เป็นศูนย์ซึ่งทำให้ท่อส่งสัญญาณ TTS ทั่วไปง่ายขึ้นโดยใช้เครื่องกำเนิดไฟฟ้า MEL ที่จับคู่กับการไหลเวียน

การใช้งานนี้ขึ้นอยู่กับการใช้งาน lucidrains ใน Pytorch ซึ่งแตกต่างจากกระดาษในการที่ใช้หม้อแปลงหลายมิติสำหรับข้อความและเสียงโดยมีการปรับอากาศทุกบล็อกหม้อแปลง

การติดตั้ง

pip install mlx-e2-tts

การใช้งาน

 import mlx . core as mx

from e2_tts_mlx . model import E2TTS
from e2_tts_mlx . trainer import E2Trainer
from e2_tts_mlx . data import load_libritts_r

e2tts = E2TTS (
    tokenizer = "char-utf8" ,  # or "phoneme_en" / callable
    cond_drop_prob = 0.25 ,
    frac_lengths_mask = ( 0.7 , 0.9 ),
    transformer = dict (
        dim = 1024 ,
        depth = 24 ,
        heads = 16 ,
        text_depth = 12 ,
        text_heads = 8 ,
        max_seq_len = 4096 ,
        dropout = 0.1
    )
)
mx . eval ( e2tts . parameters ())

batch_size = 32

dataset = load_libritts_r ( split = "dev-clean" )  # or any audio/caption dataset

trainer = E2Trainer ( model = e2tts , num_warmup_steps = 20_000 )

trainer . train (
    train_dataset = dataset ,
    learning_rate = 7.5e-5 ,
    batch_size = batch_size ,
    total_steps = 1_000_000
)

... หลังจากฝึกอบรมมาก ...

 generated_audio = e2tts . sample (
    cond = cond ,  # reference mel spectrogram for voice matching
    text = text ,  # caption for generation
    duration = duration , # from a trained DurationPredictor or otherwise
    steps = 32 ,
    cfg_strength = 1.0 ,  # if trained for cfg
    use_vocos = True  # set to False to get mel spectrograms instead of audio
)

ดู train_example.py สำหรับตัวอย่างของการฝึกอบรมอุปกรณ์เดียว

ความกตัญญู

Lucidrains สำหรับการใช้งานดั้งเดิมใน Pytorch

การอ้างอิง

 @inproceedings { Eskimez2024E2TE ,
    title   = { E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS } ,
    author  = { Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda } ,
    year    = { 2024 } ,
    url     = { https://api.semanticscholar.org/CorpusID:270738197 }
}

 @article { Burtsev2021MultiStreamT ,
    title     = { Multi-Stream Transformers } ,
    author    = { Mikhail S. Burtsev and Anna Rumshisky } ,
    journal   = { ArXiv } ,
    year      = { 2021 } ,
    volume    = { abs/2107.10342 } ,
    url       = { https://api.semanticscholar.org/CorpusID:236171087 }
}