efficientspeech-0.2.1

EfficientSpeech: An On-Device Text to Speech Model

EfficientSpeech, or ES for short, is an efficient neural text to speech (TTS) model. It generates mel spectrograms at a speed of 104 (mRTF), that is, 104 seconds of speech per second, on an RPi4. Its tiny version has a footprint of only 266k parameters, about 1% of the size of modern TTS models such as MixerTTS. Generating 6 seconds of speech consumes only 90 MFLOPS.
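
As a rough worked example of the figures above: an RTF of 104 means 104 seconds of speech per second of compute, so the time to generate a clip is its duration divided by the RTF. The sketch below only does this arithmetic; the helper names are illustrative and are not part of the repository.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Helper names are illustrative; they are not part of EfficientSpeech.

def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Compute time needed to generate `audio_seconds` of speech at a given RTF."""
    return audio_seconds / rtf

def implied_reference_params(params: int, fraction: float) -> float:
    """Size of a model of which `params` would be the given fraction."""
    return params / fraction

if __name__ == "__main__":
    # 6 s of speech at mRTF 104 -> a little under 60 ms of mel generation.
    print(f"{synthesis_seconds(6.0, 104.0) * 1000:.1f} ms")
    # 266k parameters at "about 1%" implies a reference model in the tens of millions.
    print(f"{implied_reference_params(266_000, 0.01) / 1e6:.1f} M parameters")
```

These numbers cover mel spectrogram generation only, as stated above.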

Paper

IEEE Xplore
Arxiv

Model Architecture

EfficientSpeech is a shallow (2 blocks!) pyramid transformer resembling a U-Net. Upsampling is done by a transposed depth-wise separable convolution.
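
To make the upsampling idea concrete, here is a minimal pure-Python sketch of a 1-D transposed depth-wise convolution followed by a point-wise (1x1) convolution. It is not the repository's layer: the shapes, function names, absence of padding and bias, and the toy weights are all assumptions chosen for readability.

```python
# Illustrative pure-Python sketch of a 1-D transposed depth-wise separable
# convolution (no padding, no bias). This is NOT the repository's layer;
# shapes, names, and weights are assumptions chosen for readability.

def depthwise_transposed_conv1d(x, kernels, stride):
    """x: [channels][time]; kernels: [channels][k]. Each channel has its own kernel."""
    k = len(kernels[0])
    out_len = (len(x[0]) - 1) * stride + k
    out = [[0.0] * out_len for _ in x]
    for c, (row, kernel) in enumerate(zip(x, kernels)):
        for t, value in enumerate(row):
            for j, w in enumerate(kernel):
                # Each input sample is scattered over k output positions.
                out[c][t * stride + j] += value * w
    return out

def pointwise_conv1d(x, weights):
    """1x1 convolution that mixes channels. weights: [out_channels][in_channels]."""
    time_steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(w_row)) for t in range(time_steps)]
        for w_row in weights
    ]

def upsample(x, kernels, weights, stride=2):
    """Depth-wise transposed conv (lengthens time axis), then point-wise conv."""
    return pointwise_conv1d(depthwise_transposed_conv1d(x, kernels, stride), weights)

if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]   # 2 channels, 3 time steps
    kernels = [[1.0, 1.0], [2.0, 0.0]]       # one length-2 kernel per channel
    weights = [[1.0, 1.0]]                   # mix both channels into one
    y = upsample(x, kernels, weights, stride=2)
    print(len(y), len(y[0]))                 # 1 output channel, 6 time steps
```

Splitting the operation this way keeps the weight count at channels x k for the depth-wise part plus out_channels x in_channels for the point-wise part, which is the usual motivation for separable convolutions in small models.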

Quick Demo

Install

ES is currently migrating to PyTorch 2.0 and Lightning 2.0. Expect unstable features.

 pip install -r requirements.txt

If you encounter problems with cublas:

 pip uninstall nvidia_cublas_cu11

Tiny ES

 python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/tiny_eng_266k.ckpt \
  --infer-device cpu --text "the quick brown fox jumps over the lazy dog" --wav-filename fox.wav

The output file is written under outputs. Play the WAV file:

 ffplay outputs/fox.wav

Once the weights have been downloaded, they can be reused:

 python3 demo.py --checkpoint tiny_eng_266k.ckpt --infer-device cpu \
  --text "In additive color mixing, which is used for displays such as computer screens and televisions, the primary colors are red, green, and blue." \
  --wav-filename color.wav

Playback:

 ffplay outputs/color.wav

Small ES

 python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/small_eng_952k.ckpt \
  --infer-device cpu --n-blocks 3 --reduction 2 \
  --text "Bees are essential pollinators responsible for fertilizing plants and facilitating the growth of fruits, vegetables, and flowers. Their sophisticated social structures and intricate communication systems make them fascinating and invaluable contributors to ecosystems worldwide." \
  --wav-filename bees.wav

Playback:

 ffplay outputs/bees.wav

Base ES

 python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/base_eng_4M.ckpt \
  --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3 --infer-device cpu \
  --text "Why do bees have sticky hair?" --wav-filename bees-base.wav

Playback:

 ffplay outputs/bees-base.wav
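
The three demo invocations above differ only in a handful of architecture flags. The sketch below collects them in one place and builds the argument list for demo.py. The flag values are copied from the commands above; the helper itself (`demo_command`, `VARIANT_FLAGS`) is hypothetical and not part of the repository.

```python
# Illustrative helper that assembles the demo.py commands shown above.
# The flag sets are copied from this README; the helper itself is hypothetical.
import shlex

VARIANT_FLAGS = {
    "tiny": [],
    "small": ["--n-blocks", "3", "--reduction", "2"],
    "base": ["--head", "2", "--reduction", "1", "--expansion", "2",
             "--kernel-size", "5", "--n-blocks", "3", "--block-depth", "3"],
}

def demo_command(variant, checkpoint, text, wav_filename, infer_device="cpu"):
    """Return the argv list for one demo.py run of the given model variant."""
    return [
        "python3", "demo.py",
        "--checkpoint", checkpoint,
        *VARIANT_FLAGS[variant],
        "--infer-device", infer_device,
        "--text", text,
        "--wav-filename", wav_filename,
    ]

if __name__ == "__main__":
    cmd = demo_command("small", "small_eng_952k.ckpt",
                       "Why do bees have sticky hair?", "bees.wav")
    # Print a copy-pasteable shell command with correct quoting.
    print(shlex.join(cmd))
    # To run it from the repository root: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with long `--text` values.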

GPU for Inference

This also works with long text. On an A100, it can reach RTF > 1,300. Time it using the --iter 100 option.

 python3 demo.py --checkpoint small_eng_952k.ckpt \
  --infer-device cuda --n-blocks 3 --reduction 2 \
  --text "Once upon a time, in a magical forest filled with colorful flowers and sparkling streams, there lived a group of adorable kittens. Their names were Fluffy, Sparkle, and Whiskers. With their soft fur and twinkling eyes, they charmed everyone they met. Every day, they would play together, chasing their tails and pouncing on sunbeams that danced through the trees. Their purrs filled the forest with joy, and all the woodland creatures couldn't help but smile whenever they saw the cute trio. The animals knew that these kittens were truly the epitome of cuteness, bringing happiness wherever they went." \
  --wav-filename cats.wav --iter 100
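
For readers unfamiliar with how an RTF figure is obtained, the sketch below averages over repeated runs, which is the general idea behind timing with several iterations. It is a minimal sketch: `synthesize` is a stand-in callable, the 22,050 Hz sample rate is an assumption for illustration, and demo.py's own timing code may differ.

```python
# Sketch of how an RTF figure can be measured by averaging over repeated runs.
# `synthesize` is a stand-in callable; demo.py's own timing code may differ.
import time

def measure_rtf(synthesize, sample_rate, iterations=100):
    """Average RTF = seconds of audio produced / seconds of compute."""
    audio_seconds = 0.0
    start = time.perf_counter()
    for _ in range(iterations):
        samples = synthesize()
        audio_seconds += len(samples) / sample_rate
    elapsed = time.perf_counter() - start
    return audio_seconds / elapsed

def fake_tts():
    """Dummy 'model' returning one second of silence (assumed 22,050 Hz)."""
    return [0.0] * 22050

if __name__ == "__main__":
    rtf = measure_rtf(fake_tts, sample_rate=22050, iterations=10)
    print(f"RTF ~ {rtf:.0f}")
```

Averaging over many iterations smooths out one-off costs such as warm-up, which matters most on a GPU.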

Compile and Number of Threads Options

Compilation is supported with --compile during training or inference. For training, eager mode is faster. Training the tiny version takes ~17 hours on an A100. For inference, the compiled version is faster. For an unknown reason, the compile option generates errors when used with --infer-device cuda.

By default, PyTorch 2.0 uses 128 CPU threads (AMD, 4 on RPi4), which causes a slowdown during inference. During inference, it is recommended to set this to a lower number. For example: --threads 24.
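
For scripts that call the model directly instead of going through demo.py, a similar cap can be applied in code. This is a sketch under one assumption: that --threads ultimately maps to `torch.set_num_threads`, which is PyTorch's standard call for limiting intra-op CPU threads. Check demo.py before relying on that. `pick_num_threads` is a hypothetical helper.

```python
# Sketch: cap the number of CPU threads before running inference.
# Assumption: --threads ultimately maps to torch.set_num_threads; check demo.py.
import os

def pick_num_threads(requested, available=None):
    """Clamp the requested thread count to the range [1, available CPUs]."""
    available = available or os.cpu_count() or 1
    return max(1, min(requested, available))

if __name__ == "__main__":
    n = pick_num_threads(24)
    try:
        import torch
        torch.set_num_threads(n)  # limits intra-op parallelism on CPU
        print("torch threads:", torch.get_num_threads())
    except ImportError:
        print("PyTorch not installed; would use", n, "threads")
```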

RPi4 Inference

PyTorch 2.0 is slower on RPi4. Please use the Demo Release and the ICASSP2023 model weights.

RTF on PyTorch 2.0 is ~1.0. RTF on PyTorch 1.12 is ~1.7.

Alternatively, use the ONNX version:

 python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/tiny_eng_266k.onnx \
  --infer-device cpu --text "the primary colors are red, green, and blue." --wav-filename primary.wav

ONNX

The ONNX model only supports a fixed input phoneme length. Padding or truncation is applied if needed. Change the length using --onnx-insize=<desired value>. The default max phoneme length is 128. For example:

 python3 convert.py --checkpoint tiny_eng_266k.ckpt --onnx tiny_eng_266k.onnx --onnx-insize 256
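
The padding and truncation behavior described above can be pictured with a few lines of Python. This is only a sketch of the idea: the function name and the padding id of 0 are assumptions, and the repository's actual preprocessing may differ.

```python
# Sketch of fitting a phoneme sequence to a fixed ONNX input length.
# The name `pad_or_truncate` and pad_id=0 are assumptions for illustration.

def pad_or_truncate(phoneme_ids, insize=128, pad_id=0):
    """Return exactly `insize` ids: cut long inputs, right-pad short ones."""
    clipped = list(phoneme_ids[:insize])
    return clipped + [pad_id] * (insize - len(clipped))

if __name__ == "__main__":
    print(pad_or_truncate([5, 6, 7], insize=5))        # short input is padded
    print(pad_or_truncate(list(range(10)), insize=4))  # long input is truncated
```

Because truncation silently drops the tail of a long input, a larger --onnx-insize is the safer choice when long sentences are expected.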

Dataset Preparation

Choose a dataset folder: e.g. <data_folder> = /data/tts, the directory where the dataset will be stored.

Download LJSpeech:

 cd <data_folder>
 wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
 tar xjvf LJSpeech-1.1.tar.bz2
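
Where a shell is not convenient, the extraction step can also be done from Python with the standard `tarfile` module. This is a minimal sketch; `extract_tar_bz2` is a hypothetical helper, and the archive is assumed to have been downloaded already.

```python
# Sketch: extract a .tar.bz2 archive with the standard library.
# `extract_tar_bz2` is a hypothetical helper, not part of the repository.
import tarfile
from pathlib import Path

def extract_tar_bz2(archive, dest):
    """Extract a bzip2-compressed tar archive into `dest` and return that path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support safer extraction filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest

if __name__ == "__main__":
    archive = Path("LJSpeech-1.1.tar.bz2")
    if archive.exists():
        print("extracted to", extract_tar_bz2(archive, "."))
    else:
        print("archive not found; download it first")
```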

Prepare the dataset. <parent_folder> is where efficientspeech was git cloned.

 cd <parent_folder>/efficientspeech

Edit config/LJSpeech/preprocess.yaml:

 >>>>>>>>>>>>>>>>>
path:
  corpus_path: "/data/tts/LJSpeech-1.1"
  lexicon_path: "lexicon/librispeech-lexicon.txt"
  raw_path: "/data/tts/LJSpeech-1.1/wavs"
  preprocessed_path: "./preprocessed_data/LJSpeech"
>>>>>>>>>>>>>>>>

Replace /data/tts with your <data_folder>.
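
If you prefer not to edit the file by hand, the substitution can be scripted. The sketch below does a plain text replacement of the default /data/tts prefix and deliberately avoids a YAML library, so it keeps the file's formatting untouched. The helper names are hypothetical.

```python
# Sketch: point the preprocessing config at a different data folder.
# Plain text replacement; helper names are hypothetical.
from pathlib import Path

DEFAULT_DATA_FOLDER = "/data/tts"

def retarget_config(text, data_folder):
    """Replace the default data folder prefix everywhere in the config text."""
    return text.replace(DEFAULT_DATA_FOLDER, str(data_folder).rstrip("/"))

def retarget_config_file(config_path, data_folder):
    """Rewrite the config file in place with the new data folder."""
    path = Path(config_path)
    path.write_text(retarget_config(path.read_text(), data_folder))

if __name__ == "__main__":
    sample = 'corpus_path: "/data/tts/LJSpeech-1.1"'
    print(retarget_config(sample, "/mnt/speech"))
    # For the real file:
    # retarget_config_file("config/LJSpeech/preprocess.yaml", "/mnt/speech")
```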

Download the alignment data to preprocessed_data/LJSpeech/TextGrid from here.

Prepare the dataset:

 python3 prepare_align.py config/LJSpeech/preprocess.yaml

This will take an hour or so.

For more information, see the FastSpeech2 implementation on preparing the dataset.

Train

Tiny ES

By default:

--precision=16. Other options: "bf16-mixed", "16-mixed", 16, 32, 64.
--accelerator=gpu
--infer-device=cuda
--devices=1
See more options in utils/tools.py

 python3 train.py

Small ES

 python3 train.py --n-blocks 3 --reduction 2

Base ES

 python3 train.py --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3

Comparison with other SOTA Neural TTS

ES vs FS2 vs PortaSpeech vs LightSpeech

Credits

Unofficial FastSpeech2 GitHub repository.

Citation

If you find this work useful, please cite:

 @inproceedings{atienza2023efficientspeech,
  title={EfficientSpeech: An On-Device Text to Speech Model},
  author={Atienza, Rowel},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}


Additional Information

Version: efficientspeech-0.2.1
Type: AI source code
Updated: 2025-08-21
Size: 4.85 MB
Source: GitHub
