efficientspeech-0.2.1
EfficientSpeech: An On-Device Text to Speech Model

EfficientSpeech, or ES for short, is an efficient neural text to speech (TTS) model. It generates mel spectrograms at a speed of 104 (mRTF), or 104 seconds of speech per second, on an RPi4. Its tiny version has a footprint of just 266k parameters, only about 1% of the size of a modern TTS model such as MixerTTS. Generating 6 seconds of speech consumes only 90 MFLOPS.
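The figures in this paragraph reduce to simple ratios. The following is a minimal sketch of that arithmetic using only the numbers quoted here; the function and variable names are illustrative and not part of the repository.

```python
def real_time_factor(speech_seconds: float, compute_seconds: float) -> float:
    """Seconds of speech produced per second of compute (higher is faster)."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return speech_seconds / compute_seconds


# 104 s of speech generated in 1 s of compute gives a factor of 104.
mrtf = real_time_factor(104.0, 1.0)

# 90 MFLOPS for 6 s of speech is 15 MFLOPS per second of speech.
mflops_per_speech_second = 90 / 6

# If 266k parameters is about 1% of a typical modern model, that model
# has on the order of 26.6M parameters (a rough back-calculation).
implied_typical_params = 266_000 * 100

print(mrtf, mflops_per_speech_second, implied_typical_params)
```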

Paper

IEEE Xplore
arxiv

Model Architecture

EfficientSpeech is a shallow (2 blocks!) pyramid transformer resembling a U-Net. Upsampling is done by a transposed depth-wise separable convolution.
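To see why a depth-wise separable layer helps keep the model small, the sketch below compares weight counts for a standard 1-D convolution and a depth-wise separable one. The channel and kernel sizes are hypothetical and chosen for illustration only; they are not taken from the EfficientSpeech code.

```python
def standard_conv_params(c_in: int, c_out: int, kernel: int) -> int:
    # One kernel-sized filter per (input, output) channel pair, plus biases.
    return c_in * c_out * kernel + c_out


def depthwise_separable_params(c_in: int, c_out: int, kernel: int) -> int:
    # Depth-wise stage: one kernel-sized filter per input channel, plus biases.
    depthwise = c_in * kernel + c_in
    # Point-wise stage: a 1x1 convolution that mixes channels, plus biases.
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise


# Hypothetical layer sizes (assumption, not the model's real configuration).
c_in, c_out, kernel = 128, 128, 5
standard = standard_conv_params(c_in, c_out, kernel)
separable = depthwise_separable_params(c_in, c_out, kernel)
print(standard, separable, round(standard / separable, 2))
```

The same counts apply to the transposed variants, since transposing changes how the filters are applied, not how many weights they hold.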

Quick Demo

Install

ES is currently migrating to PyTorch 2.0 and Lightning 2.0. Expect unstable features.

 pip install -r requirements.txt

If you encounter problems with cuBLAS:

 pip uninstall nvidia_cublas_cu11

Tiny ES

 python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/tiny_eng_266k.ckpt \
  --infer-device cpu --text "the quick brown fox jumps over the lazy dog" --wav-filename fox.wav

The output file is written under outputs. Play the WAV file:

 ffplay outputs/fox.wav

After the weights have been downloaded, the local checkpoint can be reused:

 python3 demo.py --checkpoint tiny_eng_266k.ckpt --infer-device cpu \
  --text "In additive color mixing, which is used for displays such as computer screens and televisions, the primary colors are red, green, and blue." \
  --wav-filename color.wav

Playback:

 ffplay outputs/color.wav

Small ES

 python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/small_eng_952k.ckpt \
  --infer-device cpu --n-blocks 3 --reduction 2 \
  --text "Bees are essential pollinators responsible for fertilizing plants and facilitating the growth of fruits, vegetables, and flowers. Their sophisticated social structures and intricate communication systems make them fascinating and invaluable contributors to ecosystems worldwide." \
  --wav-filename bees.wav

Playback:

 ffplay outputs/bees.wav

Base ES

 python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/base_eng_4M.ckpt \
  --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3 --infer-device cpu \
  --text "Why do bees have sticky hair?" --wav-filename bees-base.wav

Playback:

 ffplay outputs/bees-base.wav
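The three demo commands differ only in the checkpoint and a few architecture flags. As a sketch, those flags can be collected in one table and the command assembled programmatically. `VARIANT_FLAGS` and `build_demo_command` are hypothetical helpers written for this example; the flag values are copied from the commands above.

```python
import shlex

# Extra flags per model size, copied from the demo commands in this section.
VARIANT_FLAGS = {
    "tiny": [],
    "small": ["--n-blocks", "3", "--reduction", "2"],
    "base": ["--head", "2", "--reduction", "1", "--expansion", "2",
             "--kernel-size", "5", "--n-blocks", "3", "--block-depth", "3"],
}


def build_demo_command(variant, checkpoint, text, wav_filename, infer_device="cpu"):
    """Return the demo.py invocation as an argument list (hypothetical helper)."""
    if variant not in VARIANT_FLAGS:
        raise ValueError(f"unknown variant: {variant}")
    return ["python3", "demo.py", "--checkpoint", checkpoint,
            *VARIANT_FLAGS[variant],
            "--infer-device", infer_device,
            "--text", text, "--wav-filename", wav_filename]


cmd = build_demo_command("small", "small_eng_952k.ckpt",
                         "Why do bees have sticky hair?", "bees.wav")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting problems with long `--text` values.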

GPU for Inference

Run inference on a GPU with a long text. On an A100, this can reach RTF > 1,300. Time it using the --iter 100 option.

 python3 demo.py --checkpoint small_eng_952k.ckpt \
  --infer-device cuda --n-blocks 3 --reduction 2 \
  --text "Once upon a time, in a magical forest filled with colorful flowers and sparkling streams, there lived a group of adorable kittens. Their names were Fluffy, Sparkle, and Whiskers. With their soft fur and twinkling eyes, they charmed everyone they met. Every day, they would play together, chasing their tails and pouncing on sunbeams that danced through the trees. Their purrs filled the forest with joy, and all the woodland creatures couldn't help but smile whenever they saw the cute trio. The animals knew that these kittens were truly the epitome of cuteness, bringing happiness wherever they went." \
  --wav-filename cats.wav --iter 100
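Averaging over many iterations smooths out one-off costs such as warm-up. The sketch below shows one way such a measurement could be made; it is an assumption about the general technique, not a copy of what demo.py does. `fake_synthesize` is a stand-in for a real model call.

```python
import time


def measure_rtf(synthesize, speech_seconds, iterations=100):
    """Average the wall-clock time of `synthesize` over several runs and
    return speech_seconds divided by that average."""
    start = time.perf_counter()
    for _ in range(iterations):
        synthesize()
    mean_elapsed = (time.perf_counter() - start) / iterations
    return speech_seconds / mean_elapsed


def fake_synthesize():
    # Stand-in for a real model call; it only burns a little CPU time.
    sum(i * i for i in range(10_000))


rtf = measure_rtf(fake_synthesize, speech_seconds=6.0, iterations=100)
print(f"RTF: {rtf:.1f}")
```

When timing on a GPU, CUDA kernels run asynchronously, so a real measurement would typically call `torch.cuda.synchronize()` before reading the clock.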

Compile and Number of Threads Options

Compilation is supported with --compile during training or inference. For training, eager mode is faster. Training the tiny version takes about 17 hours on an A100. For inference, the compiled version is faster. For an unknown reason, the compile option generates errors when used with --infer-device cuda.

By default, PyTorch 2.0 uses 128 CPU threads (on AMD; 4 on RPi4), which slows down inference. During inference, set the thread count to a lower number, for example: --threads 24.
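A value for --threads can be chosen relative to the machine's CPU count. The sketch below is one hedged way to do that; `pick_thread_count` is a hypothetical helper, and the cap of 24 simply mirrors the example above.

```python
import os


def pick_thread_count(cap=24):
    """Choose a thread count no larger than `cap` or the visible CPU count."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(cap, available))


threads = pick_thread_count(24)
print(f"--threads {threads}")
# With PyTorch installed, the same value could be applied in code
# via torch.set_num_threads(threads).
```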

RPi4 Inference

PyTorch 2.0 is slower on RPi4. Please use the demo release and the ICASSP 2023 model weights.

RTF on PyTorch 2.0 is ~1.0. RTF on PyTorch 1.12 is ~1.7.

Alternatively, use the ONNX version:

 python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/tiny_eng_266k.onnx \
  --infer-device cpu --text "the primary colors are red, green, and blue." --wav-filename primary.wav

ONNX

The ONNX model only supports a fixed input phoneme length. Padding or truncation is applied if needed. Change the length using --onnx-insize=<desired value>. The default phoneme length is 128. For example:

 python3 convert.py --checkpoint tiny_eng_266k.ckpt --onnx tiny_eng_266k.onnx --onnx-insize 256
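Fitting a variable-length input to a fixed size means truncating long sequences and padding short ones. A minimal sketch of that idea follows; right-padding and a pad id of 0 are assumptions made for illustration, and the repository may handle this differently.

```python
def fit_to_length(phoneme_ids, size=128, pad_id=0):
    """Truncate or right-pad a sequence to exactly `size` items."""
    clipped = list(phoneme_ids[:size])
    return clipped + [pad_id] * (size - len(clipped))


short = fit_to_length([7, 8, 9], size=5)           # padded on the right
long = fit_to_length(list(range(10)), size=4)      # truncated
default = fit_to_length([1, 2, 3])                 # default size of 128
print(short, long, len(default))
```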

Dataset Preparation

Choose a dataset folder, e.g. <data_folder> = /data/tts, the directory where the dataset will be stored.

Download LJSpeech:

 cd <data_folder>
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2

Prepare the dataset. <parent_folder> is the directory where efficientspeech was cloned.

 cd <parent_folder>/efficientspeech

Edit config/LJSpeech/preprocess.yaml:

 >>>>>>>>>>>>>>>>>
path:
  corpus_path: "/data/tts/LJSpeech-1.1"
  lexicon_path: "lexicon/librispeech-lexicon.txt"
  raw_path: "/data/tts/LJSpeech-1.1/wavs"
  preprocessed_path: "./preprocessed_data/LJSpeech"
>>>>>>>>>>>>>>>>

Replace /data/tts with your <data_folder>.

Download the alignment data to preprocessed_data/LJSpeech/TextGrid from here.

Prepare the dataset:
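The path substitution can also be scripted. The sketch below applies it to the path block shown above as a plain string; `point_config_at` and the `/mnt/datasets` folder are hypothetical, and a real preprocess.yaml may contain further sections that this example does not show.

```python
CONFIG_TEMPLATE = '''path:
  corpus_path: "/data/tts/LJSpeech-1.1"
  lexicon_path: "lexicon/librispeech-lexicon.txt"
  raw_path: "/data/tts/LJSpeech-1.1/wavs"
  preprocessed_path: "./preprocessed_data/LJSpeech"
'''


def point_config_at(config_text, data_folder, placeholder="/data/tts"):
    """Replace the placeholder dataset root with the chosen data folder."""
    return config_text.replace(placeholder, data_folder.rstrip("/"))


# "/mnt/datasets" is an example location, not a required path.
updated = point_config_at(CONFIG_TEMPLATE, "/mnt/datasets/")
print(updated)
```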

 python3 prepare_align.py config/LJSpeech/preprocess.yaml

This will take an hour or so.

For more information, see the FastSpeech2 implementation for preparing the dataset.

Train

Tiny ES

By default:

--precision=16. Other options: "bf16-mixed", "16-mixed", 16, 32, 64.
--accelerator=gpu
--infer-device=cuda
--devices=1
See more options in utils/tools.py

 python3 train.py
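The defaults listed above behave like ordinary command-line defaults: running train.py with no arguments uses them, and any flag given explicitly overrides its default. The sketch below mimics that with `argparse`; it is an illustration under that assumption, not the actual parser in utils/tools.py.

```python
import argparse


def precision_value(raw):
    # Numeric precisions become ints (16, 32, 64); mixed modes stay strings.
    return int(raw) if raw.isdigit() else raw


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the training defaults")
    parser.add_argument("--precision", type=precision_value, default=16)
    parser.add_argument("--accelerator", default="gpu")
    parser.add_argument("--infer-device", default="cuda")
    parser.add_argument("--devices", type=int, default=1)
    return parser


defaults = build_parser().parse_args([])
override = build_parser().parse_args(["--precision", "bf16-mixed", "--devices", "2"])
print(defaults)
print(override)
```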

Small ES

 python3 train.py --n-blocks 3 --reduction 2

Base ES

 python3 train.py --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3

Comparison with other SOTA Neural TTS

ES vs FS2 vs PortaSpeech vs LightSpeech

Credits

FastSpeech2 unofficial GitHub implementation.

Citation

If you find this work useful, please cite:

 @inproceedings{atienza2023efficientspeech,
  title={EfficientSpeech: An On-Device Text to Speech Model},
  author={Atienza, Rowel},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Additional information

Version: efficientspeech-0.2.1
Type: AI code
Updated: 2025-08-21
Size: 4.85MB
Source: GitHub