EfficientSpeech, or ES for short, is an efficient neural text-to-speech (TTS) model. On a Raspberry Pi 4 (RPi4), it generates mel spectrograms at a speed of 104 mRTF, i.e. 104 seconds of speech per second. Its tiny version has a footprint of just 266k parameters, only about 1% of a modern TTS model such as MixerTTS. Generating 6 seconds of speech consumes only 90 MFLOPS.
EfficientSpeech is a shallow (2 blocks!) U-Net-like pyramid transformer. Upsampling is done by transposed depth-wise separable convolution.
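As a rough illustration of that upsampling idea (a minimal sketch, not the repository's actual module; the class name and sizes below are made up), a transposed depth-wise separable convolution in PyTorch is a per-channel transposed convolution followed by a 1x1 point-wise convolution:

```python
import torch
import torch.nn as nn

class TransposedDepthwiseSeparableUpsample(nn.Module):
    """Illustrative only: 2x temporal upsampling via a depth-wise
    transposed convolution (groups = channels) followed by a 1x1
    point-wise convolution that mixes information across channels."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # per-channel transposed conv doubles the time dimension
        self.depthwise = nn.ConvTranspose1d(
            in_channels, in_channels, kernel_size, stride=2,
            padding=kernel_size // 2, output_padding=1, groups=in_channels)
        # point-wise conv mixes channels (and changes the channel count)
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, time) -> (batch, out_channels, 2 * time)
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 128, 50)                           # dummy feature sequence
y = TransposedDepthwiseSeparableUpsample(128, 80)(x)  # e.g. project to 80 mel bins
print(y.shape)                                        # torch.Size([1, 80, 100])
```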
Install
ES is currently being migrated to PyTorch 2.0 and Lightning 2.0. Expect unstable features.
pip install -r requirements.txt
If you encounter problems with cuBLAS:
pip uninstall nvidia_cublas_cu11
Tiny ES
python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/tiny_eng_266k.ckpt \
  --infer-device cpu --text "the quick brown fox jumps over the lazy dog" --wav-filename fox.wav
The output file is saved under outputs. Play the wav file:
ffplay outputs/fox.wav
Once the checkpoint has been downloaded, it can be reused:
python3 demo.py --checkpoint tiny_eng_266k.ckpt --infer-device cpu \
  --text "In additive color mixing, which is used for displays such as computer screens and televisions, the primary colors are red, green, and blue." \
  --wav-filename color.wav
Play it:
ffplay outputs/color.wav
Small ES
python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/small_eng_952k.ckpt \
  --infer-device cpu --n-blocks 3 --reduction 2 \
  --text "Bees are essential pollinators responsible for fertilizing plants and facilitating the growth of fruits, vegetables, and flowers. Their sophisticated social structures and intricate communication systems make them fascinating and invaluable contributors to ecosystems worldwide." \
  --wav-filename bees.wav
Play it:
ffplay outputs/bees.wav
Base ES
python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/base_eng_4M.ckpt \
  --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3 --infer-device cpu \
  --text "Why do bees have sticky hair?" --wav-filename bees-base.wav
Play it:
ffplay outputs/bees-base.wav
Inference on a GPU

With a long text, an RTF of > 1,300 can be reached on an A100. Time it using the --iter 100 option (a rough RTF computation is sketched after the command below).
python3 demo.py --checkpoint small_eng_952k.ckpt \
  --infer-device cuda --n-blocks 3 --reduction 2 \
  --text "Once upon a time, in a magical forest filled with colorful flowers and sparkling streams, there lived a group of adorable kittens. Their names were Fluffy, Sparkle, and Whiskers. With their soft fur and twinkling eyes, they charmed everyone they met. Every day, they would play together, chasing their tails and pouncing on sunbeams that danced through the trees. Their purrs filled the forest with joy, and all the woodland creatures couldn't help but smile whenever they saw the cute trio. The animals knew that these kittens were truly the epitome of cuteness, bringing happiness wherever they went." \
  --wav-filename cats.wav --iter 100
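For reference, the real-time factor reported above is simply the duration of the generated audio divided by the wall-clock time spent generating it. A minimal sketch (this is not demo.py's own timing code; `synthesize` is a hypothetical callable standing in for the model):

```python
import time

SAMPLE_RATE = 22050  # sampling rate of LJSpeech

def real_time_factor(synthesize, text, iters=100):
    """Average real-time factor (RTF) over several runs.
    `synthesize` is a hypothetical callable that returns a 1-D array
    of audio samples for the given text (stand-in for the model)."""
    start = time.perf_counter()
    for _ in range(iters):
        wav = synthesize(text)
    elapsed = (time.perf_counter() - start) / iters
    audio_secs = len(wav) / SAMPLE_RATE
    # RTF > 1 means faster than real time, e.g. 104 means
    # 104 seconds of speech generated per second of compute
    return audio_secs / elapsed
```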
The --compile option is supported to enable compilation during training or inference. For training, eager mode is faster; training the small version on an A100 takes about 17 hours. For inference, the compiled version is faster. For unknown reasons, the compile option generates errors when --infer-device cuda is used.
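For context, --compile presumably wraps the model with torch.compile, the standard PyTorch 2.0 pattern. A self-contained sketch with a stand-in module (not the actual EfficientSpeech network):

```python
import torch
import torch.nn as nn

# stand-in module; in the repo, this would be the EfficientSpeech model
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 80))

# torch.compile returns an optimized wrapper around the module; the first
# call is slow because the graph is traced and compiled, while later calls
# with the same input shapes reuse the compiled graph
compiled = torch.compile(model)

x = torch.randn(8, 64)
with torch.no_grad():
    y = compiled(x)
print(y.shape)  # torch.Size([8, 80])
```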
By default, PyTorch 2.0 uses all available CPU threads (e.g. 128 on an AMD machine, 4 on the RPi4), which slows down inference. It is recommended to set this to a lower number during inference, for example: --threads 24.
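The --threads option presumably maps to PyTorch's own thread controls; the equivalent calls inside a script would look like this (values are illustrative):

```python
import torch

# limit intra-op parallelism (the thread pool used by most CPU kernels)
torch.set_num_threads(24)

# optionally limit inter-op parallelism too; this must be called once,
# early in the script, before any parallel work has started
torch.set_num_interop_threads(4)

print(torch.get_num_threads(), torch.get_num_interop_threads())
```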
PyTorch 2.0 is slower on the RPi4. Please use the demo version and the ICASSP 2023 model weights.
The RTF on PyTorch 2.0 is ~1.0, versus ~1.7 on PyTorch 1.12.
Alternatively, use the ONNX version:
python3 demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/pytorch2.0.1/tiny_eng_266k.onnx \
  --infer-device cpu --text "the primary colors are red, green, and blue." --wav-filename primary.wav
Only a fixed input phoneme length is supported; the input will be padded or truncated as needed. Modify it with --onnx-insize=<desired value>. The default maximum phoneme length is 128. For example (a sketch of running the exported model follows the conversion command below):
python3 convert.py --checkpoint tiny_eng_266k.ckpt --onnx tiny_eng_266k.onnx --onnx-insize 256
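A minimal onnxruntime sketch of the fixed-length behaviour described above. The input tensor name, dtype, and phoneme-ID preprocessing are assumptions for illustration, not the repository's documented interface:

```python
import numpy as np
import onnxruntime as ort

MAX_LEN = 128  # must match the --onnx-insize used during conversion

session = ort.InferenceSession("tiny_eng_266k.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name  # assumed: a single phoneme-ID input

def pad_or_truncate(phoneme_ids, max_len=MAX_LEN):
    """Zero-pad or truncate to the fixed length the exported graph expects."""
    ids = np.zeros((1, max_len), dtype=np.int64)  # dtype is an assumption
    ids[0, :min(len(phoneme_ids), max_len)] = phoneme_ids[:max_len]
    return ids

# phoneme_ids would come from the repo's text-to-phoneme front end (not shown)
phoneme_ids = [12, 43, 7, 19, 55]
outputs = session.run(None, {input_name: pad_or_truncate(phoneme_ids)})
print([o.shape for o in outputs])
```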
Dataset Preparation

Choose a dataset folder, for example <data_folder> = /data/tts, the directory where the dataset will be stored.
Download LJSpeech:
cd <data_folder>
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar jxvf LJSpeech-1.1.tar.bz2
Prepare the dataset. <parent_folder> is the location where efficientspeech was git-cloned.
cd <parent_folder>/efficientspeech
Edit config/LJSpeech/preprocess.yaml:
path:
  corpus_path: "/data/tts/LJSpeech-1.1"
  lexicon_path: "lexicon/librispeech-lexicon.txt"
  raw_path: "/data/tts/LJSpeech-1.1/wavs"
  preprocessed_path: "./preprocessed_data/LJSpeech"
Replace /data/tts with your <data_folder>.
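Optionally, a small helper (not part of the repository) can confirm that the edited paths exist before running the preparation scripts:

```python
from pathlib import Path
import yaml  # pip install pyyaml

with open("config/LJSpeech/preprocess.yaml") as f:
    cfg = yaml.safe_load(f)

# check the directories that must point at the downloaded LJSpeech data
for key in ("corpus_path", "raw_path"):
    p = Path(cfg["path"][key])
    print(f"{key}: {p} -> {'ok' if p.exists() else 'MISSING'}")
```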
Download the alignment data from here and place it under preprocessed_data/LJSpeech/TextGrid.
Prepare the dataset:
python3 prepare_align.py config/LJSpeech/preprocess.yaml
This will take an hour or so.
For more information, see the FastSpeech2 implementation's instructions for preparing the dataset.
Train

Tiny ES
By default:
--precision=16. Other options: "bf16-mixed", "16-mixed", 16, 32, 64
--accelerator=gpu
--infer-device=cuda
--devices=1
See more options in utils/tools.py.

python3 train.py
Small ES
python3 train.py --n-blocks 3 --reduction 2
Base ES
python3 train.py --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3
ES vs FS2 vs PortaSpeech vs LightSpeech
If you find this work useful, please cite:
@inproceedings{atienza2023efficientspeech,
title={EfficientSpeech: An On-Device Text to Speech Model},
author={Atienza, Rowel},
booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2023},
organization={IEEE}
}