zerovox

AI source code

1.0.0

Zerovox: zero-shot real-time TTS system, fully offline, free and open source

Zerovox is a text-to-speech (TTS) system built for real-time and embedded use.

Zerovox runs entirely offline, which ensures privacy and independence from cloud services. It is completely free and open source, and community contributions and suggestions are welcome.

Modeled after FastSpeech2, Zerovox goes a step further with zero-shot speaker cloning, using global style tokens (GST) and speaker conditional layer normalization (SCLN) for effective speaker embedding. The system generates both English and German speech from a single model trained on an extensive dataset. Zerovox is phoneme-based and relies on pronunciation dictionaries for accurate word articulation: the CMU dictionary for English and a custom dictionary for German from the ZamiaSpeech project, from which the phoneme set is also derived.
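
As a rough illustration of the SCLN idea mentioned above, the sketch below normalizes one hidden frame and then applies a scale and bias computed from a speaker embedding. It follows the general formulation of speaker conditional layer normalization and is not taken from the Zerovox code; the function name, the tiny dimensions, and the hand-picked weights are assumptions made for the example (in a trained model the projections would be learned).

```python
import math


def speaker_conditional_layer_norm(frame, speaker_emb, w_scale, w_bias, eps=1e-5):
    """Layer-normalize one hidden frame, then scale/shift it per speaker.

    frame:       list of hidden values for a single time step
    speaker_emb: list of speaker-embedding values
    w_scale, w_bias: one row per hidden channel, each row as long as
                     speaker_emb (hypothetical projection weights)
    """
    n = len(frame)
    mean = sum(frame) / n
    var = sum((v - mean) ** 2 for v in frame) / n
    normed = [(v - mean) / math.sqrt(var + eps) for v in frame]

    # Unlike plain layer norm, scale and bias are not fixed vectors:
    # they are linear projections of the speaker embedding.
    scale = [sum(s * w for s, w in zip(speaker_emb, row)) for row in w_scale]
    bias = [sum(s * w for s, w in zip(speaker_emb, row)) for row in w_bias]
    return [g * v + b for g, v, b in zip(scale, normed, bias)]


# Illustrative values only: 4 hidden channels, 2-dimensional speaker embedding.
frame = [0.5, -1.0, 2.0, 0.0]
w_scale = [[1.0, 2.0] for _ in range(4)]
w_bias = [[0.0, 0.5] for _ in range(4)]

out_a = speaker_conditional_layer_norm(frame, [1.0, 0.0], w_scale, w_bias)
out_b = speaker_conditional_layer_norm(frame, [0.0, 1.0], w_scale, w_bias)
print(out_a)
print(out_b)
```

The same frame yields different outputs for the two embeddings, which is how a single set of decoder weights can be steered toward a particular voice.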

Zerovox can serve as a TTS backend for LLMs, enabling real-time interaction, and as an easy-to-install TTS system for home automation systems such as Home Assistant. Because it is non-autoregressive, like FastSpeech2, its output is generally easy to control and predictable.

License: Zerovox is Apache 2 licensed, with many parts taken from other projects (see the Credits section below) under the MIT license.

Demo

Please note: the model is still in alpha stage and still training.

https://huggingface.co/spaces/goooofy/zerovox-demo

Audio corpus statistics

Current Zerovox training corpus statistics:

german  audio corpus: 16679 speakers, 475.3 hours audio
english audio corpus: 19899 speakers, 358.7 hours audio
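
To put these figures in perspective, the short calculation below derives the average amount of audio per speaker and the combined totals. Only the speaker counts and hours come from the statistics above; the dictionary layout and the function name are made up for the example.

```python
# Figures copied from the corpus statistics above.
corpora = {
    "german": {"speakers": 16679, "hours": 475.3},
    "english": {"speakers": 19899, "hours": 358.7},
}


def minutes_per_speaker(stats):
    """Average minutes of audio contributed by each speaker."""
    return stats["hours"] * 60 / stats["speakers"]


for name, stats in corpora.items():
    print(f"{name}: {minutes_per_speaker(stats):.2f} min/speaker")

total_hours = sum(s["hours"] for s in corpora.values())
total_speakers = sum(s["speakers"] for s in corpora.values())
print(f"total: {total_speakers} speakers, {total_hours:.1f} hours")
```

Each speaker contributes on average well under two minutes of audio, so the training data is broad across voices rather than deep per voice.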

Training the Zerovox model

Data preparation

(1/5) Prepare corpus YAMLs:

 pushd configs/corpora/cv_de_100
./gen_cv.sh
popd

(2/5) Prepare alignment:

 utils/prepare_align.py configs/corpora/cv_de_100

(3/5) OOVs (out-of-vocabulary words):

 utils/oovtool.py -a -m zerovox-g2p-autoreg-zamia-de configs/corpora/cv_de_100

(4/5) Align:

 utils/align.py --kaldi-model=tts_de_kaldi_zamia_4 configs/corpora/cv_de_100

(5/5) Preprocess:

 utils/preprocess.py configs/corpora/cv_de_100
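
The five steps above can be chained in a small driver script. The sketch below only assembles and prints the commands shown in this section (dry run by default); the function names and the (working directory, argv) layout are assumptions for the example, and it presumes it is started from the repository root and that the corpus directory ships a gen_cv.sh script, as cv_de_100 does above.

```python
import shlex
import subprocess


def preparation_steps(corpus_dir, g2p_model, kaldi_model):
    """Return (working_dir, argv) pairs for steps 1/5 to 5/5."""
    return [
        (corpus_dir, ["./gen_cv.sh"]),                                        # 1/5 corpus YAMLs
        (".", ["utils/prepare_align.py", corpus_dir]),                        # 2/5 prepare alignment
        (".", ["utils/oovtool.py", "-a", "-m", g2p_model, corpus_dir]),       # 3/5 OOVs
        (".", ["utils/align.py", f"--kaldi-model={kaldi_model}", corpus_dir]),  # 4/5 align
        (".", ["utils/preprocess.py", corpus_dir]),                           # 5/5 preprocess
    ]


def run_preparation(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print(f"[{cwd}] {shlex.join(argv)}")
        if not dry_run:
            # check=True aborts the pipeline at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)


steps = preparation_steps(
    "configs/corpora/cv_de_100",
    g2p_model="zerovox-g2p-autoreg-zamia-de",
    kaldi_model="tts_de_kaldi_zamia_4",
)
run_preparation(steps)  # dry run: prints the five commands without executing them
```

Calling run_preparation(steps, dry_run=False) would execute the steps in order; stopping at the first failure matters here because each step consumes the output of the previous one.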

TTS model training

 utils/train_tts.py \
    --head=2 --reduction=1 --expansion=2 --kernel-size=5 --n-blocks=3 --block-depth=3 \
    --accelerator=gpu --threads=24 --batch-size=32 --val_epochs=8 \
    --infer-device=cpu \
    --lr=0.0001 --warmup_epochs=25 \
    --hifigan-checkpoint=VCTK_V2 \
    --out-folder=models/tts_de_zerovox_base_1 \
    configs/corpora/cv_de_100 \
    configs/corpora/de_hui/de_hui_*.yaml \
    configs/corpora/de_thorsten.yaml

Kaldi acoustic model training

 utils/train_kaldi.py --model-name=tts_de_kaldi_zamia_4 --num-jobs=12 configs/corpora/cv_de_100

G2P model training

Run training:

 scripts/train_g2p_de_autoreg.sh

Credits

Originally based on EfficientSpeech by Rowel Atienza

https://github.com/roatienza/efficientspeech

 @inproceedings{atienza2023efficientspeech,
  title={EfficientSpeech: An On-Device Text to Speech Model},
  author={Atienza, Rowel},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

The FastSpeech2 encoder and decoder are borrowed (under the MIT license) from Chung-Ming Chien's implementation of FastSpeech2

https://github.com/ming024/fastspeech2

 @misc{ren2022fastspeech2fasthighquality,
    title={FastSpeech 2: Fast and High-Quality End-to-End Text to Speech}, 
    author={Yi Ren and Chenxu Hu and Xu Tan and Tao Qin and Sheng Zhao and Zhou Zhao and Tie-Yan Liu},
    year={2022},
    eprint={2006.04558},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2006.04558}, 
}

The MEL decoder implementation is borrowed (under the MIT license) from Tomoki Hayashi's ParallelWaveGAN project:

https://github.com/kan-bayashi/parallelwavegan

The G2P transformer models are based on DeepPhonemizer by Axel Springer News Media & Tech GmbH & Co. KG - Ideas Engineering (MIT license)

https://github.com/as-ideas/deepphonemizer

 @inproceedings{Yolchuyeva_2019, series={interspeech_2019},
title={Transformer Based Grapheme-to-Phoneme Conversion},
url={http://dx.doi.org/10.21437/Interspeech.2019-1954},
DOI={10.21437/interspeech.2019-1954},
booktitle={Interspeech 2019},
publisher={ISCA},
author={Yolchuyeva, Sevinj and Németh, Géza and Gyires-Tóth, Bálint},
year={2019},
month=sep,
pages={2095--2099},
collection={interspeech_2019}
}

The ResNet-based zero-shot speaker encoder is borrowed (under the MIT license) from voxceleb_trainer by Clova AI Research

https://github.com/clovaai/voxceleb_trainer

 @inproceedings{chung2020in,
title={In defence of metric learning for speaker recognition},
author={Chung, Joon Son and Huh, Jaesung and Mun, Seongkyu and Lee, Minjae and Heo, Hee Soo and Choe, Soyeon and Ham, Chiheon and Jung, Sunghwan and Lee, Bong-Jin and Han, Icksang},
booktitle={Proc. Interspeech},
year={2020}
}

@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
pages={770--778},
year={2016}
}

The zero-shot speaker embedding based on global style tokens (GST) is based on GST-Tacotron by Chengqi Deng (MIT license)

https://github.com/kinglittleq/gst-tacotron

which is an implementation of

 @misc{wang2018style,
	  title={Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis},
	  author={Yuxuan Wang and Daisy Stanton and Yu Zhang and RJ Skerry-Ryan and Eric Battenberg and Joel Shor and Ying Xiao and Fei Ren and Ye Jia and Rif A. Saurous},
	  year={2018},
	  eprint={1803.09017},
	  archivePrefix={arXiv},
	  primaryClass={cs.CL}
}

Speaker Conditional Layer Normalization (SCLN) is borrowed (under the MIT license) from

https://github.com/keonlee9420/cross-speaker-emotion-transfer by Keon Lee

 @misc{wu2021crossspeakeremotiontransferbased,
    title={Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech}, 
    author={Pengfei Wu and Junjie Pan and Chenchang Xu and Junhui Zhang and Lin Wu and Xiang Yin and Zejun Ma},
    year={2021},
    eprint={2110.04153},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2110.04153}, 
}

Additional information

Version 1.0.0
Type AI source code
Updated 2025-09-15
Size 27.01MB
Source GitHub
