FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

This project is still a work in progress. To help make FunCodec better, please let us know your concerns and feel free to comment in the Issues section.

News

2023.12.22: We release the training and inference recipes for LauraTTS, as well as pre-trained models. LauraTTS is a zero-shot text-to-speech synthesizer that outperforms VALL-E in terms of semantic consistency and speaker similarity. Please refer to egs/LibriTTS/text2speech_laura/README.md for more details.

Installation

git clone https://github.com/alibaba/FunCodec.git && cd FunCodec
pip install --editable ./

Available models

- Models are available from the HuggingFace model hub and from ModelScope.

Model name	Model hub	Corpus	Bitrate (bps)	Parameters	FLOPs
audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch	-	General	250~8000	57.83 M	7.73 G
audio_codec-encodec-zh_en-general-16k-nq32ds320-pytorch	-	General	500~16000	14.85 M	3.72 G
audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch	-	LibriTTS	250~8000	57.83 M	7.73 G
audio_codec-encodec-en-libritts-16k-nq32ds320-pytorch	-	LibriTTS	500~16000	14.85 M	3.72 G
audio_codec-freqcodec_magphase-en-libritts-16k-gr8nq32ds320-pytorch	-	LibriTTS	500~16000	4.50 M	2.18 G
audio_codec-freqcodec_magphase-en-libritts-16k-gr1nq32ds320-pytorch	-	LibriTTS	500~16000	0.52 M	0.34 G
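
The bitrate ranges in the table are consistent with simple arithmetic on the model names. The sketch below assumes (none of this is stated in this README) that `16k` is the sampling rate in Hz, `dsN` is the total downsampling factor, `nq32` is the number of quantizers, and each quantizer uses a 1024-entry codebook, i.e. 10 bits per code. The function name `bitrate_range` is made up for illustration.

```python
import math

def bitrate_range(sample_rate, downsample, num_quantizers, codebook_size=1024):
    """Return (min_bps, max_bps) when 1..num_quantizers codebooks are used.

    codebook_size=1024 is an assumption chosen to match the table.
    """
    frame_rate = sample_rate / downsample       # code frames per second
    bits_per_code = math.log2(codebook_size)    # bits carried by one code index
    per_quantizer = frame_rate * bits_per_code  # bps contributed by one quantizer
    return int(per_quantizer), int(per_quantizer * num_quantizers)

print(bitrate_range(16000, 640, 32))  # (250, 8000)
print(bitrate_range(16000, 320, 32))  # (500, 16000)
```

Under these assumptions, the `--bit_width` value passed to the inference commands would select a point inside this range, for example 16000 for all 32 quantizers of a `ds320` model.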

Model download

Download models from ModelScope

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to download pretrained models:

cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub modelscope
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Download models from HuggingFace

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to download pretrained models:

cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub huggingface
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Inference

Batch inference

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to perform encoding and decoding. To extract codes, pass an input file input_wav.scp; the codes will be saved to output_dir/codecs.txt in JSONL format.

cd egs/LibriTTS/codec
bash encoding_decoding.sh --stage 1 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 \
  --wav_scp input_wav.scp --out_dir outputs/codecs/
# input_wav.scp has the following format:
# uttid1 path/to/file1.wav
# uttid2 path/to/file2.wav
# ...

To decode codes, pass codecs.txt as the input file; the reconstructed waveforms will be saved to output_dir/logdir/output.*/*.wav

bash encoding_decoding.sh --stage 2 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 --file_sampling_rate 16000 \
  --wav_scp codecs.txt --out_dir outputs/recon_wavs
# codecs.txt is the output of the encoding stage above, which has the following format:
# uttid1 [[[1, 2, 3, ...],[2, 3, 4, ...], ...]]
# uttid2 [[[9, 7, 5, ...],[3, 1, 2, ...], ...]]
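
To inspect the extracted codes outside of FunCodec, a file in this shape can be read with the standard library alone. This is a minimal sketch that assumes each line is an utterance ID, whitespace, and then one JSON array, exactly as in the comments above; the helper names `parse_codec_line` and `read_codecs` are hypothetical, and the meaning of each nesting level is not assumed.

```python
import json

def parse_codec_line(line):
    """Split one line into (uttid, nested list of integer codes)."""
    uttid, payload = line.strip().split(maxsplit=1)
    return uttid, json.loads(payload)

def read_codecs(path):
    """Yield (uttid, codes) for every non-empty line of a codecs.txt file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_codec_line(line)

# Toy line in the same shape as the comments above (values are made up).
uttid, codes = parse_codec_line("uttid1 [[[1, 2, 3], [2, 3, 4]]]")
print(uttid, len(codes), len(codes[0]), len(codes[0][0]))  # uttid1 1 2 3
```

Calling `read_codecs("outputs/codecs/codecs.txt")` would iterate over a real file, provided the file produced by the encoding stage follows the format shown.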

Training

Training on open-source corpora

For commonly used open-source corpora, you can train a model using the recipes in the egs directory. For example, to train a model on the LibriTTS corpus, you can use egs/LibriTTS/codec/run.sh:

# enter the LibriTTS recipe directory
cd egs/LibriTTS/codec
# run data downloading, preparation and training stages with 2 GPUs (device 0 and 1)
bash run.sh --stage 0 --stop_stage 3 --gpu_devices 0,1 --gpu_num 2

We recommend running the script stage by stage to get an overview of FunCodec.

Training on customized data

For corpora not covered by the recipes, or for customized datasets, you can prepare the data yourself. In general, FunCodec uses a Kaldi-style wav.scp file to organize the data files. wav.scp has the following format:

# for waveform files
uttid1 /path/to/uttid1.wav
uttid2 /path/to/uttid2.wav
# for kaldi-ark files
uttid3 /path/to/ark1.wav:10
uttid4 /path/to/ark1.wav:200
uttid5 /path/to/ark2.wav:10
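
The two entry styles can be told apart by the trailing `:offset`. The sketch below is not FunCodec's own loader; it only illustrates the format under the assumption that an ark entry ends in a colon followed by digits (the Kaldi convention of a byte offset into the archive). `ScpEntry`, `parse_scp_line` and `parse_scp` are hypothetical names.

```python
from typing import NamedTuple, Optional

class ScpEntry(NamedTuple):
    uttid: str
    path: str
    offset: Optional[int]  # offset for ark entries, None for plain waveform files

def parse_scp_line(line):
    """Parse 'uttid path' or 'uttid path:offset' into an ScpEntry."""
    uttid, location = line.strip().split(maxsplit=1)
    path, sep, tail = location.rpartition(":")
    if sep and tail.isdigit():
        return ScpEntry(uttid, path, int(tail))
    return ScpEntry(uttid, location, None)

def parse_scp(text):
    """Parse a whole wav.scp, skipping blank lines and '#' comments."""
    return [
        parse_scp_line(line)
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]

entries = parse_scp("""
# for waveform files
uttid1 /path/to/uttid1.wav
# for kaldi-ark files
uttid3 /path/to/ark1.wav:10
""")
for entry in entries:
    print(entry.uttid, "ark" if entry.offset is not None else "wav")
```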

As the wav.scp listing above shows, FunCodec supports mixing waveform files and kaldi-ark entries in a single wav.scp file, for both training and inference. Here is an example script to train a model on a customized dataset named foo:

cd egs/LibriTTS/codec
# 0. make the directory for train, dev and test sets
mkdir -p dump/foo/train dump/foo/dev dump/foo/test

# 1a. if you already have the wav.scp file, just place them under the corresponding directories
mv train.scp dump/foo/train/wav.scp ; mv dev.scp dump/foo/dev/wav.scp ; mv test.scp dump/foo/test/wav.scp
# 1b. if you don't have the wav.scp file, you can prepare it as follows
find path/to/train_set/ -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/train/wav.scp
find path/to/dev_set/   -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/dev/wav.scp
find path/to/test_set/  -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/test/wav.scp

# 2. collate shape files
mkdir -p exp/foo_states/train exp/foo_states/dev
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/train/wav.scp --out_dir exp/foo_states/train/wav_length
cat exp/foo_states/train/wav_length/wav_length.*.txt | shuf > exp/foo_states/train/speech_shape
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/dev/wav.scp --out_dir exp/foo_states/dev/wav_length
cat exp/foo_states/dev/wav_length/wav_length.*.txt | shuf > exp/foo_states/dev/speech_shape

# 3. train the model with 2 GPUs (device 4 and 5) on the customized dataset (foo)
bash run.sh --gpu_devices 4,5 --gpu_num 2 --dumpdir dump/foo --state_dir foo_states
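
Where `find` and `awk` are not available, step 1b can be approximated in Python. This is a sketch rather than part of FunCodec: like the shell pipeline, it uses the file name (including the `.wav` suffix) as the utterance ID and sorts the lines, although Python's sort order may differ from a locale-aware `sort`. The function name `build_wav_scp` is hypothetical, and the demo runs in a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path

def build_wav_scp(wav_root, scp_path):
    """Write 'filename path' lines for every .wav file under wav_root."""
    lines = sorted(
        f"{p.name} {p}"
        for p in Path(wav_root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"  # case-insensitive, like -iname
    )
    scp_path = Path(scp_path)
    scp_path.parent.mkdir(parents=True, exist_ok=True)
    scp_path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)

# Demo on a throwaway directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "train_set"
    (root / "spk1").mkdir(parents=True)
    (root / "spk1" / "b.wav").touch()
    (root / "a.WAV").touch()
    (root / "notes.txt").touch()
    scp_file = Path(tmp) / "dump" / "foo" / "train" / "wav.scp"
    n = build_wav_scp(root, scp_file)
    content = scp_file.read_text(encoding="utf-8")
    print(n)
```

For a real dataset the call would look like `build_wav_scp("path/to/train_set", "dump/foo/train/wav.scp")`. File names must be unique across subdirectories, since they serve as utterance IDs.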

Acknowledgements

We follow a design consistent with FunASR, including the dataloader, model definitions and so on.
We borrowed a lot of code from Kaldi for data preparation.
We borrowed a lot of code from ESPnet. FunCodec follows the training and inference pipelines of ESPnet.
We borrowed the design of the model architecture from EnCodec and EnCodec_Trainer.

License

This project is licensed under the MIT License. FunCodec also contains various third-party components and some code modified from other repos under other open-source licenses.

Citations

@misc{du2023funcodec,
      title={FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec},
      author={Zhihao Du and Shiliang Zhang and Kai Hu and Siqi Zheng},
      year={2023},
      eprint={2309.07405},
      archivePrefix={arXiv},
      primaryClass={cs.Sound}
}

More information

Version: 1.0.0
Category: AI source code
Updated: 2025-08-21
Size: 1.25MB
Source: GitHub
