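FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

This project is still a work in progress. To help make FunCodec better, please let me know your concerns and feel free to comment on them in the Issues section.

News

2023.12.22: We released the training and inference recipes for LauraTTS, together with pre-trained models. LauraTTS is a powerful codec-based zero-shot text-to-speech synthesizer that outperforms VALL-E in terms of semantic consistency and speaker similarity. For more details, see egs/LibriTTS/text2speech_laura/README.md.

Installation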

git clone https://github.com/alibaba/FunCodec.git && cd FunCodec
pip install --editable ./

Available models

Pre-trained models are available from the HuggingFace model hub and from ModelScope.

Model name	Corpus	Bitrate (bps)	Parameters	FLOPs
audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch	General	250~8000	57.83 M	7.73 G
audio_codec-encodec-zh_en-general-16k-nq32ds320-pytorch	General	500~16000	14.85 M	3.72 G
audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch	LibriTTS	250~8000	57.83 M	7.73 G
audio_codec-encodec-en-libritts-16k-nq32ds320-pytorch	LibriTTS	500~16000	14.85 M	3.72 G
audio_codec-freqcodec_magphase-en-libritts-16k-gr8nq32ds320-pytorch	LibriTTS	500~16000	4.50 M	2.18 G
audio_codec-freqcodec_magphase-en-libritts-16k-gr1nq32ds320-pytorch	LibriTTS	500~16000	0.52 M	0.34 G
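
The bitrate ranges for the encodec models in the table can be reproduced from the model names, under assumptions the table itself does not state: that `16k` is the sampling rate in Hz, that `ds640` / `ds320` is the total downsampling factor, that `nq32` is the number of quantizers, and that each quantizer uses a 1024-entry codebook (10 bits per code). The helper `bitrate_range` is hypothetical and not part of FunCodec. A minimal sketch:

```python
import math

def bitrate_range(sample_rate, downsample, num_quantizers, codebook_size=1024):
    """Return (min_bps, max_bps) when using 1 up to num_quantizers codebooks.

    codebook_size=1024 is an assumption, chosen because it reproduces the table.
    """
    frames_per_second = sample_rate / downsample      # code frames per second
    bits_per_code = math.log2(codebook_size)          # bits per codebook index
    per_quantizer = frames_per_second * bits_per_code # bps of a single quantizer
    return int(per_quantizer), int(per_quantizer * num_quantizers)

print(bitrate_range(16000, 640, 32))  # nq32ds640 models
print(bitrate_range(16000, 320, 32))  # nq32ds320 models
```

Under these assumptions the two calls give 250 to 8000 bps and 500 to 16000 bps, which agrees with the Bitrate column.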

Model download

Download models from ModelScope

See egs/LibriTTS/codec/encoding_decoding.sh to download the pre-trained models:

cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub modelscope
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Download models from HuggingFace

See egs/LibriTTS/codec/encoding_decoding.sh to download the pre-trained models:

cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub huggingface
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Inference

Batch inference

See egs/LibriTTS/codec/encoding_decoding.sh to perform encoding and decoding. To extract codes, pass the input file input_wav.scp; the codes are saved to output_dir/codecs.txt in JSONL format.

cd egs/LibriTTS/codec
bash encoding_decoding.sh --stage 1 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 \
  --wav_scp input_wav.scp --out_dir outputs/codecs/
# input_wav.scp has the following format:
# uttid1 path/to/file1.wav
# uttid2 path/to/file2.wav
# ...

To decode the codes, pass the input file codecs.txt; the reconstructed waveforms are saved to output_dir/logdir/output.*/*.wav.

bash encoding_decoding.sh --stage 2 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 --file_sampling_rate 16000 \
  --wav_scp codecs.txt --out_dir outputs/recon_wavs
# codecs.txt is the output of the encoding stage above, and has the following format:
# uttid1 [[[1, 2, 3, ...],[2, 3, 4, ...], ...]]
# uttid2 [[[9, 7, 5, ...],[3, 1, 2, ...], ...]]
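
To inspect the extracted codes outside of FunCodec, each line can be split into an utterance ID and a JSON array. The sketch below assumes exactly the line format shown in the comments above (an ID, whitespace, then a nested JSON list). The function name `read_codecs` is hypothetical, and the meaning of each nesting level is not specified here, so the code treats the payload as an opaque nested list.

```python
import json

def read_codecs(lines):
    """Parse 'uttid <json>' lines into a dict mapping uttid -> nested list."""
    codes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first run of whitespace: the JSON part contains spaces.
        uttid, payload = line.split(maxsplit=1)
        codes[uttid] = json.loads(payload)
    return codes

# Two shortened example lines in the documented format.
example = [
    "uttid1 [[[1, 2, 3], [2, 3, 4]]]",
    "uttid2 [[[9, 7, 5], [3, 1, 2]]]",
]
codes = read_codecs(example)
print(codes["uttid1"][0][1])  # second inner list of the first utterance
```

Because the function accepts any iterable of lines, an open file handle for the codecs.txt written by the encoding stage can be passed in directly.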

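Training

Training on open-source corpora

For commonly used open-source corpora, you can train a model with the recipes in the egs directory. For example, to train a model on the LibriTTS corpus, use egs/LibriTTS/codec/run.sh:

# enter the LibriTTS recipe directory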
cd egs/LibriTTS/codec
# run data downloading, preparation and training stages with 2 GPUs (device 0 and 1)
bash run.sh --stage 0 --stop_stage 3 --gpu_devices 0,1 --gpu_num 2

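We recommend running the script stage by stage to get an overview of FunCodec.

Training on customized data

For uncovered corpora or customized datasets, you can prepare the data yourself. In general, FunCodec uses a Kaldi-like wav.scp file to organize the data files. wav.scp has the following format:

# for waveform files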
uttid1 /path/to/uttid1.wav
uttid2 /path/to/uttid2.wav
# for kaldi-ark files
uttid3 /path/to/ark1.wav:10
uttid4 /path/to/ark1.wav:200
uttid5 /path/to/ark2.wav:10
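
The two entry types above can be told apart by the trailing `:number` suffix. The sketch below is a hypothetical helper (`parse_scp_line` is not a FunCodec function) that splits a line into an utterance ID, a path, and an optional offset. Treating the trailing number as an offset into the ark file follows the usual Kaldi convention and is an assumption here.

```python
def parse_scp_line(line):
    """Split one wav.scp line into (uttid, path, offset).

    offset is None for plain waveform files and an int for
    'path:offset' entries that point into a Kaldi ark file.
    """
    uttid, location = line.strip().split(maxsplit=1)
    # rpartition splits on the LAST colon, so colons earlier in the path survive.
    path, sep, tail = location.rpartition(":")
    if sep and tail.isdigit():
        return uttid, path, int(tail)
    return uttid, location, None

print(parse_scp_line("uttid1 /path/to/uttid1.wav"))
print(parse_scp_line("uttid3 /path/to/ark1.wav:10"))
```

A loader could use the `None` offset to decide whether to open the file as a waveform or to seek into an ark file.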

As shown in the example above, FunCodec supports mixing waveform files and Kaldi ark files in a single wav.scp file, for both training and inference. Here is a demo script that trains a model on a customized dataset named foo:

cd egs/LibriTTS/codec
# 0. make the directories for the train, dev and test sets
mkdir -p dump/foo/train dump/foo/dev dump/foo/test

# 1a. if you already have the wav.scp files, place them under the corresponding directories
mv train.scp dump/foo/train/wav.scp; mv dev.scp dump/foo/dev/wav.scp; mv test.scp dump/foo/test/wav.scp
# 1b. if you don't have the wav.scp files, you can prepare them as follows
find path/to/train_set/ -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/train/wav.scp
find path/to/dev_set/   -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/dev/wav.scp
find path/to/test_set/  -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/test/wav.scp

# 2. collate shape files
mkdir -p exp/foo_states/train exp/foo_states/dev
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/train/wav.scp --out_dir exp/foo_states/train/wav_length
cat exp/foo_states/train/wav_length/wav_length.*.txt | shuf > exp/foo_states/train/speech_shape
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/dev/wav.scp --out_dir exp/foo_states/dev/wav_length
cat exp/foo_states/dev/wav_length/wav_length.*.txt | shuf > exp/foo_states/dev/speech_shape

# 3. train the model with 2 GPUs (device 4 and 5) on the customized dataset (foo)
bash run.sh --gpu_devices 4,5 --gpu_num 2 --dumpdir dump/foo --state_dir foo_states
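
Step 1b can also be done without find and awk, for example on systems where those tools are unavailable. The sketch below roughly mirrors that pipeline in Python: the utterance ID is the file name including its extension, and the `.wav` match ignores case. The function name `build_wav_scp` is hypothetical, and the sort order may differ from the shell `sort` under some locales.

```python
from pathlib import Path
import tempfile

def build_wav_scp(audio_dir, scp_path):
    """Write 'filename path' lines for every .wav file under audio_dir."""
    lines = sorted(
        f"{p.name} {p}"
        for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"
    )
    scp_path = Path(scp_path)
    # Create dump/foo/<split>/ if it does not exist yet.
    scp_path.parent.mkdir(parents=True, exist_ok=True)
    scp_path.write_text("".join(line + "\n" for line in lines))
    return lines

# Demo on a throwaway directory tree with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train_set" / "spk1").mkdir(parents=True)
    (root / "train_set" / "spk1" / "b.wav").touch()
    (root / "train_set" / "a.WAV").touch()
    demo_lines = build_wav_scp(root / "train_set", root / "dump" / "foo" / "train" / "wav.scp")
    print([line.split()[0] for line in demo_lines])
```

Note that, as in the shell version, two files with the same name in different subdirectories would produce duplicate utterance IDs, so file names need to be unique within a split.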

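Acknowledgements

Our design is consistent with FunASR, including the dataloader, model definition, and so on.
We borrowed a lot of code from Kaldi for data preparation.
We borrowed a lot of code from ESPnet. FunCodec follows the training and fine-tuning pipelines of ESPnet.
We borrowed the design of the model architecture from EnCodec and EnCodec_Trainer.

License

This project is licensed under the MIT License. FunCodec also contains various third-party components, as well as some code modified from other repositories under other open-source licenses.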

Citation

@misc{du2023funcodec,
      title={FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec},
      author={Zhihao Du and Shiliang Zhang and Kai Hu and Siqi Zheng},
      year={2023},
      eprint={2309.07405},
      archivePrefix={arXiv},
      primaryClass={cs.Sound}
}
