FunCodec
1.0.0
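This project is still a work in progress. To help make FunCodec better, please let me know your concerns and feel free to comment on them in the Issues section.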
See egs/LibriTTS/text2speech_laura/README.md for details.

git clone https://github.com/alibaba/FunCodec.git && cd FunCodec
pip install --editable ./

The models below are available from the HuggingFace model hub and from ModelScope.
| Model name | Model hub | Corpus | Bitrate | Parameters | FLOPs |
|---|---|---|---|---|---|
| audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch | ? | General | 250~8000 | 57.83 M | 7.73G |
| audio_codec-encodec-zh_en-general-16k-nq32ds320-pytorch | ? | General | 500~16000 | 14.85 M | 3.72G |
| audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch | ? | LibriTTS | 250~8000 | 57.83 M | 7.73G |
| audio_codec-encodec-en-libritts-16k-nq32ds320-pytorch | ? | LibriTTS | 500~16000 | 14.85 M | 3.72G |
| audio_codec-freqcodec_magphase-en-libritts-16k-gr8nq32ds320-pytorch | ? | LibriTTS | 500~16000 | 4.50 M | 2.18G |
| audio_codec-freqcodec_magphase-en-libritts-16k-gr1nq32ds320-pytorch | ? | LibriTTS | 500~16000 | 0.52 M | 0.34G |
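The bitrate ranges in the table can be related to the model names with a little arithmetic. The sketch below is illustrative only and rests on assumptions not stated in this README: that `16k` is a 16 kHz sampling rate, `ds640`/`ds320` is the downsampling factor in samples per frame, `nq32` is the maximum number of quantizers, and each quantizer uses a 1024-entry codebook (10 bits per code). The function name `bitrate_bps` is hypothetical and not part of FunCodec.

```python
import math


def bitrate_bps(sample_rate, downsample, num_quantizers, codebook_size=1024):
    """Estimate the bitrate (bits per second) of a quantized codec stream.

    codebook_size=1024 is an assumption chosen because it reproduces the
    ranges listed in the table; it is not taken from the FunCodec source.
    """
    frames_per_second = sample_rate / downsample
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * bits_per_code * num_quantizers


# ds640 models: one quantizer up to all 32 quantizers
low_640 = bitrate_bps(16000, 640, 1)
high_640 = bitrate_bps(16000, 640, 32)

# ds320 models: twice the frame rate, so twice the bitrate
low_320 = bitrate_bps(16000, 320, 1)
high_320 = bitrate_bps(16000, 320, 32)

print(low_640, high_640, low_320, high_320)
```

Under these assumptions the results match the 250~8000 and 500~16000 ranges listed for the ds640 and ds320 models respectively.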
Please refer to egs/LibriTTS/codec/encoding_decoding.sh to download pre-trained models from ModelScope:
cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub modelscope
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to download pre-trained models from HuggingFace:
cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub huggingface
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to perform encoding and decoding. Extract codes from the input file input_wav.scp; the codes will be saved to output_dir/codecs.txt in JSONL format.
cd egs/LibriTTS/codec
bash encoding_decoding.sh --stage 1 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 \
  --wav_scp input_wav.scp --out_dir outputs/codecs/
# input_wav.scp has the following format:
# uttid1 path/to/file1.wav
# uttid2 path/to/file2.wav
# ...

Decode codes from the input file codecs.txt; the reconstructed waveforms will be saved to output_dir/logdir/output.*/*.wav.
bash encoding_decoding.sh --stage 2 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 --file_sampling_rate 16000 \
  --wav_scp codecs.txt --out_dir outputs/recon_wavs
# codecs.txt is the output of the encoding stage above and has the following format:
# uttid1 [[[1, 2, 3, ...],[2, 3, 4, ...], ...]]
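# uttid2 [[[9, 7, 5, ...],[3, 1, 2, ...], ...]]

For commonly used open-source corpora, you can train a model using the recipes in the egs directory. For example, to train a model on the LibriTTS corpus, you can use egs/LibriTTS/codec/run.sh: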
# enter the LibriTTS recipe directory
cd egs/LibriTTS/codec
# run data downloading, preparation and training stages with 2 GPUs (device 0 and 1)
bash run.sh --stage 0 --stop_stage 3 --gpu_devices 0,1 --gpu_num 2

We recommend running the script stage by stage to get an overview of FunCodec.
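The list files shown in the encoding and decoding steps are plain text with one utterance per line: an utterance ID, whitespace, then either a file path or a JSON array of codes. As a rough illustration, they could be read in Python as sketched below. The helper name `read_scp` is hypothetical and not part of FunCodec, and the sketch assumes that each line's value follows the first run of whitespace.

```python
import json


def read_scp(lines):
    """Map utterance ID -> value (a path, or a JSON string of codes).

    Blank lines and lines starting with '#' are skipped. This is a
    hypothetical helper for illustration, not a FunCodec API.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split only on the first whitespace run so the value keeps its spaces.
        uttid, value = line.split(maxsplit=1)
        table[uttid] = value
    return table


# Waveform list in the input_wav.scp format
wavs = read_scp([
    "uttid1 path/to/file1.wav",
    "uttid2 path/to/file2.wav",
])

# Code list in the codecs.txt format: the value is parsed as JSON
raw_codes = read_scp(["uttid1 [[[1, 2, 3], [2, 3, 4]]]"])
codes = {uttid: json.loads(value) for uttid, value in raw_codes.items()}

print(wavs["uttid1"], codes["uttid1"])
```

To read a real file, pass an open file handle, for example `read_scp(open("input_wav.scp"))`.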
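For corpora not covered by the recipes, or for custom datasets, you can prepare the data yourself. In general, FunCodec uses a Kaldi-like wav.scp file to organize the data files. wav.scp has the following format: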
# for waveform files
uttid1 /path/to/uttid1.wav
uttid2 /path/to/uttid2.wav
# for kaldi-ark files
uttid3 /path/to/ark1.wav:10
uttid4 /path/to/ark1.wav:200
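uttid5 /path/to/ark2.wav:10

As shown in the example above, FunCodec supports a mix of waveform and kaldi-ark files in a single wav.scp file, for both training and inference. Here is a demo script to train a model on your custom dataset, named foo: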
cd egs/LibriTTS/codec
# 0. make the directory for train, dev and test sets
mkdir -p dump/foo/train dump/foo/dev dump/foo/test
# 1a. if you already have the wav.scp files, just place them under the corresponding directories
mv train.scp dump/foo/train/wav.scp; mv dev.scp dump/foo/dev/wav.scp; mv test.scp dump/foo/test/wav.scp
# 1b. if you don't have the wav.scp files, you can prepare them as follows
find path/to/train_set/ -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/train/wav.scp
find path/to/dev_set/ -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/dev/wav.scp
find path/to/test_set/ -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/test/wav.scp
# 2. collate shape files
mkdir -p exp/foo_states/train exp/foo_states/dev
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/train/wav.scp --out_dir exp/foo_states/train/wav_length
cat exp/foo_states/train/wav_length/wav_length.*.txt | shuf > exp/foo_states/train/speech_shape
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/dev/wav.scp --out_dir exp/foo_states/dev/wav_length
cat exp/foo_states/dev/wav_length/wav_length.*.txt | shuf > exp/foo_states/dev/speech_shape
# 3. train the model with 2 GPUs (device 4 and 5) on the customized dataset (foo)
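bash run.sh --gpu_devices 4,5 --gpu_num 2 --dumpdir dump/foo --state_dir foo_states

This project is licensed under the MIT License. FunCodec also contains various third-party components, as well as some code modified from other repositories under other open-source licenses.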
@misc{du2023funcodec,
  title={FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec},
  author={Zhihao Du and Shiliang Zhang and Kai Hu and Siqi Zheng},
  year={2023},
  eprint={2309.07405},
  archivePrefix={arXiv},
  primaryClass={cs.Sound}
}