FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

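This project is still a work in progress. To help make FunCodec better, please let me know your concerns and feel free to comment on them in the Issues section.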

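News

2023.12.22: We released the training and inference recipes for LauraTTS, together with pre-trained models. LauraTTS is a powerful codec-based zero-shot text-to-speech synthesizer that outperforms VALL-E in terms of semantic consistency and speaker similarity. For more details, see egs/LibriTTS/text2speech_laura/README.md.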

Installation

git clone https://github.com/alibaba/FunCodec.git && cd FunCodec
pip install --editable ./

Available models

Pre-trained models are hosted on the HuggingFace model hub and on ModelScope.

Model name	Corpus	Bitrate	Parameters	FLOPs
audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch	General	250~8000	57.83 M	7.73 G
audio_codec-encodec-zh_en-general-16k-nq32ds320-pytorch	General	500~16000	14.85 M	3.72 G
audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch	LibriTTS	250~8000	57.83 M	7.73 G
audio_codec-encodec-en-libritts-16k-nq32ds320-pytorch	LibriTTS	500~16000	14.85 M	3.72 G
audio_codec-freqcodec_magphase-en-libritts-16k-gr8nq32ds320-pytorch	LibriTTS	500~16000	4.50 M	2.18 G
audio_codec-freqcodec_magphase-en-libritts-16k-gr1nq32ds320-pytorch	LibriTTS	500~16000	0.52 M	0.34 G
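
The bitrate ranges in the table appear to follow from the model names. As a rough check, the sketch below assumes that `16k` is a 16 kHz sample rate, `ds640`/`ds320` is the downsampling factor, `nq32` is the number of quantizers, and each quantizer emits 10 bits per frame (a 1024-entry codebook). The 10-bit figure is an assumption that is not stated here, and the function name is hypothetical.

```python
def bitrate_range(sample_rate, downsample, num_quantizers, bits_per_code=10):
    """Return (min, max) bitrate in bits per second.

    bits_per_code=10 is an assumption (1024-entry codebooks); it is not
    stated in this README but reproduces the values in the table.
    """
    frames_per_second = sample_rate / downsample
    # Bitrate when only one quantizer is used.
    per_quantizer = frames_per_second * bits_per_code
    # Bitrate when all quantizers are used.
    return per_quantizer, per_quantizer * num_quantizers


print(bitrate_range(16000, 640, 32))  # (250.0, 8000.0)
print(bitrate_range(16000, 320, 32))  # (500.0, 16000.0)
```

Under these assumptions, halving the downsampling factor doubles the frame rate and therefore doubles both ends of the bitrate range.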

Model download

Download models from ModelScope

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to download pre-trained models:

cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub modelscope
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Download models from HuggingFace

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to download pre-trained models:

cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub huggingface
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

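Inference

Batch inference

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to perform encoding and decoding. Codes are extracted from the files listed in input_wav.scp and saved to output_dir/codecs.txt in JSONL format.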

cd egs/LibriTTS/codec
bash encoding_decoding.sh --stage 1 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 \
  --wav_scp input_wav.scp --out_dir outputs/codecs/
# input_wav.scp has the following format:
# uttid1 path/to/file1.wav
# uttid2 path/to/file2.wav
# ...
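
Before launching a long encoding job, it can help to sanity-check the scp file. The following is a minimal sketch that is not part of FunCodec; `read_scp` and `missing_files` are hypothetical helper names, and the only assumption is the `<uttid> <path>` line format shown above.

```python
from pathlib import Path


def read_scp(scp_path):
    """Parse a Kaldi-style scp file into {utterance_id: path}.

    Blank lines are skipped. Paths carrying a Kaldi ark offset suffix
    are stored as-is and are not interpreted by this helper.
    """
    entries = {}
    with open(scp_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(f"{scp_path}:{line_no}: expected '<uttid> <path>'")
            utt_id, path = parts
            if utt_id in entries:
                raise ValueError(f"{scp_path}:{line_no}: duplicate id {utt_id}")
            entries[utt_id] = path
    return entries


def missing_files(entries):
    """Return the utterance ids whose path is not an existing file."""
    return [utt_id for utt_id, path in entries.items() if not Path(path).is_file()]
```

For example, `missing_files(read_scp("input_wav.scp"))` returns the ids whose waveform file cannot be found, so an empty list means every listed file exists.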

Codes are decoded from the input file codecs.txt, and the reconstructed waveforms are saved to output_dir/logdir/output.*/*.wav.

bash encoding_decoding.sh --stage 2 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 --file_sampling_rate 16000 \
  --wav_scp codecs.txt --out_dir outputs/recon_wavs
# codecs.txt is the output of the encoding stage above and has the following format:
# uttid1 [[[1, 2, 3, ...],[2, 3, 4, ...], ...]]
# uttid2 [[[9, 7, 5, ...],[3, 1, 2, ...], ...]]
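
To inspect the extracted codes outside FunCodec, each line of codecs.txt can be split into an utterance id and a JSON array. This is a minimal sketch assuming only the line format shown above; the helper names are hypothetical, and the meaning of each nesting level is not specified here, so the code reports nesting lengths without interpreting them.

```python
import json


def parse_codec_line(line):
    """Split '<uttid> <json array>' into (uttid, nested list of ints)."""
    utt_id, payload = line.strip().split(maxsplit=1)
    return utt_id, json.loads(payload)


def nested_shape(value):
    """Return the length of each nesting level, following the first element."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


# A shortened, made-up line in the same layout as codecs.txt.
line = "uttid1 [[[1, 2, 3],[2, 3, 4]]]"
utt_id, codes = parse_codec_line(line)
print(utt_id, nested_shape(codes))  # uttid1 (1, 2, 3)
```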

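Training

Training on open-source corpora

For commonly used open-source corpora, you can train a model using the recipes in the egs directory. For example, to train a model on the LibriTTS corpus, you can use egs/LibriTTS/codec/run.sh: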

# enter the LibriTTS recipe directory
cd egs/LibriTTS/codec
# run data downloading, preparation and training stages with 2 GPUs (device 0 and 1)
bash run.sh --stage 0 --stop_stage 3 --gpu_devices 0,1 --gpu_num 2

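We recommend running the script stage by stage to get an overview of FunCodec.

Training on customized data

For corpora not covered by the recipes, or for customized datasets, you can prepare the data yourself. In general, FunCodec uses a Kaldi-like wav.scp file to organize the data files. wav.scp has the following format: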

 # for waveform files
uttid1 /path/to/uttid1.wav
uttid2 /path/to/uttid2.wav
# for kaldi-ark files
uttid3 /path/to/ark1.wav:10
uttid4 /path/to/ark1.wav:200
uttid5 /path/to/ark2.wav:10

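As shown in the example above, FunCodec supports mixing waveform files and kaldi-ark files in a single wav.scp file, for both training and inference. Here is a demo script that trains a model on a customized dataset named foo: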

cd egs/LibriTTS/codec
# 0. make the directories for the train, dev and test sets
mkdir -p dump/foo/train dump/foo/dev dump/foo/test

# 1a. if you already have the wav.scp files, just place them under the corresponding directories
mv train.scp dump/foo/train/wav.scp; mv dev.scp dump/foo/dev/wav.scp; mv test.scp dump/foo/test/wav.scp
# 1b. if you don't have the wav.scp files, you can prepare them as follows
find path/to/train_set/ -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/train/wav.scp
find path/to/dev_set/   -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/dev/wav.scp
find path/to/test_set/  -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/test/wav.scp

# 2. collate shape files
mkdir -p exp/foo_states/train exp/foo_states/dev
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/train/wav.scp --out_dir exp/foo_states/train/wav_length
cat exp/foo_states/train/wav_length/wav_length.*.txt | shuf > exp/foo_states/train/speech_shape
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/dev/wav.scp --out_dir exp/foo_states/dev/wav_length
cat exp/foo_states/dev/wav_length/wav_length.*.txt | shuf > exp/foo_states/dev/speech_shape

# 3. train the model with 2 GPUs (device 4 and 5) on the customized dataset (foo)
bash run.sh --gpu_devices 4,5 --gpu_num 2 --dumpdir dump/foo --state_dir foo_states
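
Step 1b can also be done without `find`, `awk` and `sort`, for instance on systems where those tools are unavailable. The sketch below is an approximate Python equivalent, not part of FunCodec; `build_wav_scp` is a hypothetical helper. Like the shell pipeline, it uses the file name (including its extension) as the utterance id, so file names must be unique and free of whitespace.

```python
from pathlib import Path


def build_wav_scp(wav_dir, scp_path):
    """Write '<file name> <path>' lines for every .wav file under wav_dir.

    Matching on the extension is case-insensitive, like `find -iname`.
    Returns the number of entries written.
    """
    wav_dir = Path(wav_dir)
    lines = sorted(
        f"{p.name} {p}"
        for p in wav_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"
    )
    scp_path = Path(scp_path)
    # Create the output directory if it does not exist yet.
    scp_path.parent.mkdir(parents=True, exist_ok=True)
    scp_path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

For example, `build_wav_scp("path/to/train_set", "dump/foo/train/wav.scp")` produces the training list. Python sorts by code point, so the line order may differ from `sort` under some locales.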

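Acknowledgements

Our design is consistent with FunASR, including the dataloader, model definitions and so on.
We borrowed a lot of code from Kaldi for data preparation.
We borrowed a lot of code from ESPnet. FunCodec follows the training and finetuning pipelines of ESPnet.
We borrowed the design of the model architecture from EnCodec and EnCodec_Trainer.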

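License

This project is licensed under the MIT License. FunCodec also contains various third-party components, as well as some code modified from other repositories under other open-source licenses.

Citations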

@misc{du2023funcodec,
      title={FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec},
      author={Zhihao Du and Shiliang Zhang and Kai Hu and Siqi Zheng},
      year={2023},
      eprint={2309.07405},
      archivePrefix={arXiv},
      primaryClass={cs.Sound}
}