FunCodec

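FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

This project is still a work in progress. To help make FunCodec better, please let us know your concerns and feel free to comment in the Issues section.

News

2023.12.22: We release the training and inference recipes for LauraTTS, together with pre-trained models. LauraTTS is a powerful codec-based zero-shot text-to-speech synthesizer that outperforms VALL-E in terms of semantic consistency and speaker similarity. See egs/LibriTTS/text2speech_laura/README.md for details.

Installation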

git clone https://github.com/alibaba/FunCodec.git && cd FunCodec
pip install --editable ./

Available models

Pre-trained models are hosted on the Hugging Face model hub and on ModelScope.

Model name	Corpus	Bitrate (bps)	Parameters	FLOPs
audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch	General	250 ~ 8000	57.83 M	7.73 G
audio_codec-encodec-zh_en-general-16k-nq32ds320-pytorch	General	500 ~ 16000	14.85 M	3.72 G
audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch	LibriTTS	250 ~ 8000	57.83 M	7.73 G
audio_codec-encodec-en-libritts-16k-nq32ds320-pytorch	LibriTTS	500 ~ 16000	14.85 M	3.72 G
audio_codec-freqcodec_magphase-en-libritts-16k-gr8nq32ds320-pytorch	LibriTTS	500 ~ 16000	4.50 M	2.18 G
audio_codec-freqcodec_magphase-en-libritts-16k-gr1nq32ds320-pytorch	LibriTTS	500 ~ 16000	0.52 M	0.34 G
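
The bitrate ranges in the table can be reproduced with a short calculation, sketched below. It rests on assumptions that the source does not state: that `16k` in a model name is the sampling rate in kHz, that `dsN` is the downsampling factor, that `nqN` is the number of quantizers, and that each quantizer uses a 1024-entry codebook (10 bits per code). The helper name `bitrate_range` is hypothetical and is not part of FunCodec.

```python
import re
from typing import Tuple


def bitrate_range(model_name: str, codebook_size: int = 1024) -> Tuple[int, int]:
    """Return (min_bps, max_bps) implied by a model name.

    Assumed naming scheme (not confirmed by the source):
    '<rate>k' = sampling rate in kHz, 'nq<N>' = number of quantizers,
    'ds<N>' = downsampling factor. codebook_size is assumed to be a power of two.
    """
    m = re.search(r"-(\d+)k-(?:gr\d+)?nq(\d+)ds(\d+)-", model_name)
    if m is None:
        raise ValueError(f"unrecognized model name: {model_name}")
    sample_rate = int(m.group(1)) * 1000
    num_quantizers = int(m.group(2))
    downsample = int(m.group(3))

    frames_per_second = sample_rate // downsample
    bits_per_code = codebook_size.bit_length() - 1  # log2 for powers of two
    per_quantizer_bps = frames_per_second * bits_per_code
    # Lowest bitrate uses one quantizer, highest uses all of them.
    return per_quantizer_bps, per_quantizer_bps * num_quantizers


if __name__ == "__main__":
    print(bitrate_range("audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch"))
    print(bitrate_range("audio_codec-encodec-en-libritts-16k-nq32ds320-pytorch"))
```

Under these assumptions the results agree with the table: 250 to 8000 for the `ds640` models and 500 to 16000 for the `ds320` models.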

Downloading models

Download models from ModelScope

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to download pre-trained models:

cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub modelscope
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Download models from Hugging Face

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to download pre-trained models:

cd egs/LibriTTS/codec
model_name=audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch
bash encoding_decoding.sh --stage 0 --model_name ${model_name} --model_hub huggingface
# The pre-trained model will be downloaded to exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch

Inference

Batch inference

Please refer to egs/LibriTTS/codec/encoding_decoding.sh to perform encoding and decoding. To extract codes, provide an input file input_wav.scp; the codes are saved to output_dir/codecs.txt in JSONL format.

cd egs/LibriTTS/codec
bash encoding_decoding.sh --stage 1 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 \
  --wav_scp input_wav.scp  --out_dir outputs/codecs/
# input_wav.scp has the following format:
# uttid1 path/to/file1.wav
# uttid2 path/to/file2.wav
# ...

To decode, pass the codes file codecs.txt as input; the reconstructed waveforms are saved to output_dir/logdir/output.*/*.wav.

bash encoding_decoding.sh --stage 2 --batch_size 16 --num_workers 4 --gpu_devices "0,1" \
  --model_dir exp/${model_name} --bit_width 16000 --file_sampling_rate 16000 \
  --wav_scp codecs.txt --out_dir outputs/recon_wavs
# codecs.txt is the output of the encoding stage above and has the following format:
# uttid1 [[[1, 2, 3, ...],[2, 3, 4, ...], ...]]
# uttid2 [[[9, 7, 5, ...],[3, 1, 2, ...], ...]]
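
For inspecting the extracted codes outside the recipe, the following is a minimal Python sketch. It assumes that each line of codecs.txt holds an utterance ID, whitespace, and then a JSON array, as the format comment above suggests; the exact meaning of the nested dimensions is not stated in the source, so the sketch does not interpret them. The function names `parse_codec_line` and `load_codecs` are hypothetical.

```python
import json
from typing import Dict, Tuple


def parse_codec_line(line: str) -> Tuple[str, list]:
    """Split one line into (uttid, nested list of integer codes)."""
    # The first whitespace-separated token is the utterance ID;
    # everything after it is assumed to be a JSON array.
    uttid, payload = line.strip().split(maxsplit=1)
    return uttid, json.loads(payload)


def load_codecs(path: str) -> Dict[str, list]:
    """Read a whole codes file into a dict keyed by utterance ID."""
    codes = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                uttid, arr = parse_codec_line(line)
                codes[uttid] = arr
    return codes


if __name__ == "__main__":
    uttid, arr = parse_codec_line("uttid1 [[[1, 2, 3],[2, 3, 4]]]")
    print(uttid, arr)
```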

Training

Training on open-source corpora

For commonly used open-source corpora, you can train a model using the recipes in the egs directory. For example, to train a model on the LibriTTS corpus, you can use egs/LibriTTS/codec/run.sh:

# enter the LibriTTS recipe directory
cd egs/LibriTTS/codec
# run data downloading, preparation and training stages with 2 GPUs (device 0 and 1)
bash run.sh --stage 0 --stop_stage 3 --gpu_devices 0,1 --gpu_num 2

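We recommend running the script stage by stage to get an overview of FunCodec.

Training on customized data

For corpora that are not covered by the recipes, or for customized datasets, you can prepare the data yourself. In general, FunCodec uses a Kaldi-like wav.scp file to organize the data files. wav.scp has the following format:

# for waveform files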
uttid1 /path/to/uttid1.wav
uttid2 /path/to/uttid2.wav
# for kaldi-ark files
uttid3 /path/to/ark1.wav:10
uttid4 /path/to/ark1.wav:200
uttid5 /path/to/ark2.wav:10
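
To make the two entry types concrete, here is a minimal Python sketch that parses such lines. It treats a trailing `:N` as an offset into an ark file, following the Kaldi scp convention; skipping `#` comment lines is a convenience of this sketch rather than documented FunCodec behavior. `ScpEntry` and `parse_scp_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ScpEntry(NamedTuple):
    uttid: str
    path: str
    offset: Optional[int]  # offset for Kaldi-ark entries, None for plain files


def parse_scp_line(line: str) -> Optional[ScpEntry]:
    """Parse one wav.scp line; return None for blank or comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    uttid, location = line.split(maxsplit=1)
    # A numeric suffix after the last ':' marks an ark entry.
    path, sep, tail = location.rpartition(":")
    if sep and tail.isdigit():
        return ScpEntry(uttid, path, int(tail))
    return ScpEntry(uttid, location, None)


if __name__ == "__main__":
    print(parse_scp_line("uttid1 /path/to/uttid1.wav"))
    print(parse_scp_line("uttid3 /path/to/ark1.wav:10"))
```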

As the wav.scp example shows, FunCodec supports mixing waveform files and Kaldi-ark files in a single wav.scp file, for both training and inference. Here is a demo script to train a model on a customized dataset named foo:

cd egs/LibriTTS/codec
# 0. make the directories for the train, dev and test sets
mkdir -p dump/foo/train dump/foo/dev dump/foo/test

# 1a. if you already have the scp files, place them under the corresponding directories as wav.scp
mv train.scp dump/foo/train/wav.scp; mv dev.scp dump/foo/dev/wav.scp; mv test.scp dump/foo/test/wav.scp
# 1b. if you don't have the wav.scp files, you can prepare them as follows
find path/to/train_set/ -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/train/wav.scp
find path/to/dev_set/   -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/dev/wav.scp
find path/to/test_set/  -iname "*.wav" | awk -F '/' '{print $(NF),$0}' | sort > dump/foo/test/wav.scp

# 2. collate shape files
mkdir -p exp/foo_states/train exp/foo_states/dev
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/train/wav.scp --out_dir exp/foo_states/train/wav_length
cat exp/foo_states/train/wav_length/wav_length.*.txt | shuf > exp/foo_states/train/speech_shape
torchrun --nproc_per_node=4 --master_port=1234 scripts/gen_wav_length.py --wav_scp dump/foo/dev/wav.scp --out_dir exp/foo_states/dev/wav_length
cat exp/foo_states/dev/wav_length/wav_length.*.txt | shuf > exp/foo_states/dev/speech_shape

# 3. train the model with 2 GPUs (devices 4 and 5) on the customized dataset (foo)
bash run.sh --gpu_devices 4,5 --gpu_num 2 --dumpdir dump/foo --state_dir foo_states
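
Where `find` and `awk` are not available, step 1b can be approximated in Python. The sketch below mirrors the shell pipeline under stated assumptions: it matches `.wav` case-insensitively, uses the file name (including its extension) as the utterance ID, and sorts by code point, which may differ from the locale-dependent order of `sort`. The function name `build_wav_scp` is hypothetical and not part of FunCodec.

```python
from pathlib import Path
from typing import List


def build_wav_scp(root: str) -> List[str]:
    """Return sorted 'uttid path' lines for every .wav file under root."""
    lines = []
    for p in Path(root).rglob("*"):
        # Case-insensitive extension match, like `find -iname "*.wav"`.
        if p.is_file() and p.suffix.lower() == ".wav":
            # The utterance ID is the file name, like awk's $(NF).
            lines.append(f"{p.name} {p.as_posix()}")
    return sorted(lines)


if __name__ == "__main__":
    import sys

    # Usage: python build_wav_scp.py path/to/train_set > dump/foo/train/wav.scp
    for entry in build_wav_scp(sys.argv[1]):
        print(entry)
```

As with the shell version, files that share a name in different subdirectories produce duplicate utterance IDs, so file names should be unique within each set.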

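Acknowledgements

FunCodec follows the design of FunASR, including the dataloader, model definitions, and so on.
We borrowed a lot of code from Kaldi for data preparation.
We borrowed a lot of code from ESPnet. FunCodec follows the training and fine-tuning pipelines of ESPnet.
We borrowed the model architecture design from EnCodec and EnCodec_Trainner.

License

This project is licensed under the MIT License. FunCodec also contains various third-party components and some code modified from other repositories under other open-source licenses.

Citation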

@misc{du2023funcodec,
      title={FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec},
      author={Zhihao Du and Shiliang Zhang and Kai Hu and Siqi Zheng},
      year={2023},
      eprint={2309.07405},
      archivePrefix={arXiv},
      primaryClass={cs.Sound}
}