
CDFSE_FastSpeech2


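This repository contains the code accompanying the paper "Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis". It is implemented on top of ming024/FastSpeech2 (many thanks!).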
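To make the idea of a content-dependent, fine-grained speaker embedding more concrete, the sketch below shows one plausible reading of it: plain scaled dot-product attention in which each phoneme encoding (query) attends over frame-level content features of the reference speech (keys) and pools the matching frame-level speaker features (values), giving one speaker vector per phoneme. This is an illustrative assumption, not the repository's code. All function and argument names are hypothetical, and a real model would add learned projections and operate on batched tensors.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def content_dependent_speaker_embedding(
    phoneme_enc: np.ndarray,  # (T_text, d): text-side queries (hypothetical input)
    ref_content: np.ndarray,  # (T_ref, d): frame-level content features of the reference (keys)
    ref_speaker: np.ndarray,  # (T_ref, d_spk): frame-level local speaker features (values)
) -> np.ndarray:
    """Return one speaker embedding per phoneme, shape (T_text, d_spk)."""
    d = phoneme_enc.shape[-1]
    # Similarity of every phoneme to every reference frame, scaled by sqrt(d)
    scores = phoneme_enc @ ref_content.T / np.sqrt(d)
    # Each row becomes a probability distribution over reference frames
    weights = softmax(scores, axis=-1)
    # Per-phoneme weighted average of the local speaker features
    return weights @ ref_speaker


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = content_dependent_speaker_embedding(
        rng.normal(size=(5, 16)), rng.normal(size=(12, 16)), rng.normal(size=(12, 8))
    )
    print(emb.shape)
```

Under this reading, a phoneme whose content resembles particular reference frames draws most of its speaker information from those frames, which is what makes the embedding "content-dependent" rather than a single utterance-level vector.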

Update (2022-06-15): This work has been accepted to Interspeech 2022.

Samples | Paper

Usage

0. Dataset

Mandarin: AISHELL3
English: LibriTTS

1. Environment setup

pip3 install -r requirements.txt

2. Data preprocessing

For more details, please refer to ming024/FastSpeech2.

For example, first run

python3 prepare_align.py config/AISHELL3/preprocess.yaml

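Then download the TextGrid files, or align the corpus yourself with MFA (Montreal Forced Aligner), and put the TextGrid files in your [preprocessed_data_path], e.g. preprocessed_data/AISHELL3/TextGrid/.
Finally, run the preprocessing script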

python3 preprocess.py config/AISHELL3/preprocess.yaml
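Before running preprocess.py, it can help to confirm that every utterance produced by prepare_align.py has a matching TextGrid. The sketch below assumes the upstream ming024/FastSpeech2 layout, i.e. raw_data/AISHELL3/<speaker>/<basename>.wav and [preprocessed_data_path]/TextGrid/<speaker>/<basename>.TextGrid; both paths should be checked against config/AISHELL3/preprocess.yaml, and the function name missing_textgrids is hypothetical, not part of the repository.

```python
from pathlib import Path
from typing import List


def missing_textgrids(raw_dir: str, textgrid_dir: str) -> List[str]:
    """List 'speaker/basename' entries that have a .wav in raw_dir but no .TextGrid."""
    missing = []
    # Assumed layout: <raw_dir>/<speaker>/<basename>.wav
    for wav in sorted(Path(raw_dir).glob("*/*.wav")):
        speaker, basename = wav.parent.name, wav.stem
        # Assumed layout: <textgrid_dir>/<speaker>/<basename>.TextGrid
        tg = Path(textgrid_dir) / speaker / f"{basename}.TextGrid"
        if not tg.is_file():
            missing.append(f"{speaker}/{basename}")
    return missing


if __name__ == "__main__":
    # Hypothetical paths; adjust them to your own configuration
    gaps = missing_textgrids("raw_data/AISHELL3", "preprocessed_data/AISHELL3/TextGrid")
    print(f"{len(gaps)} utterances without a TextGrid")
```

Utterances reported here would have no alignment at preprocessing time, so re-running MFA on them (or excluding them) is usually simpler than debugging the preprocessing step afterwards.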

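Also:

We have already split the train, val and test sets in preprocessed_data/[DATASET]/*, so you can put them directly into your [preprocessed_data_path] after data preprocessing, or regroup them yourself.
We provide "speakerfile_dict.json" in preprocessed_data/[DATASET]/* (used in dataset.py to randomly load reference speech); you can also generate it with generate_speakerfiledict.py.
We provide some pretrained HiFi-GAN parameters in hifigan/pretrained/*. You can load them (remember to unzip the *.zip files) or use your own trained vocoder in utils/model.py.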
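The exact format of speakerfile_dict.json is not documented here, so the following is only a hypothetical sketch of what such a file could hold and how it might be used to pick reference speech at random. It assumes metadata lines in the upstream ming024/FastSpeech2 style, `basename|speaker|phones|raw_text` (as in train.txt and val.txt), and maps each speaker to a list of basenames. build_speaker_file_dict, write_speaker_file_dict and pick_reference are illustrative names, not functions from generate_speakerfiledict.py or dataset.py.

```python
import json
import random
from collections import defaultdict
from typing import Dict, Iterable, List


def build_speaker_file_dict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group utterance basenames by speaker from 'basename|speaker|...' lines (assumed format)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        basename, speaker = line.split("|")[:2]
        groups[speaker].append(basename)
    return dict(groups)


def write_speaker_file_dict(metadata_paths: Iterable[str], out_path: str) -> None:
    """Read metadata files (e.g. train.txt, val.txt) and dump speaker -> basenames as JSON."""
    lines: List[str] = []
    for path in metadata_paths:
        with open(path, encoding="utf-8") as f:
            lines.extend(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(build_speaker_file_dict(lines), f, ensure_ascii=False, indent=2)


def pick_reference(speaker_files: Dict[str, List[str]], speaker: str,
                   current: str, rng: random.Random) -> str:
    """Randomly pick another utterance of the same speaker to use as reference speech."""
    candidates = [b for b in speaker_files[speaker] if b != current]
    # Fall back to the current utterance if the speaker has only one
    return rng.choice(candidates or speaker_files[speaker])
```

Excluding the target utterance is one reasonable choice, since it prevents the reference from trivially sharing the target's content; whether the repository's dataset.py does the same is not stated here.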

3. Training

Train the model

python3 train.py -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml

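Note: If the PhnCls (phoneme classification) loss does not seem to decrease, or its trend is not obvious, try manually adjusting the order of the symbols in text/symbols.py.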

(Optional) Use TensorBoard

tensorboard --logdir output/log/AISHELL3

4. Inference

Batch mode

python3 synthesize.py --source synbatch_chinese.txt --restore_step 250000 --mode batch -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml

Single mode

# For Mandarin
python3 synthesize.py --text "清华大学人机语音交互实验室，聚焦人工智能场景下的智能语音交互技术研究。 " --ref [REF_SPEECH_PATH.wav] --restore_step 250000 --mode single -p config/AISHELL3/preprocess.yaml -m config/AISHELL3/model.yaml -t config/AISHELL3/train.yaml 
# For English
python3 synthesize.py --text " Human Computer Speech Interaction Lab at Tsinghua University, targets artificial intelligence technologies for smart voice user interface. " --ref [REF_SPEECH_PATH.wav] --restore_step 250000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml
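When scripting many single-mode syntheses, it can be convenient to build the command as an argument list, which avoids shell-quoting problems with texts that contain spaces and punctuation. The sketch below only reuses the flags and config paths shown in the commands above; CONFIG_DIRS, single_mode_command and synthesize_single are hypothetical helper names, and "ref.wav" stands in for your own reference recording.

```python
import subprocess
from typing import List

# Config directories as they appear in the commands above
CONFIG_DIRS = {"mandarin": "config/AISHELL3", "english": "config/LibriTTS"}


def single_mode_command(text: str, ref_wav: str, language: str,
                        restore_step: int = 250000) -> List[str]:
    """Build the argv list for synthesize.py in single mode."""
    cfg = CONFIG_DIRS[language]
    return [
        "python3", "synthesize.py",
        "--text", text,
        "--ref", ref_wav,
        "--restore_step", str(restore_step),
        "--mode", "single",
        "-p", f"{cfg}/preprocess.yaml",
        "-m", f"{cfg}/model.yaml",
        "-t", f"{cfg}/train.yaml",
    ]


def synthesize_single(text: str, ref_wav: str, language: str,
                      restore_step: int = 250000) -> None:
    """Run synthesize.py from the repository root; raises CalledProcessError on failure."""
    subprocess.run(single_mode_command(text, ref_wav, language, restore_step), check=True)
```

For example, `synthesize_single("Hello from the lab.", "ref.wav", "english")` would run the English command above with a different text, assuming the checkpoint for step 250000 exists.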

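Implementation updates

(2022-06-20) Adopted instance normalization in the mel content encoder to improve performance.
(2022-06-01) Added support for an English setting: the LibriTTS multi-speaker dataset (train-clean-100 + dev-clean + test-clean).
(2022-04-27) Added support for using a wav file (*.wav) directly as the reference speech in single mode, instead of a numpy file.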
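Instance normalization, mentioned in the 2022-06-20 update, normalizes each channel of each utterance with that utterance's own mean and variance over time. A common rationale in voice-conversion and TTS work is that this removes utterance-level statistics that tend to carry speaker or channel characteristics, leaving the content encoder with more speaker-independent features; whether that is the exact motivation here is an assumption. The NumPy sketch below assumes features shaped (batch, channels, time) and omits learnable affine parameters; in PyTorch, `torch.nn.InstanceNorm1d` with its default `affine=False` computes the same normalization.

```python
import numpy as np


def instance_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize (batch, channels, time) features per sample and per channel over time."""
    # Statistics are computed over the time axis only, separately for every (sample, channel)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # biased variance (ddof=0), as in InstanceNorm1d
    return (x - mean) / np.sqrt(var + eps)


if __name__ == "__main__":
    feats = np.random.default_rng(0).normal(3.0, 2.0, size=(2, 80, 100))
    out = instance_norm(feats)
    print(out.mean(axis=-1).max(), out.std(axis=-1).mean())
```

Unlike batch normalization, nothing is shared across utterances, so the output for one reference does not depend on the others in the batch.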

References

ming024/FastSpeech2
jik876/hifi-gan

Citation

@misc{zhou2022content,
  title={Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis}, 
  author={Zhou, Yixuan and Song, Changhe and Li, Xiang and Zhang, Luwen and Wu, Zhiyong and Bian, Yanyao and Su, Dan and Meng, Helen},
  year={2022},
  eprint={2204.00990},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}


Additional information

Version: 1.0.0
Type: AI source code
Updated: 2025-08-21
Size: 112.78 MB
Source: GitHub
