
SoCodec


Version 1.0.0


SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

Haohan Guo, Fenglong Xie, Kun Xie, Dongchao Yang, Dake Guo, Xixin Wu, Helen Meng

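This repository contains the inference scripts for SoCodec, an ultra-low-bitrate speech codec designed for speech language models, introduced in the paper "SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis".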

Paper
Demo site
Model weights

SoCodec compresses audio into discrete codes at an ultra-low bitrate of 0.47 kbps with a 120 ms frame shift, which keeps the code sequences short.
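The 0.47 kbps figure can be checked with simple arithmetic. The sketch below assumes the checkpoint name socodec_16384x4_120ms_16khz_chinese means 4 code streams, each drawing from a 16384-entry codebook (14 bits per code), at one frame every 120 ms. That reading of the file name is an assumption, and codec_bitrate_bps is a hypothetical helper, not part of the repository.

```python
import math

def codec_bitrate_bps(num_streams: int, codebook_size: int, frame_shift_ms: float) -> float:
    """Bitrate of a codec that emits one code per stream per frame."""
    bits_per_frame = num_streams * math.log2(codebook_size)  # e.g. 4 * 14 = 56 bits
    frames_per_second = 1000.0 / frame_shift_ms              # e.g. 120 ms -> ~8.33 frames/s
    return bits_per_frame * frames_per_second

# Assumed reading of "socodec_16384x4_120ms_16khz_chinese": 4 streams x 16384 codes, 120 ms shift.
bps = codec_bitrate_bps(num_streams=4, codebook_size=16384, frame_shift_ms=120)
print(f"{bps / 1000:.2f} kbps")  # 0.47 kbps
```

Under the same assumption, a 10-second clip becomes roughly 83 frames per stream, which is what makes the sequences cheap for a language model to generate.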
It can be used as a drop-in replacement for EnCodec or other multi-stream codecs in speech language modeling applications.
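Being a drop-in replacement means a downstream language model only depends on the grid of integer codes (streams x frames) and on a decoder that turns that grid back into audio. The sketch below illustrates that contract with a hypothetical MultiStreamCodec interface and a ToyCodec stand-in. Neither is SoCodec's or EnCodec's real API. The 1920-sample hop (120 ms at 16 kHz) is an assumption based on the checkpoint name.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence

Codes = List[List[int]]  # indexed as codes[stream][frame]

class MultiStreamCodec(Protocol):
    """Hypothetical interface; not the actual SoCodec or EnCodec API."""
    num_streams: int
    codebook_size: int
    def encode(self, wav: Sequence[float]) -> Codes: ...
    def decode(self, codes: Codes) -> List[float]: ...

def validate_codes(codec: MultiStreamCodec, codes: Codes) -> int:
    """Check the code grid a token LM would consume; return the frame count."""
    if len(codes) != codec.num_streams:
        raise ValueError(f"expected {codec.num_streams} streams, got {len(codes)}")
    num_frames = len(codes[0])
    for stream in codes:
        if len(stream) != num_frames:
            raise ValueError("all streams must share the same frame count")
        if any(not 0 <= c < codec.codebook_size for c in stream):
            raise ValueError("code index outside the codebook")
    return num_frames

@dataclass
class ToyCodec:
    """Stand-in codec for illustration: emits all-zero codes, one frame per `hop` samples."""
    num_streams: int = 4
    codebook_size: int = 16384
    hop: int = 1920  # assumed: 120 ms at 16 kHz
    def encode(self, wav: Sequence[float]) -> Codes:
        n = len(wav) // self.hop
        return [[0] * n for _ in range(self.num_streams)]
    def decode(self, codes: Codes) -> List[float]:
        return [0.0] * (len(codes[0]) * self.hop)

codec = ToyCodec()
codes = codec.encode([0.0] * 16000 * 3)  # 3 s of silence at 16 kHz
print(validate_codes(codec, codes))      # 25
```

Any codec object satisfying the same interface could be swapped in without changing the code that consumes the grid.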
The released checkpoint currently supports Chinese only. A multilingual version is being trained.

News

September 2024 (v1.0):
- Released the SoCodec checkpoints and inference code.

Installation

Clone the repository, install the dependencies, and download the pretrained checkpoints into ckpts/:

git clone https://github.com/hhguo/SoCodec
cd SoCodec
mkdir ckpts && cd ckpts
wget https://huggingface.co/TencentGameMate/chinese-hubert-large/resolve/main/chinese-hubert-large-fairseq-ckpt.pt
wget https://huggingface.co/hhguo/SoCodec/resolve/main/socodec_16384x4_120ms_16khz_chinese.safetensors
wget https://huggingface.co/hhguo/SoCodec/resolve/main/mel_vocoder_80dim_10ms_16khz.safetensors
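Before running the scripts, it can help to confirm that all three downloads landed where the commands put them. The following is a minimal sketch, assuming the scripts look for the files in ckpts/ under the repository root; missing_checkpoints is a hypothetical helper, and the file names are copied from the wget commands.

```python
from pathlib import Path
from typing import List

# File names taken from the wget commands above.
REQUIRED_CHECKPOINTS = [
    "chinese-hubert-large-fairseq-ckpt.pt",
    "socodec_16384x4_120ms_16khz_chinese.safetensors",
    "mel_vocoder_80dim_10ms_16khz.safetensors",
]

def missing_checkpoints(ckpt_dir: str = "ckpts") -> List[str]:
    """Return the names of required checkpoint files not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]
```

Called from the repository root, missing_checkpoints() returns an empty list when all three files are in place.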

Usage

# For analysis-synthesis
python example.py -i ground_truth.wav -o synthesis.wav
# For speech analysis
python example.py -i ground_truth.wav -o features.pt
# For token-to-audio synthesis
python example.py -i features.pt -o synthesis.wav
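To extract features for a whole folder of recordings, the speech-analysis command (.wav in, .pt out) can be driven from Python. The following is a minimal sketch that uses only the -i and -o flags shown above. The helper names, the wavs/ and features/ directories, and the one-.pt-per-.wav naming are illustrative assumptions.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List, Union

def analysis_commands(wav_paths: Iterable[Union[str, Path]], out_dir: str) -> List[List[str]]:
    """Build one `example.py -i <name>.wav -o <out_dir>/<name>.pt` call per input file."""
    out = Path(out_dir)
    return [
        [sys.executable, "example.py", "-i", str(p), "-o", str(out / (Path(p).stem + ".pt"))]
        for p in wav_paths
    ]

def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order; check=True raises CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

# Usage (from the repository root, with the checkpoints in ckpts/):
#   Path("features").mkdir(exist_ok=True)
#   run_commands(analysis_commands(sorted(Path("wavs").glob("*.wav")), "features"))
```

Running one process per file is simple, but it reloads the models each time. For large batches, calling the model directly inside a single process would avoid that overhead.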