Matcha TTS

v0.0.7

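Matcha-TTS: A fast TTS architecture with conditional flow matching

Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, and Gustav Eje Henter

This is the official code implementation of Matcha-TTS [ICASSP 2024].

We propose Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis. Our method: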

Is probabilistic
Has a compact memory footprint
Sounds highly natural
Is very fast to synthesise from

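Check out our demo page and read our ICASSP 2024 paper for more details.

Pre-trained models are downloaded automatically when you use the CLI or the Gradio interface.

You can also try Matcha-TTS in your browser on Hugging Face Spaces.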

Installation

Create an environment (suggested but optional)

conda create -n matcha-tts python=3.10 -y
conda activate matcha-tts

Install Matcha-TTS using pip or from source

pip install matcha-tts

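From source

# Install directly from GitHub
pip install git+https://github.com/shivammehta25/Matcha-TTS.git
# Or, for an editable install, clone the repository first
git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS
pip install -e .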
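
As a quick sanity check after installation, the following sketch tests whether the package can be imported and whether the command-line entry point is on the PATH. It assumes the importable module is named matcha (as the python3 -m matcha.onnx.export command later in this document suggests) and that the CLI is called matcha-tts. The check_install helper is hypothetical and not part of the project.

```python
import importlib.util
import shutil


def check_install(module_name="matcha", cli_name="matcha-tts"):
    """Report whether a module is importable and a CLI is on the PATH.

    The default names are taken from the commands in this README and are
    assumptions; adjust them if your installation differs.
    """
    return {
        # find_spec returns None when a top-level module cannot be found
        "module_importable": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns None when no executable of that name is on PATH
        "cli_on_path": shutil.which(cli_name) is not None,
    }
```

Calling check_install() inside the activated environment should return a dictionary with both values set to True if the installation succeeded.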

Run the CLI / Gradio app / Jupyter notebook

# This will download the required models
matcha-tts --text "<INPUT TEXT>"

or

matcha-tts-app

or open synthesis.ipynb in Jupyter Notebook

CLI arguments

To synthesise from given text, run:

matcha-tts --text "<INPUT TEXT>"

To synthesise from a file, run:

matcha-tts --file <PATH TO FILE>

To batch synthesise from a file, run:

matcha-tts --file <PATH TO FILE> --batched

Additional arguments

Speaking rate

matcha-tts --text "<INPUT TEXT>" --speaking_rate 1.0

Sampling temperature

matcha-tts --text "<INPUT TEXT>" --temperature 0.667

Euler ODE solver steps

matcha-tts --text "<INPUT TEXT>" --steps 10
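
When the CLI is driven from a script, it can help to assemble the arguments programmatically. The sketch below builds an argument list using only the flags shown in this README. The helper names build_matcha_command and synthesise are hypothetical, and actually running the command assumes matcha-tts is installed and on the PATH.

```python
import subprocess


def build_matcha_command(text=None, file=None, batched=False,
                         speaking_rate=None, temperature=None, steps=None):
    """Build an argument list for the matcha-tts CLI.

    Only flags that appear in this README are used. Optional values left
    as None are omitted so the CLI falls back to its own defaults.
    """
    if (text is None) == (file is None):
        raise ValueError("Provide exactly one of text or file")
    cmd = ["matcha-tts"]
    if text is not None:
        cmd += ["--text", text]
    else:
        cmd += ["--file", str(file)]
        # The README only shows --batched together with --file,
        # so it is only added in that case.
        if batched:
            cmd.append("--batched")
    if speaking_rate is not None:
        cmd += ["--speaking_rate", str(speaking_rate)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if steps is not None:
        cmd += ["--steps", str(steps)]
    return cmd


def synthesise(**kwargs):
    """Run matcha-tts with the given options (requires matcha-tts installed)."""
    subprocess.run(build_matcha_command(**kwargs), check=True)
```

For example, synthesise(text="Hello there", temperature=0.667, steps=10) would run the same command as the corresponding lines above, with the text passed as a single argument so no shell quoting is needed.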

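Train with your own dataset

Let's assume we are training with LJ Speech

Download the LJ Speech dataset, extract it to data/LJSpeech-1.1, and prepare the file lists to point to the extracted data, as in item 5 of the setup in the NVIDIA Tacotron 2 repo.
Clone and enter the Matcha-TTS repository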

git clone https://github.com/shivammehta25/Matcha-TTS.git
cd Matcha-TTS

Install the package from source

pip install -e .

Go to configs/data/ljspeech.yaml and change the filelist paths:

train_filelist_path: data/filelists/ljs_audio_text_train_filelist.txt
valid_filelist_path: data/filelists/ljs_audio_text_val_filelist.txt
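
If you need to create filelists for your own data, the sketch below converts LJSpeech-style metadata into lines of the form wav_path|transcript, which is the layout used by the Tacotron 2 filelists. It assumes each metadata row is id|raw transcript|normalised transcript and that the audio for each row sits at wavs/<id>.wav. The helper names and the tail-based validation split are hypothetical; the split does not reproduce the standard LJ Speech filelists.

```python
def metadata_to_filelist_lines(metadata_text, wav_dir):
    """Convert LJSpeech-style metadata rows into 'wav_path|transcript' lines.

    Assumes each row is 'id|raw transcript|normalised transcript' and that
    the audio for a row lives at <wav_dir>/<id>.wav.
    """
    wav_dir = wav_dir.rstrip("/")
    lines = []
    for row in metadata_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        parts = row.split("|")
        utt_id = parts[0]
        transcript = parts[-1]  # last field: normalised text when present
        lines.append(f"{wav_dir}/{utt_id}.wav|{transcript}")
    return lines


def split_train_val(lines, n_val):
    """Hold out the last n_val lines for validation (a simple, hypothetical split)."""
    if not 0 < n_val < len(lines):
        raise ValueError("n_val must be between 1 and len(lines) - 1")
    return lines[:-n_val], lines[-n_val:]
```

The two returned lists can then be written, one entry per line, to the files named by train_filelist_path and valid_filelist_path.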

Generate normalisation statistics with the YAML file of the dataset configuration

matcha-data-stats -i ljspeech.yaml
# Output:
# {'mel_mean': -5.53662231756592, 'mel_std': 2.1161014277038574}

Update these values in configs/data/ljspeech.yaml under the data_statistics key.

data_statistics:  # Computed for ljspeech dataset
  mel_mean: -5.536622
  mel_std: 2.116101
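
Copying the statistics by hand is error-prone, so a small helper can turn the printed dictionary into the YAML fragment. This is a sketch that assumes matcha-data-stats prints a Python-style dict literal, as in the example output above. The stats_to_yaml helper is hypothetical and not part of the project.

```python
import ast


def stats_to_yaml(output_line, precision=6):
    """Turn the dict printed by matcha-data-stats into a YAML fragment.

    Assumes the tool prints a Python-style dict literal with the keys
    'mel_mean' and 'mel_std'.
    """
    # literal_eval parses the dict safely without executing arbitrary code
    stats = ast.literal_eval(output_line.strip())
    return "\n".join([
        "data_statistics:",
        f"  mel_mean: {stats['mel_mean']:.{precision}f}",
        f"  mel_std: {stats['mel_std']:.{precision}f}",
    ])


print(stats_to_yaml("{'mel_mean': -5.53662231756592, 'mel_std': 2.1161014277038574}"))
```

With the example values, this prints the same three lines as the data_statistics block shown above.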

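In the same file, make sure train_filelist_path and valid_filelist_path point to your train and validation filelists.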

Run the training script

make train-ljspeech

or

python matcha/train.py experiment=ljspeech

For a minimum memory run

python matcha/train.py experiment=ljspeech_min_memory

For multi-GPU training, run

python matcha/train.py experiment=ljspeech trainer.devices=[0,1]
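
Some shells (zsh, for example) treat the square brackets in trainer.devices=[0,1] as a glob pattern unless the argument is quoted. One way around this is to launch training from Python with an argument list, which bypasses the shell. The sketch below uses only the key=value overrides shown in this README. The build_train_command helper is hypothetical, and it assumes the command is run from the repository root.

```python
import subprocess
import sys


def build_train_command(experiment="ljspeech", devices=None):
    """Build the argument list for matcha/train.py with key=value overrides.

    experiment and trainer.devices are the only overrides shown in this
    README; devices is an optional list of GPU indices.
    """
    cmd = [sys.executable, "matcha/train.py", f"experiment={experiment}"]
    if devices:
        ids = ",".join(str(d) for d in devices)
        cmd.append(f"trainer.devices=[{ids}]")
    return cmd


def train(**kwargs):
    """Run training without a shell, so the brackets need no quoting."""
    subprocess.run(build_train_command(**kwargs), check=True)
```

For example, train(experiment="ljspeech", devices=[0, 1]) corresponds to the multi-GPU command above.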

Synthesise from the custom trained model

matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT>

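ONNX support

Special thanks to @mush42 for implementing ONNX export and inference support.

It is possible to export Matcha checkpoints to ONNX and run inference on the exported ONNX graph.

ONNX export

To export a checkpoint to ONNX, first install ONNX with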

pip install onnx

then run the following:

python3 -m matcha.onnx.export matcha.ckpt model.onnx --n-timesteps 5

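Optionally, the ONNX exporter accepts vocoder-name and vocoder-checkpoint arguments. This enables you to embed the vocoder in the exported graph and generate waveforms in a single run (similar to end-to-end TTS systems).

Note that n_timesteps is treated as a hyper-parameter rather than a model input. This means you should specify it during export (not during inference). If not specified, n_timesteps is set to 5.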
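
To illustrate what the step count controls, here is a minimal fixed-step Euler solver in plain Python. It is a generic sketch, not Matcha-TTS's decoder: the toy vector field dx/dt = -x is an assumption chosen because its exact solution is known, whereas in an ODE-based synthesiser the vector field would be evaluated by a neural network. The number of loop iterations equals the number of vector-field evaluations, so it sets the cost and accuracy of the solve.

```python
import math


def euler_solve(vector_field, x0, n_timesteps=5):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0
    h = 1.0 / n_timesteps  # step size
    for i in range(n_timesteps):
        t = i * h
        x = x + h * vector_field(x, t)  # one vector-field evaluation per step
    return x


def toy_field(x, t):
    """Toy vector field dx/dt = -x, whose exact solution at t=1 is x0 / e."""
    return -x


exact = 1.0 / math.e
for n in (5, 10, 100):
    approx = euler_solve(toy_field, 1.0, n)
    print(n, abs(approx - exact))  # the error shrinks as n grows
```

More steps give a closer approximation at proportionally higher cost, which is the trade-off behind choosing n_timesteps at export time.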

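Important: for now, torch>=2.1.0 is needed for export, since the scaled_dot_product_attention operator is not exportable in older versions. Until the final version is released, those who want to export their models must install torch>=2.1.0 manually as a pre-release.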
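
A script that wraps the export step can check the installed version first. The sketch below compares only the leading numeric part of a version string against (2, 1, 0), so local or pre-release suffixes such as +cu118 or .dev... are ignored. The helper names are hypothetical, and the check is deliberately simpler than full PEP 440 version ordering.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    # A missing patch component is treated as 0
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum=(2, 1, 0)):
    """Compare numeric release parts only; suffixes such as '.dev...' are ignored."""
    return release_tuple(version) >= minimum
```

In an environment with PyTorch installed, meets_minimum(torch.__version__) (after import torch) indicates whether the export requirement is satisfied.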

ONNX inference

To run inference on the exported model, first install onnxruntime using

pip install onnxruntime
pip install onnxruntime-gpu  # for GPU inference

then use the following:

python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs

You can also control synthesis parameters:

python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs --temperature 0.4 --speaking_rate 0.9 --spk 0

To run inference on GPU, make sure to install the onnxruntime-gpu package, and then pass --gpu to the inference command:

python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs --gpu

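If you exported only Matcha to ONNX, this will write mel-spectrograms as graphs and numpy arrays to the output directory. If you embedded the vocoder in the exported graph, this will write .wav audio files to the output directory.

If you exported only Matcha to ONNX and you want to run a full TTS pipeline, you can pass a path to a vocoder model in ONNX format:

python3 -m matcha.onnx.infer model.onnx --text "hey" --output-dir ./outputs --vocoder hifigan.small.onnx

This will write .wav audio files to the output directory.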

Extract phoneme alignments from Matcha-TTS

If the dataset is structured as

data/
└── LJSpeech-1.1
    ├── metadata.csv
    ├── README
    ├── test.txt
    ├── train.txt
    ├── val.txt
    └── wavs

Then you can extract the phoneme-level alignments from a trained Matcha-TTS model using:

python matcha/utils/get_durations_from_trained_model.py -i dataset_yaml -c <checkpoint>

Example:

python matcha/utils/get_durations_from_trained_model.py -i ljspeech.yaml -c matcha_ljspeech.ckpt

or simply:

matcha-tts-get-durations -i ljspeech.yaml -c matcha_ljspeech.ckpt

Train using extracted alignments

In the dataset config, turn on load_durations. Example: ljspeech.yaml

load_durations: True

or see an example in configs/experiment/ljspeech_from_durations.yaml

Citation information

If you use our code or otherwise find this work useful, please cite our paper:

@inproceedings{mehta2024matcha,
  title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2024}
}