TensorFlowTTS

Real-time state-of-the-art speech synthesis for TensorFlow 2
TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multi-band MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training/inference, optimize further with fake-quantization-aware training and pruning, and make TTS models run faster than real-time and deployable on mobile devices or embedded systems.
This repository is tested on Ubuntu 18.04.
Different TensorFlow versions should work but are not tested. This repository tries to use the latest stable TensorFlow. We recommend installing TensorFlow 2.6.0 if you want to train with multiple GPUs.
$ pip install TensorFlowTTS
Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source as below.
$ git clone https://github.com/TensorSpeech/TensorFlowTTS.git
$ cd TensorFlowTTS
$ pip install .
If you want to upgrade the repository and its dependencies:
$ git pull
$ pip install --upgrade .
TensorFlowTTS currently provides the following architectures:
We are also implementing some techniques from the following papers to improve the quality and convergence speed of the models.
Audio samples on the validation set: Tacotron-2, FastSpeech, MelGAN, MelGAN.STFT, FastSpeech2, Multiband-MelGAN.
Prepare a dataset in the following format:
|- [NAME_DATASET]/
| |- metadata.csv
| |- wavs/
| |- file1.wav
| |- ...
where metadata.csv has the following format: id|transcription. This is an ljspeech-like format; if you have a dataset in another format, you can skip the preprocessing step.
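For illustration, here is a minimal sketch of parsing this id|transcription format in Python (the filenames and transcriptions are made up for the example; the repository's own processors handle this for supported datasets):

```python
# Parse an ljspeech-style metadata.csv where each line is "id|transcription".
def parse_metadata(lines):
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split only on the first "|" so transcriptions may contain "|".
        utt_id, transcription = line.split("|", 1)
        items.append((utt_id, transcription))
    return items

# Hypothetical sample rows in the id|transcription format.
sample = [
    "file1|Hello world.",
    "file2|This is a second utterance.",
]
pairs = parse_metadata(sample)
print(pairs[0])  # ('file1', 'Hello world.')
```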
Note that NAME_DATASET should be one of [ljspeech/kss/baker/libritts/thorsten/synpaflex].
Preprocessing has two steps: feature extraction (tensorflow-tts-preprocess) and normalization (tensorflow-tts-normalize).
To reproduce these steps:
tensorflow-tts-preprocess --rootdir ./[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
tensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/libritts/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
Right now we only support ljspeech, kss, baker, libritts, thorsten, and synpaflex for the dataset argument. We intend to support more datasets in the future.
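For a concrete example, with the LJSpeech dataset the two bracketed commands above expand to the following (the ./ljspeech and ./dump_ljspeech paths are illustrative; point --rootdir at wherever your dataset actually lives):

```shell
tensorflow-tts-preprocess --rootdir ./ljspeech --outdir ./dump_ljspeech \
  --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech
tensorflow-tts-normalize --rootdir ./dump_ljspeech --outdir ./dump_ljspeech \
  --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech
```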
Note: To run libritts preprocessing, please first read the instructions in examples/fastspeech2_libritts. The data needs to be reformatted before preprocessing.
Note: To run synpaflex preprocessing, please first run notebooks/prepare_synpaflex.ipynb. The data needs to be reformatted before preprocessing.
After preprocessing, the structure of the project folder should be:
|- [NAME_DATASET]/
| |- metadata.csv
| |- wavs/
| |- file1.wav
| |- ...
|- dump_[ljspeech/kss/baker/libritts/thorsten]/
| |- train/
| |- ids/
| |- LJ001-0001-ids.npy
| |- ...
| |- raw-feats/
| |- LJ001-0001-raw-feats.npy
| |- ...
| |- raw-f0/
| |- LJ001-0001-raw-f0.npy
| |- ...
| |- raw-energies/
| |- LJ001-0001-raw-energy.npy
| |- ...
| |- norm-feats/
| |- LJ001-0001-norm-feats.npy
| |- ...
| |- wavs/
| |- LJ001-0001-wave.npy
| |- ...
| |- valid/
| |- ids/
| |- LJ001-0009-ids.npy
| |- ...
| |- raw-feats/
| |- LJ001-0009-raw-feats.npy
| |- ...
| |- raw-f0/
| |- LJ001-0001-raw-f0.npy
| |- ...
| |- raw-energies/
| |- LJ001-0001-raw-energy.npy
| |- ...
| |- norm-feats/
| |- LJ001-0009-norm-feats.npy
| |- ...
| |- wavs/
| |- LJ001-0009-wave.npy
| |- ...
| |- stats.npy
| |- stats_f0.npy
| |- stats_energy.npy
| |- train_utt_ids.npy
| |- valid_utt_ids.npy
|- examples/
| |- melgan/
| |- fastspeech/
| |- tacotron2/
| ...
- stats.npy contains the mean and std of the mel spectrograms in the training split.
- stats_energy.npy contains the mean and std of the energy values in the training split.
- stats_f0.npy contains the mean and std of the F0 values in the training split.
- train_utt_ids.npy / valid_utt_ids.npy contain the training and validation utterance ids, respectively.
We use a suffix for each input type (ids, raw-feats, raw-energy, raw-f0, norm-feats, and wave).
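As a sketch of how mean/std stats files like these are typically applied (a plain z-score normalization; the layout of the stats array below is an assumption for illustration, and the repository's own loading code may differ):

```python
import numpy as np

# Assumed layout for the example: row 0 holds per-dimension means,
# row 1 holds per-dimension stds. We fabricate a tiny 2-dim example
# instead of loading a real stats.npy file.
stats = np.array([[0.0, -1.0],   # mean per mel dimension
                  [1.0,  2.0]])  # std per mel dimension
mean, std = stats[0], stats[1]

raw_feats = np.array([[1.0, 3.0],
                      [2.0, -1.0]])        # shape: (frames, mel_dims)
norm_feats = (raw_feats - mean) / std      # z-score normalization

print(norm_feats)
```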
Important notes:
The final structure of the dump folder should follow the structure above in order to use the training scripts, or you can modify them yourself. To learn how to train a model from scratch or fine-tune on other datasets/languages, please see the details in the examples directory.
See the detailed implementation of the abstract dataset class in tensorflow_tts/datasets/abstract_dataset. There are some functions you need to override and understand:
Important notes:
Some examples that use this abstract_dataset are tacotron_dataset.py, fastspeech_dataset.py, melgan_dataset.py, and fastspeech2_dataset.py.
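The pattern these dataset classes follow can be sketched in plain Python (a simplified stand-in with illustrative names only; the real classes inherit from the repository's abstract dataset, build tf.data pipelines from the dumped .npy files, and have different signatures):

```python
class CharMelDatasetSketch:
    """Simplified sketch: pairs utterance ids with precomputed features.

    Names and methods here are illustrative; see the repository's
    abstract_dataset for the actual functions you must override.
    """

    def __init__(self, utt_ids, ids_by_utt, feats_by_utt):
        self.utt_ids = utt_ids
        self.ids_by_utt = ids_by_utt      # utt_id -> token id array
        self.feats_by_utt = feats_by_utt  # utt_id -> mel feature array

    def __len__(self):
        # Length of the dataset = number of utterances.
        return len(self.utt_ids)

    def generator(self):
        # Yield one training example per utterance id; a real
        # implementation would feed this into tf.data.Dataset.from_generator.
        for utt_id in self.utt_ids:
            yield {
                "utt_id": utt_id,
                "input_ids": self.ids_by_utt[utt_id],
                "mel": self.feats_by_utt[utt_id],
            }

ds = CharMelDatasetSketch(
    ["LJ001-0001"],
    {"LJ001-0001": [1, 2, 3]},
    {"LJ001-0001": [[0.1, 0.2]]},
)
examples = list(ds.generator())
print(len(ds), examples[0]["utt_id"])
```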
See the detailed implementation of base_trainer in tensorflow_tts/trainer/base_trainer.py. It includes Seq2SeqBasedTrainer and GanBasedTrainer, which inherit from BasedTrainer. All trainers support single/multi GPU. When implementing a new trainer, you must override some functions:
All models in this repository are trained with GanBasedTrainer (see train_melgan.py, train_melgan_stft.py, train_multiband_melgan.py) and Seq2SeqBasedTrainer (see train_tacotron2.py, train_fastspeech.py).
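The override pattern can be sketched as follows (illustrative names only; the real base_trainer has different hooks and additionally handles distribution strategies, checkpointing, and logging):

```python
class BaseTrainerSketch:
    """Minimal stand-in for a training-loop skeleton; not the real API."""

    def __init__(self):
        self.steps = 0
        self.losses = []

    def _train_step(self, batch):
        # Subclasses must override this with their actual training logic.
        raise NotImplementedError

    def fit(self, batches):
        # The base class owns the loop; subclasses only define the step.
        for batch in batches:
            self.losses.append(self._train_step(batch))
            self.steps += 1


class ToyTrainer(BaseTrainerSketch):
    def _train_step(self, batch):
        # A real trainer would compute gradients and update weights here;
        # we just return the batch mean as a stand-in "loss".
        return sum(batch) / len(batch)


trainer = ToyTrainer()
trainer.fit([[1.0, 3.0], [2.0, 2.0]])
print(trainer.steps, trainer.losses)  # 2 [2.0, 2.0]
```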
You can find how to run inference for each model in the notebooks, or check the Colab notebooks (for English, Korean, Chinese, French, and German). Below is example code for end-to-end inference with FastSpeech2 and Multi-band MelGAN. We have uploaded all pretrained models to the Hugging Face Hub.
import numpy as np
import soundfile as sf
import yaml
import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# initialize fastspeech2 model.
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

# initialize mb_melgan model
mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-ljspeech-en")

# inference
processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

input_ids = processor.text_to_sequence("Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.")

# fastspeech inference
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# melgan inference
audio_before = mb_melgan.inference(mel_before)[0, :, 0]
audio_after = mb_melgan.inference(mel_after)[0, :, 0]

# save to file
sf.write('./audio_before.wav', audio_before, 22050, "PCM_16")
sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")

All models here are licensed under the Apache 2.0 License.
We would like to thank Tomoki Hayashi, who discussed with us at length about MelGAN, Multi-band MelGAN, FastSpeech, and Tacotron. This framework is based on his great open-source ParallelWaveGAN project.