TensorFlowTTS

Real-Time State-of-the-art Speech Synthesis for TensorFlow 2
TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multi-band MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training/inference, optimize further by using fake-quantize-aware training and pruning, and make TTS models run faster than real-time and be deployable on mobile devices or embedded systems.
This repository is tested on Ubuntu 18.04 with:
Different TensorFlow versions should work but have not been tested. This repository will try to follow the latest stable TensorFlow. We recommend installing TensorFlow 2.6.0 if you want to use multi-GPU training.
$ pip install TensorFlowTTS
Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source as below.
$ git clone https://github.com/TensorSpeech/TensorFlowTTS.git
$ cd TensorFlowTTS
$ pip install .
If you want to upgrade the repository and its dependencies:
$ git pull
$ pip install --upgrade .
TensorFlowTTS currently provides the following architectures:
We are also implementing some techniques from the following papers to improve quality and convergence speed:
Audio samples on the validation set are available for Tacotron-2, FastSpeech, MelGAN, MelGAN-STFT, FastSpeech2, and Multi-band MelGAN.
Prepare a dataset in the following format:
|- [NAME_DATASET]/
|  |- metadata.csv
|  |- wavs/
|     |- file1.wav
|     |- ...
where metadata.csv has the following format: id|transcription. This is an ljspeech-like format; if your dataset is in a different format, you can skip the preprocessing step.
Note that NAME_DATASET should be one of [ljspeech/kss/baker/libritts/synpaflex].
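A metadata.csv in the id|transcription format above can be read with a few lines of Python. The helper below is purely illustrative (it is not part of the library), and it tolerates extra |-separated columns by keeping everything after the first | as the transcription:

```python
def parse_metadata(lines):
    """Parse ljspeech-like metadata lines of the form id|transcription."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, transcription = line.split("|", 1)
        entries.append((utt_id, transcription))
    return entries

# typical usage:
# entries = parse_metadata(open("ljspeech/metadata.csv", encoding="utf-8"))
```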
Preprocessing has two steps:
To reproduce the steps above:
tensorflow-tts-preprocess --rootdir ./[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
tensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/libritts/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
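For example, with the bracketed placeholders filled in for LJSpeech, the two commands above become:

```shell
tensorflow-tts-preprocess --rootdir ./ljspeech --outdir ./dump_ljspeech \
  --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech
tensorflow-tts-normalize --rootdir ./dump_ljspeech --outdir ./dump_ljspeech \
  --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech
```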
Right now we only support ljspeech, kss, baker, libritts, thorsten, and synpaflex for the dataset argument. In the future, we intend to support more datasets.
Note: To run libritts preprocessing, please first read the instructions in examples/fastspeech2_libritts. We need to reformat it before preprocessing.
Note: To run synpaflex preprocessing, please first run the notebook notebooks/prepare_synpaflex.ipynb. We need to reformat it before preprocessing.
After preprocessing, the structure of the project folder should be:
|- [NAME_DATASET]/
|  |- metadata.csv
|  |- wav/
|     |- file1.wav
|     |- ...
|- dump_[ljspeech/kss/baker/libritts/thorsten]/
|  |- train/
|  |  |- ids/
|  |  |  |- LJ001-0001-ids.npy
|  |  |  |- ...
|  |  |- raw-feats/
|  |  |  |- LJ001-0001-raw-feats.npy
|  |  |  |- ...
|  |  |- raw-f0/
|  |  |  |- LJ001-0001-raw-f0.npy
|  |  |  |- ...
|  |  |- raw-energies/
|  |  |  |- LJ001-0001-raw-energy.npy
|  |  |  |- ...
|  |  |- norm-feats/
|  |  |  |- LJ001-0001-norm-feats.npy
|  |  |  |- ...
|  |  |- wavs/
|  |  |  |- LJ001-0001-wave.npy
|  |  |  |- ...
|  |- valid/
|  |  |- ids/
|  |  |  |- LJ001-0009-ids.npy
|  |  |  |- ...
|  |  |- raw-feats/
|  |  |  |- LJ001-0009-raw-feats.npy
|  |  |  |- ...
|  |  |- raw-f0/
|  |  |  |- LJ001-0009-raw-f0.npy
|  |  |  |- ...
|  |  |- raw-energies/
|  |  |  |- LJ001-0009-raw-energy.npy
|  |  |  |- ...
|  |  |- norm-feats/
|  |  |  |- LJ001-0009-norm-feats.npy
|  |  |  |- ...
|  |  |- wavs/
|  |  |  |- LJ001-0009-wave.npy
|  |  |  |- ...
|  |- stats.npy
|  |- stats_f0.npy
|  |- stats_energy.npy
|  |- train_utt_ids.npy
|  |- valid_utt_ids.npy
|- examples/
|  |- melgan/
|  |- fastspeech/
|  |- tacotron2/
|  ...
stats.npy contains the mean and std of mel-spectrograms in the training split.
stats_energy.npy contains the mean and std of energy values in the training split.
stats_f0.npy contains the mean and std of F0 values in the training split.
train_utt_ids.npy / valid_utt_ids.npy contain the training and validation utterance ids, respectively.
We use a suffix for each input type (ids, raw-feats, raw-energy, raw-f0, norm-feats, and wave).
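As a sketch of what the normalization step does, the norm-* features are just the raw features standardized with the training-split mean and std; assuming stats.npy stores a [mean, std] pair (the exact layout may differ), the relationship looks like this with stand-in values in place of np.load calls:

```python
import numpy as np

# stand-ins for np.load("dump_ljspeech/stats.npy") (assumed [mean, std] layout)
mean = np.array([-5.0])
std = np.array([2.0])

# stand-in for one *-raw-feats.npy frame sequence (3 frames, 1 mel bin)
raw_feats = np.array([[-3.0], [-5.0], [-7.0]])

# what tensorflow-tts-normalize computes for the norm-feats/ files
norm_feats = (raw_feats - mean) / std

# the transform is invertible, so raw features can be recovered
recovered = norm_feats * std + mean
```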
IMPORTANT NOTES:
The final structure of the dump folder SHOULD follow the structure above to be able to use the training scripts, or you can modify it yourself :) To learn how to train a model from scratch or fine-tune with other datasets/languages, please see the details in the examples directory.
See the detailed implementation of the abstract dataset class in tensorflow_tts/dataset/abstract_dataset. There are some functions you need to override and understand:
IMPORTANT NOTES:
Some examples that use this abstract_dataset are tacotron_dataset.py, fastspeech_dataset.py, melgan_dataset.py, and fastspeech2_dataset.py.
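The pattern those dataset files follow can be sketched as a subclass that overrides the abstract generator/length hooks. The stand-in base class and method names below mirror the typical shape of such a class but are illustrative, not the exact tensorflow_tts API:

```python
import abc

class AbstractDataset(abc.ABC):
    """Stand-in for the abstract dataset class (illustrative only)."""

    @abc.abstractmethod
    def get_args(self):
        """Return the arguments passed to the generator."""

    @abc.abstractmethod
    def generator(self, utt_ids):
        """Yield one training example per utterance id."""

    @abc.abstractmethod
    def get_len_dataset(self):
        """Return the number of examples."""

class CharMelDataset(AbstractDataset):
    """Toy dataset yielding (utt_id, ids, mel) tuples from in-memory data."""

    def __init__(self, items):
        self.items = items  # {utt_id: (ids, mel)}

    def get_args(self):
        return [sorted(self.items)]

    def generator(self, utt_ids):
        for utt_id in utt_ids:
            ids, mel = self.items[utt_id]
            yield utt_id, ids, mel

    def get_len_dataset(self):
        return len(self.items)
```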
See the detailed implementation of the base trainer in tensorflow_tts/trainer/base_trainer.py. It includes Seq2SeqBasedTrainer and GanBasedTrainer, which inherit from BasedTrainer. All trainers support single/multi GPU. When implementing a new trainer, you must override some functions:
All models in this repository are trained based on GanBasedTrainer (see train_melgan.py, train_melgan_stft.py, train_multiband_melgan.py) and Seq2SeqBasedTrainer (see train_tacotron2.py, train_fastspeech.py).
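The override pattern for a new trainer can be sketched as follows. The base class here is a minimal stand-in (the real BasedTrainer has more hooks such as compile, loss, and metric functions); the method names are illustrative:

```python
class BasedTrainer:
    """Minimal stand-in for the base trainer (illustrative only)."""

    def __init__(self):
        self.steps = 0

    def run(self, batches):
        """Drive the training loop, delegating per-batch work to subclasses."""
        for batch in batches:
            self._train_step(batch)
            self.steps += 1

    def _train_step(self, batch):
        raise NotImplementedError("subclasses must override the train step")

class MyTrainer(BasedTrainer):
    """A new trainer overrides the per-batch training step."""

    def __init__(self):
        super().__init__()
        self.losses = []

    def _train_step(self, batch):
        # pretend the "loss" is just the sum of the batch values
        self.losses.append(sum(batch))
```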
You can find how to run inference for each model in the notebooks, or check the Colab notebooks (for English, Korean, Chinese, French, and German). Below is example code for end-to-end inference using FastSpeech2 and Multi-band MelGAN. We uploaded all pretrained models to the Hugging Face Hub.
import numpy as np
import soundfile as sf
import yaml
import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# initialize fastspeech2 model.
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

# initialize mb_melgan model
mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-ljspeech-en")

# inference
processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

input_ids = processor.text_to_sequence("Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.")

# fastspeech inference
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# melgan inference
audio_before = mb_melgan.inference(mel_before)[0, :, 0]
audio_after = mb_melgan.inference(mel_after)[0, :, 0]

# save to file
sf.write('./audio_before.wav', audio_before, 22050, "PCM_16")
sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")

All models here are licensed under the Apache 2.0 License.
We would like to thank Tomoki Hayashi, who discussed with us at length about MelGAN, Multi-band MelGAN, FastSpeech, and Tacotron. This framework is based on his great open-source ParallelWaveGAN project.