ln -s /path/to/LJSpeech-1.1/wavs DUMMY
Please refer to XPhoneBERT for more information. It uses text2phonemesequence to convert raw text into a phoneme sequence.
Initializing text2phonemesequence for a language requires that language's ISO 639-3 code. The ISO 639-3 codes of supported languages are listed in the text2phonemesequence documentation.
text2phonemesequence takes a word-segmented sequence as input. Users may also want to apply text normalization to the word-segmented sequence before feeding it into text2phonemesequence.
Note: In Chinese, Japanese, and Korean (CJK languages) and in some Southeast Asian languages, words are not separated by spaces. An external tokenizer must be used to segment the text into words before feeding it into this model.
In this case, write a script that normalizes and segments your input before feeding it to text2phonemesequence (vie_preprocess.py in my case).
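As a rough illustration, a script of this kind could look like the sketch below. It is not the author's vie_preprocess.py. It assumes each filelist line has the form `path|text`, and the `normalize`, `segment`, `clean_line`, and `clean_filelist` helpers as well as the `--text_index` option are hypothetical names introduced here. The `segment` function is only a placeholder that separates punctuation from words; for a language such as Vietnamese it would need to be replaced with a real word segmenter.

```python
import argparse
import re
import unicodedata


def normalize(text):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def segment(text):
    """Placeholder segmenter: only puts spaces around basic punctuation.

    Assumption: swap this for a proper word segmenter for your language.
    """
    spaced = re.sub(r"([.,!?;:])", r" \1 ", text)
    return re.sub(r"\s+", " ", spaced).strip()


def clean_line(line, text_index=1):
    """Normalize and segment the text field of one 'path|text' line."""
    fields = line.rstrip("\n").split("|")
    fields[text_index] = segment(normalize(fields[text_index]))
    return "|".join(fields)


def clean_filelist(path, out_extension="cleaned", text_index=1):
    """Write a cleaned copy of the filelist next to it and return its path."""
    out_path = f"{path}.{out_extension}"
    with open(path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(clean_line(line, text_index) + "\n")
    return out_path


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--out_extension", default="cleaned")
    parser.add_argument("--text_index", type=int, default=1)
    parser.add_argument("--filelists", nargs="+", default=[])
    args = parser.parse_args(argv)
    for filelist in args.filelists:
        print("wrote", clean_filelist(filelist, args.out_extension, args.text_index))


if __name__ == "__main__":
    main()
```

Keeping normalization and segmentation in separate functions makes it easy to replace only the segmenter when switching languages.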
# For languages in which words are not separated by spaces, such as Vietnamese.
python vie_preprocess.py --out_extension cleaned --filelists filelists/train.txt filelists/val.txt
python preprocess.py --input_file filelists/train.txt.cleaned --output_file filelists/train.list --language vie-n --batch_size 64 --cuda
python preprocess.py --input_file filelists/val.txt.cleaned --output_file filelists/val.list --language vie-n --batch_size 64 --cuda
# For English.
python preprocess.py --input_file filelists/train.txt.cleaned --output_file filelists/train.list --language eng-us --batch_size 64 --cuda
python preprocess.py --input_file filelists/val.txt.cleaned --output_file filelists/val.list --language eng-us --batch_size 64 --cuda
# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
For more information about the configuration, refer to configs/config.json.
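To get a quick overview of the configuration before training, a small helper like the one below can list what the file contains. This is only a sketch: `summarize_config` is a hypothetical name introduced here, and it makes no assumption about which sections or keys configs/config.json defines beyond the file being valid JSON with an object at the top level.

```python
import json


def summarize_config(path):
    """Map each top-level entry to its sorted key names (or its plain value)."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return {
        name: sorted(value) if isinstance(value, dict) else value
        for name, value in config.items()
    }
```

Calling `summarize_config("configs/config.json")` from the repository root returns a dictionary that can be printed to see which settings are available.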
# LJ Speech
python train.py -c configs/config.json -m ljs_base