shanghainese tts
2023.06.06
This project aims to build a text-to-speech (TTS) system for Shanghainese from scratch. It seeks to produce tone sandhi more accurately than existing models by paying special attention to the preprocessing of text.
See writeup/main.pdf.
pip install -r phonemisation/requirements.txt
pip install -r speech_synthesis/requirements.txt
pip install -r comparison_questionnaire/requirements.txt # for analysis of questionnaire results
See speech_synthesis/README.md.
phonemisation/: contains the phonemisation module
  phonemisation/__init__.py
  python -m phonemisation "text to phonemise"
  jieba is used for word segmentation
  the Qieyun module is used to add the tone number 1 to syllables of the 陰平 (yinping/inbin) tone; other tones are phonologically unmarked
  the romanisation_to_ipa function in romanisation.py performs the phonemisation
make_metadata.py: uses the phonemisation module to convert transcriptions into IPA and generate metadata for training
data/data/: contains the dataset used for training
  shh.dict.cn/ is used for training
  */metadata.txt files are generated by make_metadata.py
training/
  coqui-ai/TTS repo, which contains an implementation of VITS
writeup/: the write-up
speech_synthesis/: contains the speech synthesis model
  see speech_synthesis/README.md for more details
comparison_questionnaire/: contains the questionnaire and audio files used to compare speech produced by this model, the Apple model, and a human speaker
  *-1.wav: produced by this model
  *-2.wav: produced by Apple VoiceOver (MacBook Pro 14-inch, 2021; macOS Ventura 13.0.1)
  *-3.wav: spoken by myself
  stats.ipynb: Jupyter notebook for analysing the questionnaire results
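
The suffix convention for the questionnaire audio files can be decoded mechanically when analysing results. The following is a minimal sketch, not code from this repository: the helper names split_stimulus and group_by_item and the example file name greeting-1.wav are hypothetical, and only the suffix-to-source mapping is taken from the list above.

```python
from collections import defaultdict
from pathlib import PurePath

# Suffix-to-source mapping, as described in the file list above.
SOURCES = {
    "1": "this model",
    "2": "Apple VoiceOver",
    "3": "human speaker",
}


def split_stimulus(filename):
    """Split a name such as 'greeting-2.wav' into (item, source).

    The item name 'greeting' is a made-up example; only the trailing
    '-1', '-2', '-3' suffixes come from the README.
    """
    stem = PurePath(filename).stem            # e.g. 'greeting-2'
    item, sep, suffix = stem.rpartition("-")  # split on the last hyphen
    if not sep or suffix not in SOURCES:
        raise ValueError(f"unexpected file name: {filename}")
    return item, SOURCES[suffix]


def group_by_item(filenames):
    """Group audio files by item so the three sources can be compared."""
    groups = defaultdict(dict)
    for name in filenames:
        item, source = split_stimulus(name)
        groups[item][source] = name
    return dict(groups)
```

For example, group_by_item(["greeting-1.wav", "greeting-2.wav", "greeting-3.wav"]) returns a single entry keyed by "greeting" whose value maps each source to its file, which is a convenient shape for per-item comparisons.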