TTS-TextAnalyzer
Inspired by "Introducing Unified Neural Text Analyzer: an innovation for Neural Text-to-Speech pronunciation accuracy improvement", the text-analysis tasks of speech synthesis can be unified by building one task-specific head per task on top of a shared BERT model. These tasks include word segmentation, part-of-speech tagging, text normalization, and polyphone disambiguation. This project collects information on datasets suitable for each task.
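As a rough illustration of the shared-encoder idea, the following pure-Python sketch runs one encoder over the input characters and feeds the same per-character features to several task heads. Everything in it is an assumption made so that it runs without an ML framework: `toy_encoder` is a hypothetical stand-in for a pretrained model such as bert-base-chinese, the head weights are made-up numbers rather than trained parameters, and the BMES segmentation tags and two-tag POS set are example label sets only.

```python
from typing import Dict, List

Vector = List[float]


def toy_encoder(chars: List[str], dim: int = 4) -> List[Vector]:
    """Hypothetical stand-in for a BERT encoder: one feature vector per character.

    Each feature is one bit of the character's code point, so the output is
    deterministic but carries no linguistic meaning.
    """
    return [[float((ord(c) >> k) & 1) for k in range(dim)] for c in chars]


class LinearHead:
    """Per-token linear classifier: logits = W x + b, label = argmax(logits)."""

    def __init__(self, labels: List[str], weights: List[Vector], bias: Vector):
        if not len(labels) == len(weights) == len(bias):
            raise ValueError("need one weight row and one bias per label")
        self.labels = labels
        self.weights = weights
        self.bias = bias

    def predict(self, features: List[Vector]) -> List[str]:
        predictions = []
        for x in features:
            logits = [sum(w * v for w, v in zip(row, x)) + b
                      for row, b in zip(self.weights, self.bias)]
            best = max(range(len(logits)), key=lambda i: logits[i])
            predictions.append(self.labels[best])
        return predictions


class MultiTaskAnalyzer:
    """Runs the shared encoder once and feeds its output to every task head."""

    def __init__(self, heads: Dict[str, LinearHead]):
        self.heads = heads

    def analyze(self, text: str) -> Dict[str, List[str]]:
        features = toy_encoder(list(text))  # shared computation, done once
        return {task: head.predict(features) for task, head in self.heads.items()}


# Made-up weights for illustration only; real heads would be learned.
segmentation_head = LinearHead(
    ["B", "M", "E", "S"],  # BMES word-segmentation tags
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    [0.0, 0.0, 0.0, 0.5],
)
pos_head = LinearHead(["n", "v"], [[1, -1, 0, 0], [-1, 1, 0, 0]], [0.0, 0.0])

analyzer = MultiTaskAnalyzer({"segmentation": segmentation_head, "pos": pos_head})
result = analyzer.analyze("今天下雨")
print(result)  # one label per character for each task
```

The design point is that the encoder runs once per input and each task adds only a small head; in a real system those heads would typically be learned by fine-tuning on task datasets such as the ones collected here.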
Pretrained BERT
- bert-base-chinese
- bert-base-multilingual-cased
- xlm-roberta-base
Word Segmentation
Part-of-Speech Tagging
Text Normalization
| datasets / rules | code |
|---|---|
| rules | WeTextProcessing |
| Text normalization covering grammars | TextNormalizationCoveringGrammars |
| TODO | |
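Rule-based text normalization rewrites written forms such as digits and symbols into their spoken forms. The sketch below shows the idea with plain regular expressions; it is not the API or grammar of WeTextProcessing or Text normalization covering grammars, and its rules (four-digit years read digit by digit, `%` read as 百分之, other integers below 10000 read as cardinals, longer digit strings read digit by digit) are simplified assumptions for illustration.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def read_cardinal(n: int) -> str:
    """Read 0 <= n < 10000 as a Mandarin cardinal, e.g. 305 -> 三百零五.

    Simplification: always uses 二, whereas speech often says 两 before 千/百.
    """
    if not 0 <= n < 10000:
        raise ValueError("this sketch only covers 0..9999")
    if n == 0:
        return DIGITS[0]
    s = str(n)
    out = []
    pending_zero = False
    for i, ch in enumerate(s):
        d = int(ch)
        pos = len(s) - 1 - i  # place value index into UNITS
        if d == 0:
            pending_zero = True  # emit a single 零 only if a nonzero digit follows
            continue
        if pending_zero:
            out.append(DIGITS[0])
            pending_zero = False
        out.append(DIGITS[d] + UNITS[pos])
    result = "".join(out)
    # 10..19 are read 十, 十一, ... rather than 一十, 一十一
    if result.startswith("一十"):
        result = result[1:]
    return result


def read_digits(s: str) -> str:
    """Read a digit string one digit at a time, e.g. 2024 -> 二零二四."""
    return "".join(DIGITS[int(c)] for c in s)


def read_number(s: str) -> str:
    n = int(s)
    return read_cardinal(n) if n < 10000 else read_digits(s)


# Ordered (pattern, rewrite) rules: more specific rules must run first.
RULES = [
    (re.compile(r"([0-9]{4})年"), lambda m: read_digits(m.group(1)) + "年"),
    (re.compile(r"([0-9]+)%"), lambda m: "百分之" + read_number(m.group(1))),
    (re.compile(r"[0-9]+"), lambda m: read_number(m.group(0))),
]


def normalize(text: str) -> str:
    for pattern, rewrite in RULES:
        text = pattern.sub(rewrite, text)
    return text


print(normalize("2024年有305人，增长了15%"))
```

Rule order matters: if the generic integer rule ran first, `2024年` would be read as a cardinal (二千零二十四年) instead of digit by digit. A real normalizer also needs many more classes (dates, times, money, measures) and context-dependent choices that this sketch ignores.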
Polyphone Disambiguation
| datasets | code |
|---|---|
| g2pL | https://github.com/whzikaros/g2pL |
| CPP (g2pM) | https://github.com/kakaobrain/g2pm |
| TODO | |
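Polyphone disambiguation can be framed as choosing one pinyin reading for a polyphonic character in its context. One simple design, sketched below as an assumption rather than the method of g2pL or g2pM, is to let a model score the whole pinyin vocabulary and then keep only the readings that a lexicon allows for the target character. The lexicon entries and the scores are hypothetical.

```python
from typing import Dict, List

# Hypothetical mini-lexicon: polyphonic character -> candidate pinyin readings.
CANDIDATES: Dict[str, List[str]] = {
    "行": ["xing2", "hang2"],
    "长": ["chang2", "zhang3"],
    "乐": ["le4", "yue4"],
}


def disambiguate(char: str, scores: Dict[str, float]) -> str:
    """Pick the highest-scoring pinyin among the character's own candidates.

    `scores` stands in for a model's output over the whole pinyin vocabulary;
    readings the lexicon does not allow for `char` are ignored (masked out).
    """
    candidates = CANDIDATES.get(char)
    if not candidates:
        raise KeyError(f"{char!r} is not in the polyphone lexicon")
    return max(candidates, key=lambda p: scores.get(p, float("-inf")))


# Made-up scores, e.g. from a polyphone head for one target position.
scores = {"yue4": 0.2, "le4": 0.1, "hang2": 0.9, "xing2": 0.4}
print(disambiguate("乐", scores))
```

Even though `hang2` has the highest score overall, it is masked out for 乐, so the prediction falls back to the best reading the lexicon allows.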