g2pW
v0.0.7
Authors: Yi-Chang Chen, Yu-Chuan Chang, Yen-Cheng Chang and Yi-Ren Yeh
This is the official repository of our paper g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin (INTERSPEECH 2022).
(This work was tested with PyTorch 1.7.0, CUDA 10.1, Python 3.6 and Ubuntu 16.04.)
Install PyTorch
$ pip install g2pw
>>> from g2pw import G2PWConverter
>>> conv = G2PWConverter()
>>> sentence = '上校請技術人員校正FN儀器'
>>> conv(sentence)
[['ㄕㄤ4', 'ㄒㄧㄠ4', 'ㄑㄧㄥ3', 'ㄐㄧ4', 'ㄕㄨ4', 'ㄖㄣ2', 'ㄩㄢ2', 'ㄐㄧㄠ4', 'ㄓㄥ4', None, None, 'ㄧ2', 'ㄑㄧ4']]
>>> sentences = ['銀行', '行動']
>>> conv(sentences)
[['ㄧㄣ2', 'ㄏㄤ2'], ['ㄒㄧㄥ2', 'ㄉㄨㄥ4']]
conv = G2PWConverter(model_dir='./G2PWModel-v2-onnx/', model_source='./path-to/bert-base-chinese/')
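The single-sentence example above suggests that the converter returns one entry per input character, with `None` for characters it does not transcribe (here the Latin letters `F` and `N`). Under that assumption, the following minimal sketch pairs each character with its reading. It does not call g2pw itself; the readings are copied from the example output, and `align_readings` and `annotate` are hypothetical helper names, not part of the library.

```python
from typing import List, Optional, Tuple


def align_readings(sentence: str,
                   readings: List[Optional[str]]) -> List[Tuple[str, Optional[str]]]:
    """Pair every character with its reading (assumes one reading per character)."""
    if len(sentence) != len(readings):
        raise ValueError("expected exactly one reading per character")
    return list(zip(sentence, readings))


def annotate(sentence: str, readings: List[Optional[str]]) -> str:
    """Render 'char(reading)' and leave characters without a reading untouched."""
    return ''.join(
        char if reading is None else f'{char}({reading})'
        for char, reading in align_readings(sentence, readings)
    )


# Readings copied from the example output above (first inner list).
sentence = '上校請技術人員校正FN儀器'
readings = ['ㄕㄤ4', 'ㄒㄧㄠ4', 'ㄑㄧㄥ3', 'ㄐㄧ4', 'ㄕㄨ4', 'ㄖㄣ2', 'ㄩㄢ2',
            'ㄐㄧㄠ4', 'ㄓㄥ4', None, None, 'ㄧ2', 'ㄑㄧ4']

pairs = align_readings(sentence, readings)
annotated = annotate(sentence, readings)
print(annotated)
```

In the printed string the polyphonic character 校 appears twice with different readings (ㄒㄧㄠ4 and ㄐㄧㄠ4), which is the disambiguation the model performs.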
>>> conv = G2PWConverter(style='pinyin', enable_non_tradional_chinese=True)
>>> conv('然而,他红了20年以后,他竟退出了大家的视线。')
[['ran2', 'er2', None, 'ta1', 'hong2', 'le5', None, None, 'nian2', 'yi3', 'hou4', None, 'ta1', 'jing4', 'tui4', 'chu1', 'le5', 'da4', 'jia1', 'de5', 'shi4', 'xian4', None]]
$ git clone https://github.com/GitYCC/g2pW.git
For example, we train models on the CPP dataset as follows:
$ bash cpp_dataset/download.sh
$ python scripts/train_g2p_bert.py --config configs/config_cpp.py
$ python scripts/test_g2p_bert.py \
    --config saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/config.py \
    --checkpoint saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/best_accuracy.pth \
    --sent_path cpp_dataset/test.sent \
    --output_path output_pred.txt
$ python scripts/predict_g2p_bert.py \
    --config saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/config.py \
    --checkpoint saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01/best_accuracy.pth \
    --sent_path cpp_dataset/test.sent \
    --lb_path cpp_dataset/test.lb
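Both commands above take their `--config` and `--checkpoint` from the same run directory under `saved_models/`. As a convenience, the argument list can be assembled in Python so the long directory name is written only once. This is a minimal sketch: `build_command` is a hypothetical helper, not part of the repository, and it only uses the script names and flags shown above. It builds the command without running it.

```python
import shlex
from pathlib import PurePosixPath
from typing import Dict, List


def build_command(script: str, run_dir: str, extra: Dict[str, str]) -> List[str]:
    """Build an argument list whose config and checkpoint come from run_dir."""
    run = PurePosixPath(run_dir)
    args = [
        'python', script,
        '--config', str(run / 'config.py'),
        '--checkpoint', str(run / 'best_accuracy.pth'),
    ]
    # Append the remaining flags in the order they were given.
    for flag, value in extra.items():
        args += [f'--{flag}', value]
    return args


RUN_DIR = 'saved_models/CPP_BERT_M_DescWS-Sec-cLin-B_POSw01'

cmd = build_command(
    'scripts/test_g2p_bert.py',
    RUN_DIR,
    {'sent_path': 'cpp_dataset/test.sent', 'output_path': 'output_pred.txt'},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root once a trained checkpoint exists in the run directory.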
To cite the code/data/paper, please use this BibTeX:
@inproceedings{chen22d_interspeech,
  title = {g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin},
  author = {Yi-Chang Chen and Yu-Chuan Steven and Yen-Cheng Chang and Yi-Ren Yeh},
  year = {2022},
  booktitle = {Interspeech 2022},
  pages = {1926--1930},
  doi = {10.21437/Interspeech.2022-216},
  issn = {2958-1796},
}