mongolian nlp
1.0.0
This repo will contain a list of useful resources for Mongolian NLP. Feel free to contribute.
DATASET MnTTS: ~8-hour Mongolian TTS dataset created by the Inner Mongolia University, China
DATASET LJSpeech-like male voice TTS dataset created from the Mongolian Bible
DATASET LJSpeech-like Kalmyk (West Mongolian) female voice TTS dataset created from the Kalmyk Bible (2 hours)
DATASET 300-hour Kalmyk synthetic STT dataset created by a voice conversion model
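DATASET Eduge news classification dataset provided by Bolorsoft LLC
Categories: урлаг соёл (arts and culture), эдийн засаг (economy), эрүүл мэнд (health), хууль (law), улс төр (politics), спорт (sports), технологи (technology), боловсрол (education) and байгал орчин (environment)
DATASET 11-11.mn government agency complaint dataset
Categories: санал хүсэлт (suggestions and requests), гомдол (complaints), шүүмжлэл (criticism), талархал (gratitude) and өргөдөл (petitions)
DATASET online news corpus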
DATASET Digital Archive of Mongolian Newspapers 1990-1995 of the British Library
DATASET 220K Mongolian personal names
DATASET 90K Mongolian clan/family names
DATASET 192K Mongolian company names
DATASET Names of Mongolian provinces (aimags) and their districts (sums)
DATASET 195 country names (with capital cities) in Mongolian
DATASET 250 most frequent Mongolian words from Mongolian news, books and Wikipedia articles (total 670M words / 2M unique words)
DATASET 500 Mongolian abbreviations
DATASET Mongolian NER dataset created from Mongolian politics and sport news
LOCATION (6453/1753), PERSON (2839/1698), ORGANIZATION (4453/1970) and MISC (3716/2617)
DATASET Mongolian POS dataset of the National University of Mongolia
DATASET Traditional Mongolian synthetic OCR dataset created from Mongolian song lyrics and dictionary
DATASET Traditional Mongolian OCR dataset
DATASET Handwritten Mongolian Cyrillic Characters Database of the Mongolian University of Science and Technology
DATASET Mongolian Wordnet of the National University of Mongolia
DATASET Mongolian Inflectional Morphology from UniMorph 4.0
DATASET Mongolian Derivational Morphology from MorphyNet
DATASET Multilingual Spoken Words multilingual keyword spotting dataset
аав, байна, бэлдэж, дүрслэх, ламын, олов, сонирхож, түүний, хаанаас, хуулиар, чиглэсэн
DATASET Small Kalmyk text corpus
PYTORCH tugstugi/pytorch-dc-tts
DEMO Colab online demo
DATASET LJSpeech-like male voice dataset created from the Mongolian Bible
TF tugstugi/Tacotron-2: fork of Rayhane-mamah/Tacotron-2 adapted for the Mongolian Bible dataset
DEMO Colab online demo
DEMO Speaker adaptation Colab online demo for the former Mongolian president Elbegdorj: the Tacotron model trained on the 5-hour Mongolian Bible dataset was fine-tuned on a 10-minute dataset created from a speech by Elbegdorj.
PYTORCH Chimege TTS demo
DEMO HMM TTS online demo of the National University of Mongolia
DEMO SAMPLES Tacotron2 TTS demo samples of Ikon.MN
DEMO HMM-based TTS online demo of the Inner Mongolia University
DEMO MTL-Tacotron TTS demo samples of the Inner Mongolia University and the National University of Singapore
TF ttslr/MonTTS Inner Mongolian TTS training code
SAMPLES Speech samples
DATASET SAMPLES MonSpeech of the Inner Mongolia University
TF walker-hyf/MnTTS Inner Mongolian TTS dataset and training code
SAMPLES Speech samples
DATASET MnTTS of the Inner Mongolia University
Pretrained model download link
PRODUCT NVDA/HTS screen reader developed by the Innovation Development Center for the blind
PYTORCH/DEMO Kalmyk TTS demo (Kalmyk is a Mongolic language spoken in Russia)
PYTORCH/DEMO Kalmyk TTS demo from Silero
MODEL 5-gram binary LM generated by KenLM on a 670M-word noisy (uncleaned) corpus.
./generate_trie alphabet.txt mn_5gram.binary trie
TF / PYTORCH tugstugi/mongolian-bert pretrained Mongolian BERT models
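The `generate_trie` command for the KenLM model above expects an `alphabet.txt`. Assuming the DeepSpeech-era format (one UTF-8 symbol per line, lines starting with `#` are comments, a literal `#` written as `\#`), a minimal sketch that derives such a file from corpus text could look like this. `build_alphabet` and its `min_count` filter are hypothetical helpers, not part of any tool, and in practice the alphabet must match the one the acoustic model was trained with.

```python
from collections import Counter
from typing import Iterable


def build_alphabet(lines: Iterable[str], min_count: int = 1) -> str:
    """Return alphabet-file text: one symbol per line (hypothetical helper).

    Assumes the DeepSpeech-style format where '#' lines are comments.
    """
    counts: Counter = Counter()
    for line in lines:
        # Lowercase and strip the line ending; spaces stay as a symbol.
        counts.update(line.rstrip("\r\n").lower())
    out = ["# alphabet built from corpus characters"]
    for ch in sorted(c for c, n in counts.items() if n >= min_count):
        # A line starting with '#' is a comment, so a literal '#' is escaped.
        out.append("\\#" if ch == "#" else ch)
    return "\n".join(out) + "\n"
```

Write the returned string to `alphabet.txt` (UTF-8) before running `generate_trie`. Raising `min_count` drops rare stray characters, which helps with a noisy corpus.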
PYTORCH bayartsogt-ya/albert-mongolian pretrained Mongolian ALBERT
PYTORCH robertritz/NLP ULMFiT experiments
PYTORCH huggingface.co/bayartsogt/mongolian-gpt2 Mongolian GPT-2 model
PYTORCH huggingface.co/bayartsogt/mongolian-roberta-base Mongolian RoBERTa base model
PYTORCH tugstugi/mongolian-speech-recognition
DEMO Chimege Speech Recognition
PRODUCT Chinese and traditional Mongolian voice input from aicloud.com
DEMO PRODUCT Huawei Cloud ASR supports minority languages such as Mongolian, Tibetan, and Uyghur.
PRODUCT Google Cloud Speech-to-Text
PYTORCH Wav2Vec2 XLSR fine-tuned on Mongolian Common Voice
DEMO Colab online demo
PYTORCH Wav2Vec2 XLSR trained on a Kalmyk dataset
DEMO https://huggingface.co/tugstugi/wav2vec2-large-xlsr-53-kalmyk
TF coqui.ai Mongolian speech recognition trained on Mongolian Common Voice
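Wav2Vec2 XLSR checkpoints like the ones above are typically fine-tuned with a CTC output layer (for example `Wav2Vec2ForCTC` in Hugging Face transformers), so turning their per-frame scores into text is a CTC decoding step. The sketch below shows the simplest variant, greedy (best-path) decoding: take the best label per frame, merge repeats, then drop blanks. The blank index 0 and the toy vocabulary are assumptions; a real model's vocabulary comes from its own tokenizer files.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      vocab: List[str], blank: int = 0) -> str:
    """Greedy (best-path) CTC decoding of per-frame label scores.

    blank=0 is an assumption; check the model's vocabulary.
    """
    # 1. Highest-scoring label index in every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge consecutive repeats, 3. drop the blank label.
    return "".join(vocab[k] for k, _ in groupby(best) if k != blank)
```

A blank between two identical labels is what lets CTC emit a doubled letter, as in "аав". Greedy decoding ignores any language model; a beam-search decoder can additionally score hypotheses with an n-gram LM such as the KenLM 5-gram model listed above.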
DEMO Cyrillic to Mongolian script converter demo of the Inner Mongolia University
DEMO Mongolian script OCR demo of the Inner Mongolia University
PYTORCH tugstugi/bichig2cyrillic Mongolian script to Cyrillic converter (and back)
DEMO Cyrillic to Mongolian Colab online demo
PYTORCH tugstugi/image2bichig Traditional Mongolian OCR using CRNN
DEMO OCR Colab online demo
DATASET Traditional Mongolian synthetic OCR dataset
TF2 sharavsambuu/mongolian-text-classification
SKLEARN / DEMO simple SVM Colab notebook classifying the Eduge dataset with around 91% accuracy.
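The SVM notebook above reports around 91% accuracy on Eduge. As an illustration of the same family of model without any dependencies, here is a sketch of a one-vs-rest linear SVM over binary bag-of-words features, trained with Pegasos-style subgradient descent on the hinge loss (visiting examples in a fixed order instead of sampling them). It is not the notebook's code; with scikit-learn the usual equivalent would be a `TfidfVectorizer` feeding a `LinearSVC`. The hyperparameters `lam` and `epochs` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def features(text: str) -> Set[str]:
    # Binary bag-of-words: each lowercase whitespace token is one feature.
    return set(text.lower().split())


def train_linear_svm(docs: List[str], labels: List[str],
                     lam: float = 0.01, epochs: int = 20) -> Dict[str, Dict[str, float]]:
    """One-vs-rest linear SVM fitted with Pegasos-style subgradient steps.

    Illustrative sketch; not the notebook's implementation.
    """
    classes = sorted(set(labels))
    weights = {c: defaultdict(float) for c in classes}
    data = [(features(d), y) for d, y in zip(docs, labels)]
    t = 0
    for _ in range(epochs):
        for feats, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            for c in classes:
                w = weights[c]
                target = 1.0 if y == c else -1.0
                # Margin is measured with the weights before this step.
                margin = target * sum(w.get(f, 0.0) for f in feats)
                # L2 regularisation: shrink every weight by (1 - eta * lambda).
                for f in w:
                    w[f] *= 1.0 - eta * lam
                # Hinge loss is active: move towards the correct side.
                if margin < 1.0:
                    for f in feats:
                        w[f] += eta * target
    return weights


def predict(weights: Dict[str, Dict[str, float]], text: str) -> str:
    feats = features(text)
    # Pick the class whose linear score is highest.
    return max(weights, key=lambda c: sum(weights[c].get(f, 0.0) for f in feats))
```

For Eduge, `docs` would be the article texts and `labels` the category names listed earlier. The shrink loop touches every weight on every step, which is fine for a sketch but slow on a real vocabulary; a library implementation avoids that.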
DATASET Mongolian NER dataset created from Mongolian politics and sport news
PYTORCH enod/mongolian-bert-ner BERT based Mongolian NER
DEMO NER demo of the National University of Mongolia
PYTORCH tugstugi/forced_aligner Mongolian forced alignment tool using Rayhane-mamah/Tacotron-2 and readbeyond/aeneas
DEMO Colab online demo
TF2 Cyrillic transliteration Colab notebook sharavsambuu/cyrillic-mongolian-transliteration
DATASET 1M back-translated MN->EN sentence dataset (download link)
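The cyrillic-mongolian-transliteration notebook above is tagged TF2, so it presumably uses a learned model. For comparison, the simplest baseline is a character-level lookup table. The sketch below romanizes Mongolian Cyrillic with a simplified table that loosely follows common conventions (ө → ö, ү → ü, х → kh); the table is an assumption, not an official standard, and it is not the notebook's method.

```python
# Simplified Mongolian Cyrillic -> Latin table (an assumption, not a standard).
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "ye", "ё": "yo",
    "ж": "j", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "ө": "ö", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ү": "ü", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh",
    "щ": "shch", "ъ": "", "ы": "y", "ь": "i", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Character-by-character Cyrillic -> Latin lookup."""
    out = []
    for ch in text:
        lat = TABLE.get(ch.lower())
        if lat is None:
            out.append(ch)  # digits, punctuation, Latin letters pass through
        elif ch.isupper() and lat:
            out.append(lat[0].upper() + lat[1:])  # "Х" -> "Kh", not "KH"
        else:
            out.append(lat)
    return "".join(out)
```

The table is lossy (й, и and ь all become "i"), so the reverse direction cannot be done by simple lookup, which is one reason learned models are used for transliteration.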
DICTIONARY Digitized Mongolian dictionaries from the Center for Northeast Asian Studies of Tohoku University, Japan