neologdn

neologdn is a Japanese text normalizer for mecab-neologd.

The normalization is based on the neologd's rules: https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja

Contributions are welcome!

NOTE: Installing this module requires a C++11 compiler.

Installation

 $ pip install neologdn

Usage

import neologdn
neologdn.normalize("ﾊﾝｶｸｶﾅ")
# => 'ハンカクカナ'
neologdn.normalize("全角記号！？＠＃")
# => '全角記号!?@#'
neologdn.normalize("全角記号例外「・」")
# => '全角記号例外「・」'
neologdn.normalize("長音短縮ウェーーーーイ")
# => '長音短縮ウェーイ'
neologdn.normalize("チルダ削除ウェ~∼∾〜〰～イ")
# => 'チルダ削除ウェイ'
neologdn.normalize("いろんなハイフン˗֊‐‑‒–⁃⁻₋−")
# => 'いろんなハイフン-'
neologdn.normalize("　　　ＰＲＭＬ　　副　読　本　　　")
# => 'PRML副読本'
neologdn.normalize(" Natural Language Processing ")
# => 'Natural Language Processing'
neologdn.normalize("かわいいいいいいいいい", repeat=6)
# => 'かわいいいいいい'
neologdn.normalize("無駄無駄無駄無駄ァ", repeat=1)
# => '無駄ァ'
neologdn.normalize("1995〜2001年", tilde="normalize")
# => '1995~2001年'
neologdn.normalize("1995~2001年", tilde="normalize_zenkaku")
# => '1995〜2001年'
neologdn.normalize("1995〜2001年", tilde="ignore")  # Don't convert tilde
# => '1995〜2001年'
neologdn.normalize("1995〜2001年", tilde="remove")
# => '19952001年'
neologdn.normalize("1995〜2001年")  # Default parameter
# => '19952001年'
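
To make a few of the rules above concrete, here is a minimal pure-Python sketch using only the standard library. It is not neologdn's implementation (neologdn is a compiled extension), and the function names shorten_long_vowels, limit_repeats and halfwidth_kana_to_fullwidth are hypothetical, introduced only for illustration. The sketch assumes that the repeat parameter caps the number of consecutive occurrences of a repeated substring, which is what the examples above suggest.

```python
import re
import unicodedata


def shorten_long_vowels(text):
    """Collapse a run of the prolonged sound mark 'ー' into a single one."""
    return re.sub("ー+", "ー", text)


def limit_repeats(text, repeat):
    """Cap consecutive repetitions of any substring at `repeat` occurrences.

    Illustrative assumption: this mimics the observable effect of the
    `repeat` keyword in the usage examples, not the library's algorithm.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    # (.+?) captures the shortest repeating unit; \1{repeat,} requires at
    # least `repeat` further copies, i.e. more than `repeat` in total.
    pattern = re.compile(r"(.+?)\1{" + str(repeat) + r",}")
    return pattern.sub(lambda m: m.group(1) * repeat, text)


def halfwidth_kana_to_fullwidth(text):
    """Convert half-width katakana to full-width via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)
```

Calling limit_repeats("無駄無駄無駄無駄ァ", 1) returns '無駄ァ', and shorten_long_vowels("長音短縮ウェーーーーイ") returns '長音短縮ウェーイ', matching the corresponding neologdn outputs shown above. Note that plain NFKC also rewrites characters that neologdn may treat differently, so this is only a rough approximation of single rules.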

Benchmark

# Sample code from
# https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja#python-written-by-hideaki-t--overlast
import normalize_neologd

%timeit normalize(normalize_neologd.normalize_neologd)
# => 9.55 s ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


import neologdn
%timeit normalize(neologdn.normalize)
# => 6.66 s ± 35.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

neologdn is about 1.43 times faster than the sample code.

Details are described in the notebook below: https://github.com/ikegami-yukino/neologdn/blob/master/benchmark/benchmark.ipynb
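
Outside of IPython, where the %timeit magic is unavailable, a comparison like this can be sketched with the standard timeit module. The helper names time_normalizer and speedup are hypothetical and are not taken from the benchmark notebook; the corpus and the normalizer functions are whatever you pass in.

```python
import timeit


def time_normalizer(normalizer, texts, repeat=3):
    """Return the best time in seconds to run `normalizer` over all texts once."""
    def run():
        for text in texts:
            normalizer(text)
    # Take the minimum over several runs to reduce noise from other processes.
    return min(timeit.repeat(run, number=1, repeat=repeat))


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

With the figures reported above, speedup(9.55, 6.66) is roughly 1.43. To compare real normalizers, pass each callable (for example neologdn.normalize, if installed) to time_normalizer with the same list of texts.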

License

Apache Software License.

Contribution

Contributions are welcome! See: https://github.com/ikegami-yukino/neologdn/blob/master/.github/CONTRIBUTING.md

Cited by

Book

Yamamoto Kazuhide. Elemental Techniques of Text Processing. Modern Scientists. P.41. 2021.

Blog

[Library introduction] Text normalization library neologdn: https://diatonic.codes/blog/neologdn/
Preprocessing Japanese text: neologdn, uppercase, lowercase, Unicode normalization - tuttieee's blog: https://tuttieee.hatenablog.com/entry/ja-nlp-preprocess
▲Today's Function==neologdn.normalize()== - TPT Blog: https://ds-blog.tbtech.co.jp/entry/2020/05/11/%E2%96%B2%E6%9C%AC%E6%97%A5%E3%81%AE%E9%96%A2%E6%95%B0%3D%3Dneologdn_normalize%28%29%3D%3D
Learn about NLP: https://zenn.dev/panyoriokome/scraps/d67f68ab50c0c1
Calling Python library for text normalization from MATLAB #Python - Qiita: https://qiita.com/aoimidori/items/ab5a4383b5a7bb307bad
Introducing the preprocessing procedure for natural language processing with Python code | Introducing AI utilization and AI implementation cases: https://www.matrixflow.net/case-study/75/
Japanese preprocessing memo using python | DATUM STUDIO Co., Ltd.: https://datumstudio.jp/blog/python%E3%81%AB%E3%82%88%E3%82%8B%E6%97%A5%E6%9C%AC%E8%AA%9E%E5%89%8D%E5%87%A6%E7%90%86%E5%82%99%E5%BF%98%E9%8C%B2/
Preprocessing, preprocessing, and preprocessing (Natural Language Processing: Japanese Edition) | narudesu: https://note.com/narudesu/n/na35de30a583a
Neologd.normalize with shortcut key: https://scrapbox.io/nishio/%E3%82%B7%E3%83%A7%E3%83%BC%E3%83%88%E3%82%AB%E3%83%83%E3%83%88%E3%82%AD%E3%83%BC%E3%81%A7neologd.normalize
Building an environment for natural language processing using Python #Python - Qiita: https://qiita.com/lawyer_alpaca/items/86b0deda984170203467
Python normalize Examples: https://python.hotexamples.com/examples/neologdn/-/normalize/python-normalize-function-examples.html
Shishimaro Co., Ltd. (ch-4) Analysis of chABSA datasets using latent Dirichlet allocation (LDA): https://shishimaro.co.jp/blog/ai/538
Preprocessing Japanese documents before morpheme analysis (Python) - Ke Diary: https://ohke.hateblo.jp/entry/2019/02/09/141500
Make artificial intelligence understand language! ? A thorough explanation of preprocessing of data important for natural language processing using Python | AI Research Institute: https://ai-kenkyujo.com/programming/make-ai-understand-the-language/
Create a MeCab user dictionary that reflects the latest wikipedia - NEologd extension | Plakome: https://purakome.net/mecab/addwiki/
[Introduction to Natural Language Processing] Processing sentences using stop words and normalization | Mynavi Engineer Blog: https://engineerblog.mynavi.jp/technology/nlp_stopword/
Unified notation [Natural language processing rice cake shop]: https://www.jnlp.org/nlp/%E6%A0%A1%E6%AD%A3/%E8%A1%A8%E8%A8%98%E7%B5%B1%E4%B8%80
Building T5 text generation model using Pytorch - Easy practice with transfer learning in Transformers - Apprentice Data Scientist's hideaway: https://www.dskomei.com/entry/2021/09/28/110016
Walking with the Elephant: Easy text mining with Google Colab (Japanese pre-processing): https://walking-elephant.blogspot.com/2023/07/text-mining-normalized.html
[Let's implement Natural Language Processing (NLP) in Python! ]A thorough explanation of the knowledge you need to learn! - The forefront of Vietnam offshore development by Mattock inc.: https://mattock.jp/blog/artificial-intelligence/nlp/lets-implement-nlp-in-python/
tools [Digital Humanities Japan: Resource Wiki]: https://dhjapan.org/wiki/doku.php?id=tools
I looked up modern seasonal words in Python | Aidemy | Aidemy AI programming learning service starting in 10 seconds [Idemy]: https://aidemy.net/magazine/703/

Additional Information

Version v0.5.2
Type Other source code
Update Time 2025-04-17
size 99.84KB
From Github
