
SmoothNLP

Author	Email
Victor	[email protected]
Yinjun	[email protected]
jellyfish	[email protected]

SmoothNLP
- Install
- Knowledge Graph
  - Call Examples & Visualization
- NLP Basics Pipelines
  - 1. Tokenize word segmentation
  - 2. Postag part-of-speech tagging
  - 3. NER entity recognition
  - 4. Financial entity identification
  - 5. Dependency parsing
  - 6. Sentence splitting
  - 7. Multithreaded support
  - 8. Log
- Unsupervised learning
  - New word mining
  - Event Clustering
- Supervised learning
  - (Information) Event Classification
- Tutorial
- Service Description
  - Statement
  - Pro Professional Edition
  - Frequently Asked Questions
- Set fonts
- Easter eggs

Install

Install via pip

pip install "smoothnlp>=0.4.0"

Install the latest version from source

git clone https://github.com/smoothnlp/SmoothNLP.git
cd SmoothNLP
python setup.py install

Knowledge Graph

Supported only in SmoothNLP V0.3.0 and later; the examples below apply to V0.4 and later:

Call Examples & Visualization

from smoothnlp.algorithm import kg
from kgexplore import visual
ngrams = kg.extract_ngram(["SmoothNLP在V0.3版本中正式推出知识抽取功能",
                           "SmoothNLP专注于可解释的NLP技术",
                           "SmoothNLP支持Python与Java",
                           "SmoothNLP将帮助工业界与学术界更加高效的构建知识图谱",
                           "SmoothNLP是上海文磨网络科技公司的开源项目",
                           "SmoothNLP在V0.4版本中推出对图谱节点的分类功能",
                           "KGExplore是SmoothNLP的一个子项目"])
visual.visualize(ngrams, width=12, height=10)

[Figure: SmoothNLP_KG_Demo]

Function description

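The edge types supported in V0.4 are: 事件触发 (event trigger), 状态描述 (state description), 属性描述 (attribute description), and 数值描述 (numeric description).
The node types supported in V0.4 are: 产品 (product), 地区 (region), 公司与品牌 (company and brand), 货品 (goods), 机构 (organization), 人物 (person), 修饰短语 (modifier phrase), and 其他 (other).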

NLP Basics Pipelines

1. Tokenize word segmentation

>>> import smoothnlp
>>> smoothnlp.segment('欢迎在Python中使用SmoothNLP')
['欢迎', '在', 'Python', '中', '使用', 'SmoothNLP']

2. Postag part-of-speech tagging

Part-of-speech tag explanation wiki

>>> smoothnlp.postag('欢迎在Python中使用SmoothNLP')
[{'token': '欢迎', 'postag': 'VV'},
 {'token': '在', 'postag': 'P'},
 {'token': 'Python', 'postag': 'NN'},
 {'token': '中', 'postag': 'LC'},
 {'token': '使用', 'postag': 'VV'},
 {'token': 'SmoothNLP', 'postag': 'NN'}]
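
Because postag returns a list of dictionaries with token and postag keys, picking out tokens by tag takes only a short comprehension. The sketch below works on the sample output above, pasted in as a literal so that it runs without calling the service; the helper name tokens_with_tag is hypothetical and not part of SmoothNLP.

```python
# Sample output copied from the postag example above.
tagged = [
    {'token': '欢迎', 'postag': 'VV'},
    {'token': '在', 'postag': 'P'},
    {'token': 'Python', 'postag': 'NN'},
    {'token': '中', 'postag': 'LC'},
    {'token': '使用', 'postag': 'VV'},
    {'token': 'SmoothNLP', 'postag': 'NN'},
]


def tokens_with_tag(tagged_tokens, tags):
    """Return the tokens whose tag is in `tags` (hypothetical helper)."""
    wanted = set(tags)
    return [item['token'] for item in tagged_tokens if item['postag'] in wanted]


nouns = tokens_with_tag(tagged, ['NN'])
verbs = tokens_with_tag(tagged, ['VV'])
print(nouns)
print(verbs)
```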

3. NER entity recognition

>>> smoothnlp.ner("中国平安2019年度长期服务计划于2019年5月7日至5月14日通过二级市场完成购股")
[{'charStart': 0, 'charEnd': 4, 'text': '中国平安', 'nerTag': 'COMPANY_NAME', 'sTokenList': {'1': {'token': '中国平安', 'postag': None}}, 'normalizedEntityValue': '中国平安'},
 {'charStart': 4, 'charEnd': 9, 'text': '2019年', 'nerTag': 'NUMBER', 'sTokenList': {'2': {'token': '2019年', 'postag': 'CD'}}, 'normalizedEntityValue': '2019年'},
 {'charStart': 17, 'charEnd': 26, 'text': '2019年5月7日', 'nerTag': 'DATETIME', 'sTokenList': {'8': {'token': '2019年5月', 'postag': None}, '9': {'token': '7日', 'postag': None}}, 'normalizedEntityValue': '2019年5月7日'},
 {'charStart': 27, 'charEnd': 32, 'text': '5月14日', 'nerTag': 'DATETIME', 'sTokenList': {'11': {'token': '5月', 'postag': None}, '12': {'token': '14日', 'postag': None}}, 'normalizedEntityValue': '5月14日'}]
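
In the sample output above, charStart and charEnd behave like half-open character offsets into the input string, so each entity can be recovered with a plain slice. This is read off the sample rather than taken from documentation. The sketch below groups entity spans by nerTag, using a reduced copy of the sample output as a literal; group_by_tag is a hypothetical helper name.

```python
text = "中国平安2019年度长期服务计划于2019年5月7日至5月14日通过二级市场完成购股"

# Entities reduced to the fields used here, copied from the sample output above.
entities = [
    {'charStart': 0, 'charEnd': 4, 'text': '中国平安', 'nerTag': 'COMPANY_NAME'},
    {'charStart': 4, 'charEnd': 9, 'text': '2019年', 'nerTag': 'NUMBER'},
    {'charStart': 17, 'charEnd': 26, 'text': '2019年5月7日', 'nerTag': 'DATETIME'},
    {'charStart': 27, 'charEnd': 32, 'text': '5月14日', 'nerTag': 'DATETIME'},
]


def group_by_tag(source_text, found_entities):
    """Map each nerTag to the spans sliced out of the source text (hypothetical helper)."""
    grouped = {}
    for ent in found_entities:
        # Assumption: offsets are half-open, i.e. [charStart, charEnd).
        span = source_text[ent['charStart']:ent['charEnd']]
        grouped.setdefault(ent['nerTag'], []).append(span)
    return grouped


grouped = group_by_tag(text, entities)
print(grouped)
```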

4. Financial entity identification

>>> smoothnlp.company_recognize("旷视科技预计将在今年9月在港IPO")
[{'charStart': 0,
  'charEnd': 4,
  'text': '旷视科技',
  'nerTag': 'COMPANY_NAME',
  'sTokenList': {'1': {'token': '旷视科技', 'postag': None}},
  'normalizedEntityValue': '旷视科技'}]

5. Dependency parsing

Note that index 0 in the output of smoothnlp.dep_parsing refers to a dummy root token.

Dependency parsing tag explanation wiki

>>> smoothnlp.dep_parsing("特斯拉是全球最大的电动汽车制造商。")
[{'relationship': 'top', 'dependentIndex': 2, 'targetIndex': 1},
 {'relationship': 'root', 'dependentIndex': 0, 'targetIndex': 2},
 {'relationship': 'dep', 'dependentIndex': 5, 'targetIndex': 3},
 {'relationship': 'advmod', 'dependentIndex': 5, 'targetIndex': 4},
 {'relationship': 'ccomp', 'dependentIndex': 2, 'targetIndex': 5},
 {'relationship': 'cpm', 'dependentIndex': 5, 'targetIndex': 6},
 {'relationship': 'amod', 'dependentIndex': 8, 'targetIndex': 7},
 {'relationship': 'attr', 'dependentIndex': 2, 'targetIndex': 8},
 {'relationship': 'attr', 'dependentIndex': 2, 'targetIndex': 9},
 {'relationship': 'punct', 'dependentIndex': 2, 'targetIndex': 10}]
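
The arcs above can be turned into a head-to-children map to walk the tree. In the sample, the root arc has dependentIndex 0 (the dummy root), which suggests that dependentIndex holds the head and targetIndex the token attached to it; that reading is an assumption drawn from the sample, not from documentation. The helper name children_by_head is hypothetical.

```python
from collections import defaultdict

# Arcs copied from the dep_parsing sample output above.
deps = [
    {'relationship': 'top', 'dependentIndex': 2, 'targetIndex': 1},
    {'relationship': 'root', 'dependentIndex': 0, 'targetIndex': 2},
    {'relationship': 'dep', 'dependentIndex': 5, 'targetIndex': 3},
    {'relationship': 'advmod', 'dependentIndex': 5, 'targetIndex': 4},
    {'relationship': 'ccomp', 'dependentIndex': 2, 'targetIndex': 5},
    {'relationship': 'cpm', 'dependentIndex': 5, 'targetIndex': 6},
    {'relationship': 'amod', 'dependentIndex': 8, 'targetIndex': 7},
    {'relationship': 'attr', 'dependentIndex': 2, 'targetIndex': 8},
    {'relationship': 'attr', 'dependentIndex': 2, 'targetIndex': 9},
    {'relationship': 'punct', 'dependentIndex': 2, 'targetIndex': 10},
]


def children_by_head(arcs):
    """Group targetIndex values under their dependentIndex (assumed to be the head)."""
    tree = defaultdict(list)
    for arc in arcs:
        tree[arc['dependentIndex']].append(arc['targetIndex'])
    return dict(tree)


tree = children_by_head(deps)
root_token_index = tree[0][0]  # the single token attached to the dummy root
print(root_token_index, tree)
```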

6. Sentence splitting

>>> smoothnlp.split2sentences("句子1!句子2!")
['句子1!', '句子2!']

7. Multithreaded support

SmoothNLP uses 2 threads by default for service calls. The thread count can be set through config:

from smoothnlp import config
config.setNumThreads(2)
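
On the caller's side, a batch of texts can also be fanned out over a small thread pool. The sketch below is a generic pattern using only the Python standard library; analyze is a stand-in function so the example runs offline, and swapping in a call such as smoothnlp.segment is an assumption. How client-side threads interact with setNumThreads or with the service's rate limit is not covered here.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(text):
    """Stand-in for a pipeline call (hypothetical); it just splits on whitespace."""
    return text.split()


def analyze_all(texts, worker, max_workers=2):
    """Run `worker` over `texts` on a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, texts))


results = analyze_all(["a b", "c d e"], analyze)
print(results)
```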

8. Log

from smoothnlp import config
config.setLogLevel("DEBUG")  # set the log level

Unsupervised learning

New word mining

Algorithm Introduction | Instructions for use

Event Clustering

We currently only support commercial solutions for this feature, with online services. For more information, please contact [email protected]

Example output

[
  {
    "url": "https://36kr.com/p/5167309",
    "title": "Facebook第三次数据泄露，可能导致680万用户私人照片泄露",
    "pub_ts": 1544832000
  },
  {
    "url": "https://www.pencilnews.cn/p/24038.html",
    "title": "热点 | Facebook将因为泄露700万用户个人照片 面临16亿美元罚款",
    "pub_ts": 1544832000
  },
  {
    "url": "https://finance.sina.com.cn/stock/usstock/c/2018-12-15/doc-ihmutuec9334184.shtml",
    "title": "Facebook再曝新数据泄露 6800万用户或受影响",
    "pub_ts": 1544844120
  }
]

Comment: the Sina editor's figure is wrong and overstates the facts; Facebook did not actually leak the photos of 68 million users (the first headline puts the number at 6.8 million).
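
A cluster in this shape is easy to post-process with the standard library. The sketch below assumes pub_ts is a Unix timestamp in seconds, which is consistent with the 2018-12-15 date in the Sina URL but is not stated in the source. It keeps only the url and pub_ts fields from the sample and reports the time span the cluster covers.

```python
import json
from datetime import datetime, timezone

# Sample cluster from above, reduced to the fields used here.
cluster_json = """
[
  {"url": "https://36kr.com/p/5167309", "pub_ts": 1544832000},
  {"url": "https://www.pencilnews.cn/p/24038.html", "pub_ts": 1544832000},
  {"url": "https://finance.sina.com.cn/stock/usstock/c/2018-12-15/doc-ihmutuec9334184.shtml", "pub_ts": 1544844120}
]
"""

articles = json.loads(cluster_json)


def published_utc(article):
    """Convert pub_ts to an aware UTC datetime (assumes Unix seconds)."""
    return datetime.fromtimestamp(article["pub_ts"], tz=timezone.utc)


ordered = sorted(articles, key=lambda a: a["pub_ts"])
earliest = published_utc(ordered[0])
latest = published_utc(ordered[-1])
span_seconds = ordered[-1]["pub_ts"] - ordered[0]["pub_ts"]
print(earliest.isoformat(), latest.isoformat(), span_seconds)
```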

Supervised learning

(Information) Event Classification

We currently offer this feature only as a commercial solution, delivered as an online service with API output. For details, please contact [email protected]

Performance

Event name	AUC	Precision
Investing and Acquisition	0.996	0.982
Corporate cooperation	0.977	0.885
Directors, Supervisors and Executives	0.982	0.940
Revenue Report	0.994	0.960
Business signing	0.993	0.904
Business Development	0.968	0.869
Product Report	0.977	0.911
Industrial Policy	0.990	0.879
Poor management	0.981	0.765
Discussion on violation	0.951	0.890

References

ASER
HanLP

Tutorial

Multi-threaded call

Service Description

Statement

SmoothNLP provides a complete set of REST text analysis services and related applications through cloud microservices. For general users such as open source enthusiasts, we currently provide service at up to 5 QPS (qps<=5); for commercial users, we provide unrestricted cloud accounts or local deployment solutions.
Basic NLP tasks, including word segmentation, part-of-speech tagging, and dependency parsing, are implemented in Java; the code is in the smoothnlp_maven folder and can be compiled and packaged with Maven.
If you are looking for commercial NLP or knowledge graph solutions, please email [email protected]

Pro Professional Edition

SmoothNLP Pro provides stable, reliable service and documentation for enterprise users; if you want to try or purchase it, please contact [email protected]

Frequently Asked Questions

Note that since the adjustment in version 0.2.20, the basic pipeline functions restrict only the length of the input string (no more than 200). To process longer text, first split it into sentences with smoothnlp.split2sentences.
The knowledge graph visualization (before V0.4) uses the SimHei font by default, but matplotlib in most environments does not support Chinese fonts. We provide a download link for the font package; you can load the SimHei font into matplotlib's font library by running the following code.

import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager

# Set the font: register every font file found under simhei/
font_dirs = ['simhei/']
font_files = font_manager.findSystemFonts(fontpaths=font_dirs)
for font_file in font_files:
    # addfont (matplotlib 3.2+) replaces createFontList,
    # which newer matplotlib releases removed
    font_manager.fontManager.addfont(font_file)
plt.rcParams['font.family'] = "SimHei"
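
For the 200-length limit noted in the first item above, a small wrapper can split long text before each call. The sketch below assumes the limit is counted in characters and uses a regex-based stand-in splitter so that it runs offline; in practice smoothnlp.split2sentences would take its place. The names simple_split and run_on_long_text are hypothetical.

```python
import re

MAX_LEN = 200  # length limit stated above, assumed to be in characters


def simple_split(text):
    """Stand-in splitter: break after sentence-ending punctuation."""
    return [s for s in re.split(r'(?<=[。！？!?])', text) if s]


def run_on_long_text(text, pipeline, splitter=simple_split, max_len=MAX_LEN):
    """Apply `pipeline` to pieces of `text` that each fit within `max_len`."""
    results = []
    for sentence in splitter(text):
        # Hard-wrap any sentence that is still longer than the limit.
        for start in range(0, len(sentence), max_len):
            results.append(pipeline(sentence[start:start + max_len]))
    return results


# `len` stands in for a pipeline function such as smoothnlp.segment.
short_lengths = run_on_long_text("句子1!句子2!", len)
long_lengths = run_on_long_text("a" * 450 + "!", len)
print(short_lengths, long_lengths)
```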

Easter eggs

If you have suggestions for this project or want to become a co-developer, please submit an issue or pull request; in return, we will provide free data sharing or a free trial of kgexplore.
If you are interested in NLP-related algorithms or application scenarios but lack data to implement them, we provide free data support and downloads.
If you are a university student looking for research materials related to NLP or knowledge graphs, or even internship opportunities, you are welcome to contact [email protected]
