
HanLP is a multilingual natural language processing toolkit for production environments, built on dual PyTorch and TensorFlow 2.x engines, with the goal of making state-of-the-art NLP technology widely accessible. HanLP offers complete functionality, high accuracy, efficient performance, up-to-date corpora, a clean architecture, and easy customization.
Backed by the world's largest multilingual corpus, HanLP 2.1 supports 10 joint tasks and many single tasks across 130 languages, including Simplified Chinese, Traditional Chinese, English, Japanese, Russian, French, and German. HanLP pre-trains dozens of models on more than a dozen tasks and continuously iterates on both corpora and models:
| Function | RESTful | Multi-task | Single task | Model | Annotation standards |
|---|---|---|---|---|---|
| Tokenization | Tutorial | Tutorial | Tutorial | tok | Coarse, fine |
| Part-of-speech tagging | Tutorial | Tutorial | Tutorial | pos | CTB, PKU, 863 |
| Named entity recognition | Tutorial | Tutorial | Tutorial | ner | PKU, MSRA, OntoNotes |
| Dependency parsing | Tutorial | Tutorial | Tutorial | dep | SD, UD, PMT |
| Constituency parsing | Tutorial | Tutorial | Tutorial | con | Chinese Tree Bank |
| Semantic dependency parsing | Tutorial | Tutorial | Tutorial | sdp | CSDP |
| Semantic role labeling | Tutorial | Tutorial | Tutorial | srl | Chinese Proposition Bank |
| Abstract Meaning Representation | Tutorial | None yet | Tutorial | amr | CAMR |
| Coreference resolution | Tutorial | None yet | None yet | None yet | OntoNotes |
| Semantic textual similarity | Tutorial | None yet | Tutorial | sts | None yet |
| Text style transfer | Tutorial | None yet | None yet | None yet | None yet |
| Keyphrase extraction | Tutorial | None yet | None yet | None yet | None yet |
| Extractive summarization | Tutorial | None yet | None yet | None yet | None yet |
| Abstractive summarization | Tutorial | None yet | None yet | None yet | None yet |
| Grammatical error correction | Tutorial | None yet | None yet | None yet | None yet |
| Text classification | Tutorial | None yet | None yet | None yet | None yet |
| Sentiment analysis | Tutorial | None yet | None yet | None yet | [-1,+1] |
| Language identification | Tutorial | None yet | Tutorial | None yet | ISO 639-1 codes |
To fit different needs, HanLP provides two APIs, RESTful and native, targeting lightweight and massive-scale scenarios respectively. Regardless of API or programming language, HanLP interfaces keep consistent semantics, and the code remains open source. If you have used HanLP in your research, please cite our EMNLP paper.

The RESTful API weighs only a few KB, suiting agile development, mobile apps, and similar scenarios. It is simple to use, installs in seconds, and needs no GPU. It offers larger corpora, bigger models, and higher accuracy, and is highly recommended. Since server GPU capacity is limited and the anonymous quota is small, applying for a free public API key (auth) is recommended.
Install via pip:

```
pip install hanlp_restful
```

Create a client and fill in the server address and API key:

```python
from hanlp_restful import HanLPClient
HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='zh')  # Leave auth as None for anonymous access; language: zh for Chinese, mul for multilingual
```

For Go, install with `go get -u github.com/hankcs/gohanlp@main`, then create a client and fill in the server address and API key:

```go
HanLP := hanlp.HanLPClient(hanlp.WithAuth(""), hanlp.WithLanguage("zh")) // Empty auth for anonymous access; zh for Chinese, mul for multilingual
```

For Java, add the dependency to pom.xml:
```xml
<dependency>
  <groupId>com.hankcs.hanlp.restful</groupId>
  <artifactId>hanlp-restful</artifactId>
  <version>0.0.12</version>
</dependency>
```

Create a client and fill in the server address and API key:

```java
HanLPClient HanLP = new HanLPClient("https://www.hanlp.com/api", null, "zh"); // Pass null auth for anonymous access; zh for Chinese, mul for multilingual
```

Whatever the development language, call the parse interface and pass in a document to obtain HanLP's analysis:
```python
HanLP.parse('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。')
```

For more functions, please refer to the documentation and test cases.
The native API relies on deep learning frameworks such as PyTorch and TensorFlow, and suits professional NLP engineers, researchers, and local massive-data scenarios. It requires Python 3.6 to 3.10; Windows is supported, but *nix is recommended. It can run on CPU, though GPU/TPU is recommended. Install the PyTorch version:

```
pip install hanlp
```

HanLP releases two kinds of models: multi-task and single-task. Multi-task models are fast and memory-efficient; single-task models are more accurate and flexible.
HanLP's workflow is to load a model and then call it like a function, as with the following joint multi-task model:
```python
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)  # Trained on the world's largest Chinese corpus
HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。'])
```

The input unit of the native API is the sentence, so documents must first be split, either with a multilingual sentence-splitting model or with the rule-based sentence-splitting function. The RESTful and native APIs share identical semantic designs, so users can switch between them seamlessly. The simple interface also supports flexible parameters. Common techniques include:
Task scheduling: the fewer tasks, the faster the inference; see the tutorial for details. In memory-constrained scenarios, users can also delete unneeded tasks to slim down the model. According to our latest research, the advantages of multi-task learning lie in speed and GPU memory, but its accuracy often trails single-task models. HanLP therefore pre-trains many single-task models and designs an elegant pipeline mode to assemble them.
```python
import hanlp
HanLP = hanlp.pipeline() \
    .append(hanlp.utils.rules.split_sentence, output_key='sentences') \
    .append(hanlp.load('FINE_ELECTRA_SMALL_ZH'), output_key='tok') \
    .append(hanlp.load('CTB9_POS_ELECTRA_SMALL'), output_key='pos') \
    .append(hanlp.load('MSRA_NER_ELECTRA_SMALL_ZH'), output_key='ner', input_key='tok') \
    .append(hanlp.load('CTB9_DEP_ELECTRA_SMALL', conll=0), output_key='dep', input_key='tok') \
    .append(hanlp.load('CTB9_CON_ELECTRA_SMALL'), output_key='con', input_key='tok')
HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。')
```

For more models and usage, please refer to the demo and documentation.
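The pipeline pattern itself is simple: each stage reads its input from an `input_key` (or the previous stage's output) on a shared document dict and writes its result under its `output_key`. The following framework-free sketch illustrates the idea under stated assumptions: the `Pipeline` class and the regex-based `split_sentence` below are illustrative stand-ins, not HanLP's implementations.

```python
import re

def split_sentence(text):
    # Naive stand-in for a rule-based splitter: break after sentence-final punctuation.
    parts = re.split(r'(?<=[。!?!?])', text)
    return [p for p in parts if p]

class Pipeline:
    def __init__(self):
        self.stages = []

    def append(self, func, output_key, input_key=None):
        self.stages.append((func, output_key, input_key))
        return self  # return self to allow method chaining, as in hanlp.pipeline()

    def __call__(self, text):
        doc = {'text': text}
        prev_key = 'text'
        for func, output_key, input_key in self.stages:
            # Each stage reads from its input_key, defaulting to the previous output.
            doc[output_key] = func(doc[input_key or prev_key])
            prev_key = output_key
        return doc

# Assemble a toy pipeline: sentence splitting, then a fake per-sentence "tokenizer".
pipe = Pipeline() \
    .append(split_sentence, output_key='sentences') \
    .append(lambda sents: [list(s) for s in sents], output_key='tok')
doc = pipe('HanLP是面向生产环境的NLP工具包。它支持多语种。')
```

The chaining works because `append` returns `self`; the same design lets HanLP pipelines be built fluently and then called like a function.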
No matter the API, development language, or natural language, HanLP's output is unified into a json-compatible Document, which is a subclass of dict:
```json
{
  "tok/fine": [
    ["2021年", "HanLPv2.1", "为", "生产", "环境", "带来", "次", "世代", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"],
    ["阿婆主", "来到", "北京", "立方庭", "参观", "自然", "语义", "科技", "公司", "。"]
  ],
  "tok/coarse": [
    ["2021年", "HanLPv2.1", "为", "生产", "环境", "带来", "次世代", "最", "先进", "的", "多语种", "NLP", "技术", "。"],
    ["阿婆主", "来到", "北京立方庭", "参观", "自然语义科技公司", "。"]
  ],
  "pos/ctb": [
    ["NT", "NR", "P", "NN", "NN", "VV", "JJ", "NN", "AD", "JJ", "DEG", "CD", "NN", "NR", "NN", "PU"],
    ["NN", "VV", "NR", "NR", "VV", "NN", "NN", "NN", "NN", "PU"]
  ],
  "pos/pku": [
    ["t", "nx", "p", "vn", "n", "v", "b", "n", "d", "a", "u", "a", "n", "nx", "n", "w"],
    ["n", "v", "ns", "ns", "v", "n", "n", "n", "n", "w"]
  ],
  "pos/863": [
    ["nt", "w", "p", "v", "n", "v", "a", "nt", "d", "a", "u", "a", "n", "ws", "n", "w"],
    ["n", "v", "ns", "n", "v", "n", "n", "n", "n", "w"]
  ],
  "ner/pku": [
    [],
    [["北京立方庭", "ns", 2, 4], ["自然语义科技公司", "nt", 5, 9]]
  ],
  "ner/msra": [
    [["2021年", "DATE", 0, 1], ["HanLPv2.1", "ORGANIZATION", 1, 2]],
    [["北京", "LOCATION", 2, 3], ["立方庭", "LOCATION", 3, 4], ["自然语义科技公司", "ORGANIZATION", 5, 9]]
  ],
  "ner/ontonotes": [
    [["2021年", "DATE", 0, 1], ["HanLPv2.1", "ORG", 1, 2]],
    [["北京立方庭", "FAC", 2, 4], ["自然语义科技公司", "ORG", 5, 9]]
  ],
  "srl": [
    [[["2021年", "ARGM-TMP", 0, 1], ["HanLPv2.1", "ARG0", 1, 2], ["为生产环境", "ARG2", 2, 5], ["带来", "PRED", 5, 6], ["次世代最先进的多语种NLP技术", "ARG1", 6, 15]], [["最", "ARGM-ADV", 8, 9], ["先进", "PRED", 9, 10], ["技术", "ARG0", 14, 15]]],
    [[["阿婆主", "ARG0", 0, 1], ["来到", "PRED", 1, 2], ["北京立方庭", "ARG1", 2, 4]], [["阿婆主", "ARG0", 0, 1], ["参观", "PRED", 4, 5], ["自然语义科技公司", "ARG1", 5, 9]]]
  ],
  "dep": [
    [[6, "tmod"], [6, "nsubj"], [6, "prep"], [5, "nn"], [3, "pobj"], [0, "root"], [8, "amod"], [15, "nn"], [10, "advmod"], [15, "rcmod"], [10, "assm"], [13, "nummod"], [15, "nn"], [15, "nn"], [6, "dobj"], [6, "punct"]],
    [[2, "nsubj"], [0, "root"], [4, "nn"], [2, "dobj"], [2, "conj"], [9, "nn"], [9, "nn"], [9, "nn"], [5, "dobj"], [2, "punct"]]
  ],
  "sdp": [
    [[[6, "Time"]], [[6, "Exp"]], [[5, "mPrep"]], [[5, "Desc"]], [[6, "Datv"]], [[13, "dDesc"]], [[0, "Root"], [8, "Desc"], [13, "Desc"]], [[15, "Time"]], [[10, "mDegr"]], [[15, "Desc"]], [[10, "mAux"]], [[8, "Quan"], [13, "Quan"]], [[15, "Desc"]], [[15, "Nmod"]], [[6, "Pat"]], [[6, "mPunc"]]],
    [[[2, "Agt"], [5, "Agt"]], [[0, "Root"]], [[4, "Loc"]], [[2, "Lfin"]], [[2, "ePurp"]], [[8, "Nmod"]], [[9, "Nmod"]], [[9, "Nmod"]], [[5, "Datv"]], [[5, "mPunc"]]]
  ],
  "con": [
    ["TOP", [["IP", [["NP", [["NT", ["2021年"]]]], ["NP", [["NR", ["HanLPv2.1"]]]], ["VP", [["PP", [["P", ["为"]], ["NP", [["NN", ["生产"]], ["NN", ["环境"]]]]]], ["VP", [["VV", ["带来"]], ["NP", [["ADJP", [["NP", [["ADJP", [["JJ", ["次"]]]], ["NP", [["NN", ["世代"]]]]]], ["ADVP", [["AD", ["最"]]]], ["VP", [["JJ", ["先进"]]]]]], ["DEG", ["的"]], ["NP", [["QP", [["CD", ["多"]]]], ["NP", [["NN", ["语种"]]]]]], ["NP", [["NR", ["NLP"]], ["NN", ["技术"]]]]]]]]]], ["PU", ["。"]]]]]],
    ["TOP", [["IP", [["NP", [["NN", ["阿婆主"]]]], ["VP", [["VP", [["VV", ["来到"]], ["NP", [["NR", ["北京"]], ["NR", ["立方庭"]]]]]], ["VP", [["VV", ["参观"]], ["NP", [["NN", ["自然"]], ["NN", ["语义"]], ["NN", ["科技"]], ["NN", ["公司"]]]]]]]], ["PU", ["。"]]]]]]
  ]
}
```

In particular, the Python RESTful and native APIs support visualization based on monospace fonts, rendering linguistic structures directly in the console:
```python
HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。']).pretty_print()
```
Dep Tree Token Relati PoS Tok NER Type Tok SRL PA1 Tok SRL PA2 Tok PoS 3 4 5 6 7 8 9
──────────── ───────── ────── ─── ───────── ──────────────── ───────── ──────────── ───────── ──────────── ───────── ─────────────────────────────────────────────────────────
┌─────────► 2021年 tmod NT 2021年 ───► DATE 2021年 ───► ARGM - TMP 2021年 2021年 NT ───────────────────────────────────────────► NP ───┐
│┌────────► HanLPv2 . 1 nsubj NR HanLPv2 . 1 ───► ORGANIZATION HanLPv2 .1 ───► ARG0 HanLPv2 .1 HanLPv2 .1 NR ───────────────────────────────────────────► NP ────┤
││┌─►┌───── 为 prep P 为 为 ◄─┐ 为 为 P ───────────┐ │
│││ │ ┌─► 生产 nn NN 生产 生产 ├► ARG2 生产 生产 NN ──┐ ├────────────────────────► PP ───┐ │
│││ └─►└── 环境 pobj NN 环境 环境 ◄─┘ 环境 环境 NN ──┴► NP ───┘ │ │
┌┼┴┴──────── 带来 root VV 带来 带来 ╟──► PRED 带来 带来 VV ──────────────────────────────────┐ │ │
││ ┌─► 次 amod JJ 次 次 ◄─┐ 次 次 JJ ───► ADJP ──┐ │ ├► VP ────┤
││ ┌───►└── 世代 nn NN 世代 世代 │ 世代 世代 NN ───► NP ───┴► NP ───┐ │ │ │
││ │ ┌─► 最 advmod AD 最 最 │ 最 ───► ARGM - ADV 最 AD ───────────► ADVP ──┼► ADJP ──┐ ├► VP ───┘ ├► IP
││ │┌──►├── 先进 rcmod JJ 先进 先进 │ 先进 ╟──► PRED 先进 JJ ───────────► VP ───┘ │ │ │
││ ││ └─► 的 assm DEG 的 的 ├► ARG1 的 的 DEG ──────────────────────────┤ │ │
││ ││ ┌─► 多 nummod CD 多 多 │ 多 多 CD ───► QP ───┐ ├► NP ───┘ │
││ ││┌─►└── 语种 nn NN 语种 语种 │ 语种 语种 NN ───► NP ───┴────────► NP ────┤ │
││ │││ ┌─► NLP nn NR NLP NLP │ NLP NLP NR ──┐ │ │
│└─►└┴┴──┴── 技术 dobj NN 技术 技术 ◄─┘ 技术 ───► ARG0 技术 NN ──┴────────────────► NP ───┘ │
└──────────► 。 punct PU 。 。 。 。 PU ──────────────────────────────────────────────────┘
Dep Tree Tok Relat Po Tok NER Type Tok SRL PA1 Tok SRL PA2 Tok Po 3 4 5 6
──────────── ─── ───── ── ─── ──────────────── ─── ──────── ─── ──────── ─── ────────────────────────────────
┌─► 阿婆主 nsubj NN 阿婆主 阿婆主 ───► ARG0 阿婆主 ───► ARG0 阿婆主 NN ───────────────────► NP ───┐
┌┬────┬──┴── 来到 root VV 来到 来到 ╟──► PRED 来到 来到 VV ──────────┐ │
││ │ ┌─► 北京 nn NR 北京 ───► LOCATION 北京 ◄─┐ 北京 北京 NR ──┐ ├► VP ───┐ │
││ └─►└── 立方庭 dobj NR 立方庭 ───► LOCATION 立方庭 ◄─┴► ARG1 立方庭 立方庭 NR ──┴► NP ───┘ │ │
│└─►┌─────── 参观 conj VV 参观 参观 参观 ╟──► PRED 参观 VV ──────────┐ ├► VP ────┤
│ │ ┌───► 自然 nn NN 自然 ◄─┐ 自然 自然 ◄─┐ 自然 NN ──┐ │ │ ├► IP
│ │ │┌──► 语义 nn NN 语义 │ 语义 语义 │ 语义 NN │ ├► VP ───┘ │
│ │ ││┌─► 科技 nn NN 科技 ├► ORGANIZATION 科技 科技 ├► ARG1 科技 NN ├► NP ───┘ │
│ └─►└┴┴── 公司 dobj NN 公司 ◄─┘ 公司 公司 ◄─┘ 公司 NN ──┘ │
└──────────► 。 punct PU 。 。 。 。 PU ──────────────────────────┘

For the meaning of each tag set, please refer to the "Linguistic Annotation Specifications" and "Format Specifications". We have purchased, annotated, or adopted the world's largest and most diverse corpora for joint multilingual multi-task learning, so HanLP's tag sets are also the most comprehensive.
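Because a Document is a plain dict, the fields shown above can be combined with ordinary Python. A minimal sketch using a hand-copied fragment of the output above (in practice the dict is returned by HanLP itself); note that NER entries are `[form, type, begin, end)` spans over the fine-grained tokens:

```python
# A hand-copied fragment of the Document shown above (second sentence only).
doc = {
    'tok/fine': [['阿婆主', '来到', '北京', '立方庭', '参观', '自然', '语义', '科技', '公司', '。']],
    'pos/ctb': [['NN', 'VV', 'NR', 'NR', 'VV', 'NN', 'NN', 'NN', 'NN', 'PU']],
    'ner/msra': [[['北京', 'LOCATION', 2, 3], ['立方庭', 'LOCATION', 3, 4],
                  ['自然语义科技公司', 'ORGANIZATION', 5, 9]]],
}

# Pair each token with its CTB part-of-speech tag, sentence by sentence.
tagged = [list(zip(toks, tags)) for toks, tags in zip(doc['tok/fine'], doc['pos/ctb'])]

# Each NER span's [begin, end) offsets index into the fine-grained tokens.
for form, etype, begin, end in doc['ner/msra'][0]:
    assert ''.join(doc['tok/fine'][0][begin:end]) == form
```

The same offset convention applies to srl, dep, and sdp, which is what lets the tasks be cross-referenced against a single tokenization.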
Writing a deep learning model is not hard; the hard part is reproducing a competitive accuracy. The following code shows how to train, in about 6 minutes on the sighan2005 PKU corpus, a Chinese word segmentation model that surpasses published academic results.
```python
# Import paths follow HanLP 2.1's demo training scripts.
from hanlp.common.dataset import SortingSamplerBuilder
from hanlp.components.tokenizers.transformer import TransformerTaggingTokenizer
from hanlp.datasets.tokenization.sighan2005.pku import SIGHAN2005_PKU_TRAIN_ALL, SIGHAN2005_PKU_TEST

tokenizer = TransformerTaggingTokenizer()
save_dir = 'data/model/cws/sighan2005_pku_bert_base_96.73'
tokenizer.fit(
    SIGHAN2005_PKU_TRAIN_ALL,
    SIGHAN2005_PKU_TEST,  # Conventionally, no devset is used. See Tian et al. (2020).
    save_dir,
    'bert-base-chinese',
    max_seq_len=300,
    char_level=True,
    hard_constraint=True,
    sampler_builder=SortingSamplerBuilder(batch_size=32),
    epochs=3,
    adam_epsilon=1e-6,
    warmup_steps=0.1,
    weight_decay=0.01,
    word_dropout=0.1,
    seed=1660853059,
)
tokenizer.evaluate(SIGHAN2005_PKU_TEST, save_dir)
```

Because a random seed is specified, the result is guaranteed to be 96.73. Unlike falsely advertised academic papers or commercial projects, HanLP guarantees that all results are reproducible. If you find otherwise, we will treat it as a fatal bug of the highest priority.
Please refer to demo for more training scripts.
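The reproducibility guarantee above rests on fixing every source of randomness: with the same seed, shuffling and initialization repeat exactly across runs. The principle in miniature, using only Python's standard library (illustrative, not HanLP's actual seeding code):

```python
import random

def shuffled_batches(seed):
    # With a fixed seed, the shuffle order (e.g., of training batches) is deterministic.
    rng = random.Random(seed)
    data = list(range(10))
    rng.shuffle(data)
    return data

# Two runs with the same seed yield byte-for-byte identical orders.
run1 = shuffled_batches(1660853059)
run2 = shuffled_batches(1660853059)
assert run1 == run2
```

In a real training run, the framework seeds Python, NumPy, and the deep learning backend together, which is why the reported 96.73 can be replayed exactly.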
| lang | corpora | model | tok fine | tok coarse | pos ctb | pos pku | pos 863 | pos ud | ner pku | ner msra | ner ontonotes | dep | con | srl | sdp SemEval16 | sdp DM | sdp PAS | sdp PSD | lem | fea | amr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mul | UD2.7 OntoNotes5 | small | 98.62 | - | - | - | - | 93.23 | - | - | 74.42 | 79.10 | 76.85 | 70.63 | - | 91.19 | 93.67 | 85.34 | 87.71 | 84.51 | - |
| mul | UD2.7 OntoNotes5 | base | 98.97 | - | - | - | - | 90.32 | - | - | 80.32 | 78.74 | 71.23 | 73.63 | - | 92.60 | 96.04 | 81.19 | 85.08 | 82.13 | - |
| zh | open | small | 97.25 | - | 96.66 | - | - | - | - | - | 95.00 | 84.57 | 87.62 | 73.40 | 84.57 | - | - | - | - | - | - |
| zh | open | base | 97.50 | - | 97.07 | - | - | - | - | - | 96.04 | 87.11 | 89.84 | 77.78 | 87.11 | - | - | - | - | - | - |
| zh | close | small | 96.70 | 95.93 | 96.87 | 97.56 | 95.05 | - | 96.22 | 95.74 | 76.79 | 84.44 | 88.13 | 75.81 | 74.28 | - | - | - | - | - | - |
| zh | close | base | 97.52 | 96.44 | 96.99 | 97.59 | 95.29 | - | 96.48 | 95.72 | 77.77 | 85.29 | 88.57 | 76.52 | 73.76 | - | - | - | - | - | - |
| zh | close | ernie | 96.95 | 97.29 | 96.76 | 97.64 | 95.22 | - | 97.31 | 96.47 | 77.95 | 85.67 | 89.17 | 78.51 | 74.10 | - | - | - | - | - | - |
HanLP's data preprocessing and splits do not necessarily follow popular conventions. For example, HanLP adopts the full version of the MSRA named entity recognition corpus instead of the truncated version commonly used; it uses the Stanford Dependencies standard, with wider syntactic coverage, rather than the Zhang and Clark (2008) standard adopted in academia; and it proposes an even split of CTB instead of the academic split, which is uneven and missing 51 gold files. HanLP open-sources the complete corpus preprocessing scripts and the corresponding corpora, striving to make Chinese NLP more transparent.
In short, HanLP only does what we believe is correct and state of the art, not necessarily what is popular or authoritative.
If you use HanLP in your research, please cite it as follows:
```bibtex
@inproceedings{he-choi-2021-stem,
    title = "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders",
    author = "He, Han and Choi, Jinho D.",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.451",
    pages = "5555--5577",
    abstract = "Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, who interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.",
}
```

HanLP's source code is licensed under the Apache License 2.0, which permits free commercial use. Please include a link to HanLP and the license in your product documentation. HanLP is protected by copyright law; infringement will be pursued.
Since v1.7, HanLP has operated independently, with Natural Semantics (Qingdao) Technology Co., Ltd. as the project's principal, leading the development of subsequent versions and holding their copyright.
HanLP v1.3 through v1.65 were developed with the support of Dakuai Search, which holds the copyright for those versions; they remain fully open source.
HanLP was supported in its early days by Shanghai Linyuan Company, which holds the copyright for v1.28 and earlier; those versions were also released on the company's website.
The licensing of machine learning models is not yet settled law, but in the spirit of respecting the original licenses of open-source corpora, unless otherwise stated, HanLP's multilingual models are licensed under CC BY-NC-SA 4.0, and its Chinese models are for research and teaching purposes only.
https://hanlp.hankcs.com/docs/references.html