
OpenHownet API由Thunlp開發,該API提供了一種方便的方式來搜索Hownet,顯示半eme樹,通過SEMEMES等單詞相似性等方面的信息。您還可以訪問我們的網站以享受搜索和展示在線單詞的搜索和展示單詞。
如果您在研究中使用Open Hownet提供的任何數據或API,請引用以下論文:
@article{qi2019openhownet,
title={OpenHowNet: An Open Sememe-based Lexical Knowledge Base},
author={Qi, Fanchao and Yang, Chenghao and Liu, Zhiyuan and Dong, Qiang and Sun, Maosong and Dong, Zhendong},
journal={arXiv preprint arXiv:1901.09957},
year={2019},
}
Hownet是最典型的半知識基礎。半eme被定義為語言學中的最低語義單元,一些語言學家認為,任何語言中所有單詞的含義都可以用有限的半隔板來表示。 Zhendong Dong先生和他的兒子Qiang Dong將這個想法付諸實踐,並花了近30年的時間建立Hownet,該net預先定義了大約2,000個SEMEM,並用它們註釋了200,000多種英語和中文單詞。
自Hownet構建以來,它已被廣泛用於各種NLP任務。您可以參考本文列表,以查看所有與網絡相關的研究。
Hownet Core數據文件(即可以在此處下載的Hownet字典)由237,973個概念(或感官)組成,以中文和英語單詞和短語為代表。 Hownet中的每個概念都用基於半Eme的定義,POS標籤,情感方向,示例句子等註釋。這是Hownet中如何註釋概念的示例:
NO.=000000026417 # Concept ID
W_C=不惜 # Chinese word
G_C=verb # POS tag of the Chinese word
S_C=PlusFeeling|正面情感 # Sentiment orientation
E_C=~牺牲业余时间,~付出全部精力,~出卖自己的灵魂 # Example sentences of the Chinese word
W_E=do not hesitate to # English word
G_E=verb # POS tag of the English word
S_E=PlusFeeling|正面情感 # Sentiment orientation
E_E= # Example sentences of the English word
DEF={willing|愿意} # Sememe-based definition
RMK=
您可以選擇以下兩種方法來安裝OpenHownet API。
pip install OpenHowNetgit clone https://github.com/thunlp/OpenHowNet/
cd OpenHowNet
python setup.py install以下代碼片段說明了OpenHownet API的一些基本功能。您也可以下載此Jupyter筆記本以運行代碼。有關更多功能和詳細信息,請轉到我們的文檔。
import OpenHowNet
hownet_dict = OpenHowNet . HowNetDict ()如果您尚未下載Hownet數據,將會發生錯誤。在這種情況下,您需要首先運行OpenHowNet.download() 。
默認情況下,API將搜索以給定單詞(英語或中文)代表的所有概念(感官),並返回“感官”類中的實例列表。您還可以設置語言以減少搜索時間。如果給定單詞在HowNET中不存在,則此API將返回一個空列表。
> >> # Get all the senses represented by the word "苹果".
>> > result_list = hownet_dict . get_sense ( "苹果" )
> >> print ( "The number of retrievals: " , len ( result_list ))
The number of retrievals : 8
> >> print ( "An example of retrievals: " , result_list )
An example of retrievals : [ No . 244401 | apple |苹果, No . 244402 | malus pumila |苹果, No . 244403 | orchard apple tree |苹果, No . 244396 | apple |苹果, No . 244397 | apple |苹果, No . 244398 | IPHONE |苹果, No . 244399 | apple |苹果, No . 244400 | iphone |苹果]您可以通過“感官實例”獲得感官的詳細信息。
> >> sense_example = result_list [ 0 ]
> >> print ( "Sense example:" , sense_example )
Sense example : No . 244401 | apple |苹果
> >> print ( "Sense id: " , sense_example . No )
Sense id : 000000244401
> >> print ( "English word in the sense: " , sense_example . en_word )
English word in the sense : apple
> >> print ( "Chinese word in the sense: " , sense_example . zh_word )
Chinese word in the sense : 苹果
> >> print ( "HowNet Def of the sense: " , sense_example . Def )
HowNet Def of the sense : { tree |树:{ reproduce |生殖: PatientProduct = { fruit |水果}, agent = { ~ }}}
>> > print ( "Sememe list of the sense: " , sense_example . get_sememe_list ())
Sememe list of the sense : { fruit |水果, tree |树, reproduce |生殖}您可以可視化意義的基於結構化的半eme(即“半eme樹”)
> >> sense_example . visualize_sememe_tree ()
[ sense ] No .244401 | apple |苹果
└── [ None ] tree |树
└── [ agent ] reproduce |生殖
└── [ PatientProduct ] fruit |水果該軟件包提供了API,以獲取Hownet中的所有感官,單詞和半決賽。
> >> all_senses = hownet_dict . get_all_senses ()
> >> print ( "The number of all senses: {}" . format ( len ( all_senses )))
The number of all senses : 237974
> >> zh_word_list = hownet_dict . get_zh_words ()
> >> print ( "Chinese words in HowNet: " , zh_word_list [: 30 ])
Chinese words in HowNet : [ '' , '"' , '#' , '#号标签' , '$' , '$.J.' , '$A.' , '$NZ.' , '%' , "'" , '(' , ')' , '*' , '+' , ',' , '-' , '--' , '.' , '...' , '...为止' , '...也同样使然' , '...以上' , '...以内' , '...以来' , '...何如' , '...内' , '...出什么问题' , '...发生了什么' , '...发生故障' , '...家里有几口人' ]
> >> en_word_list = hownet_dict . get_en_words ()
> >> print ( "English words in HowNet: " , en_word_list [: 30 ])
English words in HowNet : [ 'A' , 'An' , 'Frenchmen' , 'Frenchwomen' , 'Ottomans' , 'a' , 'aardwolves' , 'abaci' , 'abandoned' , 'abbreviated' , 'abode' , 'aboideaux' , 'aboiteaux' , 'abscissae' , 'absorbed' , 'acanthi' , 'acari' , 'accepted' , 'acciaccature' , 'acclaimed' , 'accommodating' , 'accompanied' , 'accounting' , 'accused' , 'acetabula' , 'acetified' , 'aching' , 'acicula' , 'acini' , 'acquired' ]
> >> all_sememes = hownet_dict . get_all_sememes ()
> >> print ( 'There are {} sememes in HowNet' . format ( len ( all_sememes )))
There are 2540 sememes in HowNet 您可以檢索以給定單詞表示的感官的基於半eme的定義。默認情況下,該軟件包將檢索單詞表示的所有感官,並分別返回其SEMEME列表。
> >> hownet_dict . get_sememes_by_word ( word = '苹果' , display = 'list' , merge = False , expanded_layer = - 1 , K = None )
[{ 'sense' : No . 244396 | apple |苹果,
'sememes' : { PatternValue |样式值, SpeBrand |特定牌子, able |能, bring |携带, computer |电脑}},
{ 'sense' : No . 244397 | apple |苹果,
'sememes' : { fruit |水果}},
{ 'sense' : No . 244398 | IPHONE |苹果,
'sememes' : { PatternValue |样式值, SpeBrand |特定牌子, able |能, bring |携带, communicate |交流, tool |用具}},
{ 'sense' : No . 244399 | apple |苹果,
'sememes' : { PatternValue |样式值, SpeBrand |特定牌子, able |能, bring |携带, communicate |交流, tool |用具}},
{ 'sense' : No . 244400 | iphone |苹果,
'sememes' : { PatternValue |样式值, SpeBrand |特定牌子, able |能, bring |携带, communicate |交流, tool |用具}},
{ 'sense' : No . 244401 | apple |苹果,
'sememes' : { fruit |水果, reproduce |生殖, tree |树}},
{ 'sense' : No . 244402 | malus pumila |苹果,
'sememes' : { fruit |水果, reproduce |生殖, tree |树}},
{ 'sense' : No . 244403 | orchard apple tree |苹果,
'sememes' : { fruit |水果, reproduce |生殖, tree |树}}]通過更改display ,可以以列表形式( list ),字典形式( dict ),樹節點形式( tree )和可視化形式( visual )以列表形式(列表),字典形式(dict)顯示。
# Get the sememes in the form of dictionary
> >> hownet_dict . get_sememes_by_word ( word = '苹果' , display = 'dict' )[ 0 ]
{ 'sense' : No . 244396 | apple |苹果, 'sememes' : { 'role' : 'sense' , 'name' : No . 244396 | apple |苹果, 'children' : [{ 'role' : 'None' , 'name' : computer |电脑, 'children' : [{ 'role' : 'modifier' , 'name' : PatternValue |样式值, 'children' : [{ 'role' : 'CoEvent' , 'name' : able |能, 'children' : [{ 'role' : 'scope' , 'name' : bring |携带, 'children' : [{ 'role' : 'patient' , 'name' : '$' }]}]}]}, { 'role' : 'patient' , 'name' : SpeBrand |特定牌子}]}]}}
# Get the sememes in the form of tree node (get the root node of the sememe tree)
> >> d . get_sememes_by_word ( word = '苹果' , display = 'tree' )[ 0 ]
{ 'sense' : No . 244396 | apple |苹果, 'sememes' : Node ( '/No.244396|apple|苹果' , role = 'sense' )}
# Visualize the sememes (Set K to control the num of visualized tree to print)
> >> d . get_sememes_by_word ( word = '苹果' , display = 'visual' , K = 2 )
Find 8 result ( s )
Display #0 sememe tree
[ sense ] No .244396 | apple |苹果
└── [ None ] computer |电脑
├── [ modifier ] PatternValue |样式值
│ └── [ CoEvent ] able |能
│ └── [ scope ] bring |携带
│ └── [ patient ]$
└── [ patient ] SpeBrand |特定牌子
Display #1 sememe tree
[ sense ] No .244397 | apple |苹果
└── [ None ] fruit |水果此外,當display=='list'時,您可以選擇將所有SEMEME列表合併為一個,並通過更改參數expanded_layer (-1表示擴展所有層)來限制Sememe樹的展開層。
> >> hownet_dict . get_sememes_by_word ( word = '苹果' , display = 'list' , merge = True , expanded_layer = - 1 , K = None )
{ PatternValue |样式值, SpeBrand |特定牌子, able |能, bring |携带, communicate |交流, computer |电脑, fruit |水果,
reproduce |生殖, tool |用具, tree |树}您可以通過輸入代表Sememes的單詞(英語或中文)來獲得兩個半決賽之間的關係。您可以選擇顯示(sememe1,關係,sememe2)的三重態。
> >> relations = hownet_dict . get_sememe_relation ( 'FormValue' , '圆' , return_triples = False )
> >> print ( relations )
'hyponym'
> >> triples = hownet_dict . get_sememe_relation ( 'FormValue' , '圆' , return_triples = True )
> >> print ( triples )
[( FormValue |形状值, 'hyponym' , round |圆)]您可以搜索與半eme有一定關係的所有半決賽。同樣,應該用一個單詞(英語或中文)表示半eme,但關係必須是小寫的英語。
> >> triples = hownet_dict . get_related_sememes ( 'FormValue' , relation = 'hyponym' , return_triples = True )
> >> print ( triples )
[( FormValue |形状值, 'hyponym' , round |圆), ( FormValue |形状值, 'hyponym' , unformed |不成形), ( AppearanceValue |外观值, 'hyponym' , FormValue |形状值), ( FormValue |形状值, 'hyponym' , angular |角), ( FormValue |形状值, 'hyponym' , square |方), ( FormValue |形状值, 'hyponym' , netlike |网), ( FormValue |形状值, 'hyponym' , formed |成形)]該實施基於本文:
Jiangming Liu,Jinan Xu,Yujie Zhang。 Hownet對單詞相似性計算的混合層次結構的方法。在2013年IJCNLP會議錄中。 [PDF]
由於需要加載一些文件以進行相似性計算,因此初始化開銷將比以前大。
首先,您可以將hownet_dict對像初始化如下:
> >> hownet_dict_advanced = OpenHowNet . HowNetDict ( init_sim = True )
Initializing OpenHowNet succeeded !
Initializing similarity calculation succeeded !您還可以推遲相似性計算的初始化,直到使用為止。
> >> hownet_dict . initialize_similarity_calculation ()
Initializing similarity calculation succeeded !您可以獲得具有相同基於半度性定義的感官。
> >> s = hownet_dict_advanced . get_sense ( '苹果' )[ 0 ]
> >> hownet_dict_advanced . get_sense_synonyns ( s )[: 10 ]
[ No . 110999 | pear |山梨, No . 111007 | hawthorn |山楂, No . 111009 | haw |山楂树, No . 111010 | hawthorn |山楂树, No . 111268 | Chinese hawthorn |山里红, No . 122955 | Pistacia vera |开心果树, No . 122956 | pistachio |开心果树, No . 122957 | pistachio tree |开心果树, No . 135467 | almond tree |扁桃, No . 154699 | fig |无花果]包裝搜索以給定單詞表示的感官,獲得最近的TOP-K感官,然後返回相應的單詞。請注意,應該設置給定單詞的語言。
您還可以設置單詞的pos,選擇輸出相似性,並將屬於差異感官的所有單詞合併到單個列表中,等等。請參閱文檔以獲取更多信息。
如果輸入單詞不在HOWNET中,則API返回一個空列表。
> >> hownet_dict_advanced . get_nearest_words ( '苹果' , language = 'zh' , K = 5 )
{ No . 244396 | apple |苹果: [ 'IBM' , '东芝' , '华为' , '戴尔' , '索尼' ],
No . 244397 | apple |苹果: [ '丑橘' , '乌梅' , '五敛子' , '凤梨' , '刺梨' ],
No . 244398 | IPHONE |苹果: [ 'OPPO' , '华为' , '苹果' , '智能手机' , '彩笔' ],
No . 244399 | apple |苹果: [ 'OPPO' , '华为' , '苹果' , '智能手机' , '彩笔' ],
No . 244400 | iphone |苹果: [ 'OPPO' , '华为' , '苹果' , '智能手机' , '彩笔' ],
No . 244401 | apple |苹果: [ '山梨' , '山楂' , '山楂树' , '山里红' , '开心果树' ],
No . 244402 | malus pumila |苹果: [ '山梨' , '山楂' , '山楂树' , '山里红' , '开心果树' ],
No . 244403 | orchard apple tree |苹果: [ '山梨' , '山楂' , '山楂树' , '山里红' , '开心果树' ]}
> >> hownet_dict_advanced . get_nearest_words ( '苹果' , language = 'zh' , K = 5 , merge = True )
[ 'IBM' , '东芝' , '华为' , '戴尔' , '索尼' ]如果兩個給定單詞中的任何一個都不存在於Hownet中,則它將返回-1 。
> >> print ( 'The similarity of 苹果 and 梨 is {}.' . format ( hownet_dict_advanced . calculate_word_similarity ( '苹果' , '梨' )))
The similarity of 苹果 and 梨 is 1.0 .該軟件包集成了查詢功能,以獲取Babelnet(Babelnet Synsets)中合成器信息的信息。 Babelnet是由Babelnet Synsets組成的多語言百科全書詞典,每種都包含一些具有相同含義的多語言同義詞。以下工作註釋了某些Babelnet合成器的SEMEMES,此部分的功能基於其註釋結果。
建立多語言的半知識知識基礎:預測Babelnet Synsets的半決賽。 Fanchao Qi,Liang Chang,Maosong Sun,Sicong Ouyang和Zhiyuan Liu 。 AAAI-20。 [PDF] [代碼]
首先,您應該初始化babelnet合成詞字典:
> >> hownet_dict . initialize_babelnet_dict ()
Initializing BabelNet synset Dict succeeded !
# Or you can initialize when create the HowNetDict instance
>> > hownet_dict_advanced = HowNetDict ( init_babel = True )
Initializing OpenHowNet succeeded !
Initializing BabelNet synset Dict succeeded !以下API允許您查詢Babelnet同步(中文和英語同義詞,定義,圖片URL等)中的豐富信息。
> >> syn_list = hownet_dict_advanced . get_synset ( '黄色' )
> >> print ( "{} results are retrieved and take the first one as an example" . format ( len ( syn_list )))
3 results are retrieved and take the first one as an example
>> > syn_example = syn_list [ 0 ]
> >> print ( "Synset: {}" . format ( syn_example ))
Synset : bn : 00113968 a | yellow |黄
> >> print ( "English synonyms: {}" . format ( syn_example . en_synonyms ))
English synonyms : [ 'yellow' , 'yellowish' , 'xanthous' ]
> >> print ( "Chinese synonyms: {}" . format ( syn_example . zh_synonyms ))
Chinese synonyms : [ '黄' , '黄色' , '淡黄色+的' , '黄色+的' , '微黄色' , '微黄色+的' , '黄+的' , '淡黄色' ]
> >> print ( "English glosses: {}" . format ( syn_example . en_glosses ))
English glosses : [ 'Of the color intermediate between green and orange in the color spectrum; of something resembling the color of an egg yolk' , 'Having the colour of a yolk, a lemon or gold.' ]
> >> print ( "Chinese glosses: {}" . format ( syn_example . zh_glosses ))
Chinese glosses : [ '像丝瓜花或向日葵花的颜色。' ]您可以使用給定的合成器獲得相關的Babelnet合成器。
> >> related_synsets = syn_example . get_related_synsets ()
> >> print ( "There are {} synsets that have relation with the {}, they are: " . format ( len ( related_synsets ), syn_example ))
There are 6 synsets that have relation with the bn : 00113968 a | yellow |黄, they are :
>> > print ( related_synsets )
[ bn : 00099663 a | chromatic |彩色, bn : 00029925 n | egg_yolk |蛋黄, bn : 00092876 v | resemble |相似, bn : 00020726 n | color |颜色, bn : 00020748 n | visible_spectrum |可见光, bn : 00081866 n | yellow |黄色]您可以通過輸入Babelnet Synset中的單詞來獲得Babelnet Synset的半決賽:
> >> print ( hownet_dict_advanced . get_sememes_by_word_in_BabelNet ( '黄色' ))
[{ 'synset' : bn : 00113968 a | yellow |黄, 'sememes' : [ yellow |黄]}, { 'synset' : bn : 00101430 a | dirty |淫秽的, 'sememes' : [ lascivious |淫, dirty |龊, despicable |卑劣, BadSocial |坏风气]}, { 'synset' : bn : 00081866 n | yellow |黄色, 'sememes' : [ yellow |黄]}]
> >> print ( hownet_dict_advanced . get_sememes_by_word_in_BabelNet ( '黄色' , merge = True ))
[ lascivious |淫, despicable |卑劣, BadSocial |坏风气, dirty |龊, yellow |黄]有關更多詳細說明,請參考文檔。
如果代碼或數據對您有所幫助,請引用以下論文:
@article{qi2019openhownet,
title={Openhownet: An open sememe-based lexical knowledge base},
author={Qi, Fanchao and Yang, Chenghao and Liu, Zhiyuan and Dong, Qiang and Sun, Maosong and Dong, Zhendong},
journal={arXiv preprint arXiv:1901.09957},
year={2019}
}