negspacy

Spacy 3.3 support

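negspacy: negation for spaCy

A spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.

NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Chapman, Bridewell, Hanbury, Cooper, Buchanan. https://doi.org/10.1006/jbin.2001.1029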

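What's new

Version 1.0 is a major version update that supports spaCy 3.0's new interface for adding pipeline components. As a result, it is not backwards compatible with earlier versions of negspacy.

If your project uses spaCy 2.3.5 or earlier, you will need to use version 0.1.9. See the archived readme.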

Installation and usage

Install the library.

pip install negspacy

Import the library and spaCy.

import spacy
from negspacy.negation import Negex

Load a spaCy language model. Add the negspacy pipeline object. Filtering on entity types is optional.

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types": ["PERSON", "ORG"]})

View negations.

doc = nlp("She does not like Steve Jobs but likes Apple products.")

for e in doc.ents:
    print(e.text, e._.negex)

Steve Jobs True
Apple False
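Downstream code often needs the negated and affirmed entities as separate lists rather than printed pairs. The sketch below shows one way to do that, relying only on the `.text` and `._.negex` attributes used above. The helper names `split_by_negation` and `fake_ent` are hypothetical, and `fake_ent` is a stand-in so the example runs without a spaCy model; with a real pipeline you would pass `doc.ents` instead.

```python
from types import SimpleNamespace


def split_by_negation(ents):
    """Return (affirmed, negated) lists of entity texts.

    `ents` is any iterable of objects exposing `.text` and `._.negex`,
    for example `doc.ents` after the "negex" component has run.
    """
    affirmed, negated = [], []
    for ent in ents:
        (negated if ent._.negex else affirmed).append(ent.text)
    return affirmed, negated


def fake_ent(text, negex):
    # Hypothetical stand-in for a spaCy Span, used only for this demo.
    return SimpleNamespace(text=text, _=SimpleNamespace(negex=negex))


# Mirrors the output shown above: "Steve Jobs" negated, "Apple" not.
demo_ents = [fake_ent("Steve Jobs", True), fake_ent("Apple", False)]
affirmed, negated = split_by_negation(demo_ents)
print("affirmed:", affirmed)
print("negated:", negated)
```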

Consider pairing with scispacy to find UMLS concepts in text and process negations.

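NegEx patterns

pseudo_negations - phrases that are false triggers, ambiguous negations, or double negatives
preceding_negations - negation phrases that precede an entity
following_negations - negation phrases that follow an entity
termination - phrases that cut a sentence into parts for the purposes of negation detection (e.g., "but")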
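To make the roles of the four pattern types concrete, here is a deliberately simplified, pure-Python sketch of how they could interact. It is not negspacy's implementation (which works on spaCy tokens and matchers). It assumes plain substring matching on whole words and only looks at the first occurrence of the entity. The function names `_spans` and `is_negated` and the small `PATTERNS` dictionary are illustrative.

```python
import re

# Small illustrative pattern set; real termsets are much larger.
PATTERNS = {
    "pseudo_negations": ["might not"],
    "preceding_negations": ["not"],
    "following_negations": ["declined"],
    "termination": ["but", "however"],
}


def _spans(phrases, text):
    """Character spans of every whole-word occurrence of each phrase."""
    found = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        found.extend(m.span() for m in re.finditer(pattern, text))
    return found


def is_negated(sentence, entity, patterns=PATTERNS):
    """Toy NegEx check for the first occurrence of `entity` in `sentence`."""
    text = sentence.lower()
    ent = entity.lower()
    ent_start = text.find(ent)
    if ent_start < 0:
        raise ValueError(f"{entity!r} not found in sentence")
    ent_end = ent_start + len(ent)

    # 1. Blank out pseudo-negations so they cannot act as triggers.
    #    Replacing with spaces keeps all character offsets valid.
    for start, end in _spans(patterns["pseudo_negations"], text):
        text = text[:start] + " " * (end - start) + text[end:]

    # 2. Termination phrases bound the scope around the entity.
    left, right = 0, len(text)
    for start, end in _spans(patterns["termination"], text):
        if end <= ent_start:
            left = max(left, end)
        elif start >= ent_end:
            right = min(right, start)

    # 3. Look for negation triggers on the appropriate side, within scope.
    before, after = text[left:ent_start], text[ent_end:right]
    return bool(_spans(patterns["preceding_negations"], before)) or bool(
        _spans(patterns["following_negations"], after)
    )


sentence = "She does not like Steve Jobs but likes Apple products."
print(is_negated(sentence, "Steve Jobs"))
print(is_negated(sentence, "Apple"))
```

In this sketch "but" ends the scope of "not", so only the entity before it is negated; a pseudo-negation such as "might not" is masked before the search, so its embedded "not" does not count.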

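Termsets

Designate which termset to use; en_clinical is used by default.

en - phrases for general English-language text
en_clinical (default) - adds phrases specific to the clinical domain to general English
en_clinical_sensitive - adds additional phrases to help rule out historical and possibly irrelevant entities

To set: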

import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

# Load the general-English termset instead of the default en_clinical.
ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset": ts.get_patterns()
    }
)

Additional functionality

Change patterns or view patterns in use

Replace all patterns with your own set

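import spacy
from negspacy.negation import Negex  # importing makes the "negex" component available

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "negex",
    config={
        # A complete custom termset: all four keys are supplied.
        "neg_termset": {
            "pseudo_negations": ["might not"],
            "preceding_negations": ["not"],
            "following_negations": ["declined"],
            "termination": ["but", "however"]
        }
    }
)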

Add and remove individual patterns on the fly from built-in termsets

from negspacy.termsets import termset

ts = termset("en")
ts.add_patterns({
    "pseudo_negations": ["my favorite pattern"],
    "termination": ["these are", "great patterns", "but"],
    "preceding_negations": ["wow a negation"],
    "following_negations": ["extra negation"],
})
# OR
ts.remove_patterns({
    "termination": ["these are", "great patterns"],
    "pseudo_negations": ["my favorite pattern"],
    "preceding_negations": ["denied", "wow a negation"],
    "following_negations": ["unlikely", "extra negation"],
})
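Because `neg_termset` is an ordinary dictionary with four list-valued keys, the same kind of adjustment can also be done on a plain dict before it is passed to `add_pipe`. The sketch below illustrates that idea with two hypothetical helpers, `add_phrases` and `remove_phrases`; they are not part of negspacy and make no claim about how `add_patterns` and `remove_patterns` work internally.

```python
def add_phrases(termset_dict, extra):
    """Return a copy of `termset_dict` with phrases from `extra` appended.

    Phrases already present are not duplicated; unknown keys are created.
    """
    merged = {key: list(phrases) for key, phrases in termset_dict.items()}
    for key, phrases in extra.items():
        bucket = merged.setdefault(key, [])
        for phrase in phrases:
            if phrase not in bucket:
                bucket.append(phrase)
    return merged


def remove_phrases(termset_dict, unwanted):
    """Return a copy of `termset_dict` without the phrases in `unwanted`.

    Phrases that are not present are silently ignored.
    """
    return {
        key: [p for p in phrases if p not in unwanted.get(key, [])]
        for key, phrases in termset_dict.items()
    }


base = {
    "pseudo_negations": ["might not"],
    "preceding_negations": ["not"],
    "following_negations": ["declined"],
    "termination": ["but", "however"],
}

custom = add_phrases(base, {"preceding_negations": ["wow a negation"]})
custom = remove_phrases(custom, {"termination": ["however"]})
print(custom["preceding_negations"])
print(custom["termination"])
```

Both helpers return new dictionaries, so the starting termset stays unchanged and can be reused for other pipelines.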

View patterns in use

from negspacy.termsets import termset

ts = termset("en_clinical")
print(ts.get_patterns())

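Negations in noun chunks

Depending on the named entity recognition model you are using, you may have negations "chunked together" with nouns. For example: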

import spacy

# en_core_sci_sm is a scispacy model and must be installed separately.
nlp = spacy.load("en_core_sci_sm")
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text)

# no headache

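This would cause the NegEx algorithm to miss the preceding negation, because "no" is already part of the entity span. To account for this, you can add a chunk_prefix: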

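import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

nlp = spacy.load("en_core_sci_sm")
ts = termset("en_clinical")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset": ts.get_patterns(),
        # Negation words that the NER model may fold into the entity span.
        "chunk_prefix": ["no"],
    },
    last=True,
)
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text, e._.negex)

# no headache True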
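The idea behind chunk_prefix can be illustrated without a model: if the entity text itself begins with one of the configured prefix words, the entity is treated as negated. The following is a minimal sketch of that check under the assumption of case-insensitive, whole-word matching; negspacy's own matching may differ in detail, and the function name `starts_with_chunk_prefix` is hypothetical.

```python
def starts_with_chunk_prefix(entity_text, chunk_prefix):
    """Return True if `entity_text` begins with any prefix in `chunk_prefix`.

    Comparison is case-insensitive and word-based, so the prefix "no"
    matches "no headache" but not "nose bleed".
    """
    words = entity_text.lower().split()
    for prefix in chunk_prefix:
        prefix_words = prefix.lower().split()
        if prefix_words and words[: len(prefix_words)] == prefix_words:
            return True
    return False


for text in ["no headache", "No fever", "nose bleed", "headache"]:
    print(text, starts_with_chunk_prefix(text, ["no"]))
```

Comparing whole words rather than raw string prefixes is a design choice in this sketch that avoids false hits on entities that merely start with the same letters.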