neuralcoref

v4.0.0: Simpler installation and integration with SpaCy 2.1+, simple domain-knowledge integration

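NeuralCoref 4.0: Coreference Resolution with Neural Networks.

NeuralCoref is a pipeline extension for spaCy 2.1+ that annotates and resolves coreference clusters using a neural network. NeuralCoref is production-ready, integrated in spaCy's NLP pipeline, and extensible to new training datasets.

For a brief introduction to coreference resolution and NeuralCoref, please refer to our blog post. NeuralCoref is written in Python/Cython and comes with a pre-trained statistical model for English only.

NeuralCoref is accompanied by a visualization client, NeuralCoref-Viz, a web interface powered by a REST server that can be tried online. NeuralCoref is released under the MIT license.

Version 4.0 is out now! It is available on pip and compatible with spaCy 2.1+.

Operating system: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual Studio)
Python version: Python 3.6+ (64-bit only)
Package manager: pip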

Install NeuralCoref

Install NeuralCoref with pip

This is the easiest way to install NeuralCoref.

pip install neuralcoref

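`spacy.strings.StringStore size changed` error

If you get an error mentioning `spacy.strings.StringStore size changed, may indicate binary incompatibility` when loading NeuralCoref with `import neuralcoref`, you will have to install NeuralCoref from the distribution's sources instead of the wheels, so that NeuralCoref is built on your system against the most recent version of spaCy.

In this case, simply reinstall NeuralCoref as follows: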

pip uninstall neuralcoref
pip install neuralcoref --no-binary neuralcoref
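
Before reinstalling, it can help to see which versions of the two packages are actually present. The sketch below is only a diagnostic aid and not part of NeuralCoref: the helper names `installed_version` and `report_versions` are made up for illustration, and it relies on the standard-library `importlib.metadata` module, which assumes Python 3.8 or newer.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(package_names=("spacy", "neuralcoref")):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in package_names}


print(report_versions())
```

A value of `None` simply means the package is not installed in the current environment.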

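Install a model for spaCy

To be able to use NeuralCoref you will also need an English model for spaCy.

You can use whichever English model works for your application, but note that the performance of NeuralCoref depends strongly on the performance of the spaCy model, in particular on its tagger, parser and NER components. A larger spaCy English model will therefore also improve the quality of the coreference resolution (see some details in the Internals and Model section below).

Here is an example of how to install spaCy and a (small) English model; more information can be found on spaCy's website: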

pip install -U spacy
python -m spacy download en

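Install NeuralCoref from source

You can also install NeuralCoref from source. You will need to install the dependencies first, which include Cython and spaCy.

Here is the process:

python -m venv .env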
source .env/bin/activate
git clone https://github.com/huggingface/neuralcoref.git
cd neuralcoref
pip install -r requirements.txt
pip install -e .

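Internals and Model

NeuralCoref is made of two sub-modules:

a rule-based mention-detection module, which uses spaCy's tagger, parser and NER annotations to identify a set of potential coreference mentions, and
a feed-forward neural network, which computes a coreference score for each pair of potential mentions.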
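
To make the two-stage idea concrete, here is a toy sketch of how pairwise scores can be turned into clusters: each mention is linked to its best-scoring earlier mention, provided that score beats a "no antecedent" threshold. This is not NeuralCoref's actual implementation. The function `cluster_mentions`, the threshold, and the hand-written score table are all assumptions made for illustration, and the mentions are plain strings rather than spaCy spans.

```python
def cluster_mentions(mentions, pair_score, no_antecedent_score=0.0):
    """Link each mention to its best-scoring earlier mention, then group links.

    mentions: list of mention strings in document order.
    pair_score: hypothetical callable (antecedent, mention) -> float standing
        in for the neural network's pairwise score.
    no_antecedent_score: score a pair must beat for a link to be made.
    """
    cluster_of = {}  # mention index -> cluster id
    clusters = []    # cluster id -> list of mention indices
    for i, mention in enumerate(mentions):
        best_j, best = None, no_antecedent_score
        for j in range(i):
            score = pair_score(mentions[j], mention)
            if score > best:
                best_j, best = j, score
        if best_j is None:
            continue  # no link made; the mention may still become an antecedent later
        if best_j not in cluster_of:
            cluster_of[best_j] = len(clusters)
            clusters.append([best_j])
        cluster_of[i] = cluster_of[best_j]
        clusters[cluster_of[i]].append(i)
    return [[mentions[k] for k in members] for members in clusters]


# Hand-written scores standing in for a trained model (illustrative only).
TOY_SCORES = {
    ("My sister", "She"): 2.0,
    ("a dog", "She"): -1.0,
    ("My sister", "him"): -0.5,
    ("a dog", "him"): 1.5,
}


def toy_score(antecedent, mention):
    """Look up a toy score; unknown pairs get a strongly negative score."""
    return TOY_SCORES.get((antecedent, mention), -5.0)


clusters = cluster_mentions(["My sister", "a dog", "She", "him"], toy_score)
print(clusters)
```

With these toy scores, "She" is attached to "My sister" and "him" to "a dog", giving two clusters.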

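The first time you import NeuralCoref in Python, it downloads the weights of the neural network model into a cache folder.

The cache folder is set by default to ~/.neuralcoref_cache (see file_utils.py), but this can be overridden by setting the environment variable NEURALCOREF_CACHE to point to another location.

The cache folder can be safely deleted at any time; the model will be downloaded again the next time it is loaded.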
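
The lookup rule described above (environment variable first, home-directory default otherwise) can be mirrored in a few lines. This is a sketch of the described behavior only, not the code in file_utils.py; `resolve_cache_dir` is a hypothetical helper name and the `/data/neuralcoref_cache` path is a made-up example.

```python
import os


def resolve_cache_dir(environ=None):
    """Return NEURALCOREF_CACHE if it is set, else ~/.neuralcoref_cache."""
    environ = os.environ if environ is None else environ
    default = os.path.join(os.path.expanduser("~"), ".neuralcoref_cache")
    return environ.get("NEURALCOREF_CACHE", default)


# Folder that would be used with the current environment.
print(resolve_cache_dir())
# Folder that would be used if the variable pointed somewhere else.
print(resolve_cache_dir({"NEURALCOREF_CACHE": "/data/neuralcoref_cache"}))
```

Since the weights are fetched on first import, the variable presumably needs to be set before `import neuralcoref` runs for it to take effect.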

You can get more information about the location, download and caching of the internal model by activating Python's logging module before loading NeuralCoref:

import logging
logging.basicConfig(level=logging.INFO)
import neuralcoref
>>> INFO:neuralcoref:Getting model from https://s3.amazonaws.com/models.huggingface.co/neuralcoref/neuralcoref.tar.gz or cache
>>> INFO:neuralcoref.file_utils:https://s3.amazonaws.com/models.huggingface.co/neuralcoref/neuralcoref.tar.gz not found in cache, downloading to /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp_8y5_52m
100%|██████████| 40155833/40155833 [00:06<00:00, 6679263.76B/s]
>>> INFO:neuralcoref.file_utils:copying /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp_8y5_52m to cache at /Users/thomaswolf/.neuralcoref_cache/f46bc05a4bfba2ae0d11ffd41c4777683fa78ed357dc04a23c67137abf675e14.7d6f9a6fecf5cf09e74b65f85c7d6896b21decadb2554d486474f63b95ec4633
>>> INFO:neuralcoref.file_utils:creating metadata file for /Users/thomaswolf/.neuralcoref_cache/f46bc05a4bfba2ae0d11ffd41c4777683fa78ed357dc04a23c67137abf675e14.7d6f9a6fecf5cf09e74b65f85c7d6896b21decadb2554d486474f63b95ec4633
>>> INFO:neuralcoref.file_utils:removing temp file /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp_8y5_52m
>>> INFO:neuralcoref:extracting archive file /Users/thomaswolf/.neuralcoref_cache/f46bc05a4bfba2ae0d11ffd41c4777683fa78ed357dc04a23c67137abf675e14.7d6f9a6fecf5cf09e74b65f85c7d6896b21decadb2554d486474f63b95ec4633 to dir /Users/thomaswolf/.neuralcoref_cache/neuralcoref

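Loading NeuralCoref

Adding NeuralCoref to the pipe of an English spaCy Language

Here is the recommended way to instantiate NeuralCoref and add it to spaCy's annotation pipeline: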

# Load your usual spaCy model (one of spaCy's English models)
import spacy
nlp = spacy.load('en')

# Add NeuralCoref to spaCy's pipe
import neuralcoref
neuralcoref.add_to_pipe(nlp)

# You're done. You can now use NeuralCoref as you usually manipulate a spaCy document's annotations.
doc = nlp('My sister has a dog. She loves him.')

doc._.has_coref
doc._.coref_clusters

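Loading NeuralCoref and adding it manually to the pipe of an English spaCy Language

An equivalent way of adding NeuralCoref to the pipe of a spaCy model is to instantiate the NeuralCoref class first and then add it manually to the pipe of the spaCy Language model.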

# Load your usual spaCy model (one of spaCy's English models)
import spacy
nlp = spacy.load('en')

# Load NeuralCoref and add it to the pipe of spaCy's model
import neuralcoref
coref = neuralcoref.NeuralCoref(nlp.vocab)
nlp.add_pipe(coref, name='neuralcoref')

# You're done. You can now use NeuralCoref the same way you usually manipulate a spaCy document and its annotations.
doc = nlp('My sister has a dog. She loves him.')

doc._.has_coref
doc._.coref_clusters

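Using NeuralCoref

NeuralCoref resolves coreferences and annotates them as extension attributes on the spaCy Doc, Span and Token objects, under the ._. dictionary.

Here is the list of the annotations: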

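Attribute	Type	Description
`doc._.has_coref`	boolean	Whether any coreference has been resolved in the Doc
`doc._.coref_clusters`	list of `Cluster`	All the clusters of coreferring mentions in the Doc
`doc._.coref_resolved`	unicode	Unicode representation of the Doc in which each coreferring mention is replaced by the main mention of its cluster
`doc._.coref_scores`	dict	Scores of the coreference resolution between mentions
`span._.is_coref`	boolean	Whether the span has at least one coreferring mention
`span._.coref_cluster`	`Cluster`	Cluster of mentions that corefer with the span
`span._.coref_scores`	dict	Scores of the coreference resolution of the span with other mentions (if applicable)
`token._.in_coref`	boolean	Whether the token is inside at least one coreferring mention
`token._.coref_clusters`	list of `Cluster`	All the clusters of coreferring mentions that contain the token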
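
The `coref_resolved` idea, replacing every mention by the main mention of its cluster, can be illustrated without spaCy. The sketch below is a toy reimplementation under simplifying assumptions: tokens are hand-written (text, trailing whitespace) pairs, mentions are half-open token ranges that do not overlap, and `resolve_text` is a hypothetical helper, not a NeuralCoref function.

```python
def resolve_text(tokens, clusters):
    """Rebuild the text, replacing every mention by its cluster's main mention.

    tokens: list of (text, trailing_whitespace) pairs.
    clusters: list of dicts {"main": (start, end), "mentions": [(start, end), ...]}
        where each (start, end) is a half-open token range.
    """
    def span_text(start, end):
        # Text of the span without the whitespace that follows its last token.
        inner = "".join(text + ws for text, ws in tokens[start:end - 1])
        return inner + tokens[end - 1][0]

    replacement = {}  # mention start -> (mention end, main mention text)
    for cluster in clusters:
        main_start, main_end = cluster["main"]
        main_text = span_text(main_start, main_end)
        for start, end in cluster["mentions"]:
            if (start, end) != cluster["main"]:
                replacement[start] = (end, main_text)

    pieces, i = [], 0
    while i < len(tokens):
        if i in replacement:
            end, main_text = replacement[i]
            # Keep the whitespace that followed the replaced mention.
            pieces.append(main_text + tokens[end - 1][1])
            i = end
        else:
            pieces.append(tokens[i][0] + tokens[i][1])
            i += 1
    return "".join(pieces)


tokens = [("My", " "), ("sister", " "), ("has", " "), ("a", " "), ("dog", ""),
          (".", " "), ("She", " "), ("loves", " "), ("him", ""), (".", "")]
clusters = [
    {"main": (0, 2), "mentions": [(0, 2), (6, 7)]},  # My sister <- She
    {"main": (3, 5), "mentions": [(3, 5), (8, 9)]},  # a dog <- him
]
resolved = resolve_text(tokens, clusters)
print(resolved)  # My sister has a dog. My sister loves a dog.
```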

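A Cluster is a cluster of coreferring mentions. It has 3 attributes and a few methods that simplify navigation inside a cluster:

Attribute or method	Type / Return type	Description
`i`	int	Index of the cluster in the Doc
`main`	`Span`	Span of the most representative mention in the cluster
`mentions`	list of `Span`	List of all the mentions in the cluster
`__getitem__`	returns `Span`	Access a mention in the cluster
`__iter__`	yields `Span`	Iterate over the mentions in the cluster
`__len__`	returns int	Number of mentions in the cluster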
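
The three attributes and three special methods in this table amount to a small container protocol. As a rough illustration, here is a pure-Python stand-in with the same surface. `ToyCluster` is a hypothetical class written for this sketch, it holds plain strings instead of spaCy `Span` objects, and it says nothing about how the real Cython class is implemented.

```python
class ToyCluster:
    """Pure-Python stand-in with the same attributes and methods as the table."""

    def __init__(self, i, main, mentions):
        self.i = i                # index of the cluster in the document
        self.main = main          # most representative mention
        self.mentions = mentions  # all mentions, in document order

    def __getitem__(self, index):
        # Access a mention by position, e.g. cluster[-1] for the last one.
        return self.mentions[index]

    def __iter__(self):
        # Iterate over the mentions.
        return iter(self.mentions)

    def __len__(self):
        # Number of mentions in the cluster.
        return len(self.mentions)


cluster = ToyCluster(0, "My sister", ["My sister", "She"])
print(len(cluster))                  # number of mentions
print(cluster[-1])                   # last mention
print([mention for mention in cluster])
```

Supporting indexing, iteration and `len()` is what lets a cluster be used like a list of mentions in ordinary Python code.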

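Navigating the coreference cluster chains

You can also easily navigate the coreference cluster chains and display clusters and mentions.

Here are some examples; try them out to test it for yourself.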

import spacy
import neuralcoref
nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)

doc = nlp('My sister has a dog. She loves him')

doc._.coref_clusters
doc._.coref_clusters[1].mentions
doc._.coref_clusters[1].mentions[-1]
doc._.coref_clusters[1].mentions[-1]._.coref_cluster.main

token = doc[-1]
token._.in_coref
token._.coref_clusters

span = doc[-1:]
span._.is_coref
span._.coref_cluster.main
span._.coref_cluster.main._.coref_cluster

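Important: NeuralCoref mentions are spaCy Span objects, which means you can access all the usual Span attributes such as span.start (index of the first token of the span in the document), span.end (index of the first token after the span in the document), etc.

For example, doc._.coref_clusters[1].mentions[-1].start gives you the index of the first token of the last mention of the second coreference cluster in the document.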
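
The start/end convention is the same half-open convention as Python slicing: the end index points one past the last token. A tiny sketch on a plain list of strings (not spaCy objects; the token list and the indices are typed by hand for illustration):

```python
# Hand-tokenized sentence; index 3 is "a" and index 5 is the first full stop.
tokens = ["My", "sister", "has", "a", "dog", ".", "She", "loves", "him", "."]

start, end = 3, 5             # a mention covering "a dog"
mention = tokens[start:end]   # end is exclusive, exactly like a slice

print(mention)
print(end - start == len(mention))
```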

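Parameters

You can pass several additional parameters to neuralcoref.add_to_pipe or NeuralCoref() to control the behavior of NeuralCoref.

Here is the full list of these parameters and their descriptions: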

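Parameter	Type	Description
`greedyness`	float	A number between 0 and 1 determining how greedy the model is about making coreference decisions (more greedy means more coreference links). The default value is 0.5.
`max_dist`	int	How many mentions back to look when considering possible antecedents of the current mention. Decreasing the value makes the system run faster but less accurately. The default value is 50.
`max_dist_match`	int	The system will consider linking the current mention to a preceding one further than `max_dist` away if they share a noun or proper noun. In this case, it looks `max_dist_match` mentions back instead. The default value is 500.
`blacklist`	boolean	Whether the system should resolve coreferences for pronouns in the following list: `["i", "me", "my", "you", "your"]`. The default value is True (coreference resolved).
`store_scores`	boolean	Whether the system should store the coreference scores in the annotations. The default value is True.
`conv_dict`	dict(str, list(str))	A conversion dictionary that you can use to replace the embeddings of rare words (keys) with the average of the embeddings of a list of common words (values). For example, `conv_dict={"Angela": ["woman", "girl"]}` will help resolve coreferences for `Angela` by using the embeddings of the more common `woman` and `girl` instead of the embedding of `Angela`. This currently only works for single words (not for groups of words).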
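
The interaction between `max_dist` and `max_dist_match` is easier to see on a small example. The sketch below follows the description in the table only; it is not taken from NeuralCoref's code. The function `candidate_antecedents`, the representation of a mention as a set of its nouns, and the exact boundary handling (`<=`) are assumptions made for illustration.

```python
def candidate_antecedents(mentions, index, max_dist=50, max_dist_match=500):
    """Return indices of earlier mentions that may be linked to mentions[index].

    mentions: list of sets, each holding the lower-cased nouns and proper
        nouns of one mention, in document order (a simplification).
    """
    current = mentions[index]
    candidates = []
    for j in range(max(0, index - max_dist_match), index):
        distance = index - j
        if distance <= max_dist:
            candidates.append(j)  # close enough: always considered
        elif current & mentions[j]:
            candidates.append(j)  # farther away: only if a noun is shared
    return candidates


# Five mentions; the last one shares the noun "paris" with the first.
toy_mentions = [{"paris"}, {"dog"}, {"sister"}, set(), {"paris"}]
print(candidate_antecedents(toy_mentions, 4, max_dist=2, max_dist_match=4))
```

With the small limits used here, the two nearest mentions are always candidates, the first mention is kept because of the shared noun, and the second is dropped.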

How to change a parameter

import spacy
import neuralcoref

# Let's load a spaCy model
nlp = spacy.load('en')

# First way we can control a parameter
neuralcoref.add_to_pipe(nlp, greedyness=0.75)

# Another way we can control a parameter
nlp.remove_pipe("neuralcoref")  # This removes the current neuralcoref instance from spaCy's pipe
coref = neuralcoref.NeuralCoref(nlp.vocab, greedyness=0.75)
nlp.add_pipe(coref, name='neuralcoref')

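Using the conversion dictionary parameter to help resolve rare words

Here is an example of how we can use the conv_dict parameter to help resolve coreferences of a rare word such as a name: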

import spacy
import neuralcoref

nlp = spacy.load('en')

# Let's try before using the conversion dictionary:
neuralcoref.add_to_pipe(nlp)
doc = nlp('Deepika has a dog. She loves him. The movie star has always been fond of animals')
doc._.coref_clusters
doc._.coref_resolved
# >>> [Deepika: [Deepika, She, him, The movie star]]
# >>> 'Deepika has a dog. Deepika loves Deepika. Deepika has always been fond of animals'
# >>> Not very good...

# Here are three ways we can add the conversion dictionary
nlp.remove_pipe("neuralcoref")
neuralcoref.add_to_pipe(nlp, conv_dict={'Deepika': ['woman', 'actress']})
# or
nlp.remove_pipe("neuralcoref")
coref = neuralcoref.NeuralCoref(nlp.vocab, conv_dict={'Deepika': ['woman', 'actress']})
nlp.add_pipe(coref, name='neuralcoref')
# or, once NeuralCoref is already in spaCy's pipe, by modifying NeuralCoref in the pipeline
nlp.get_pipe('neuralcoref').set_conv_dict({'Deepika': ['woman', 'actress']})

# Let's try again with the conversion dictionary:
doc = nlp('Deepika has a dog. She loves him. The movie star has always been fond of animals')
doc._.coref_clusters
doc._.coref_resolved
# >>> [Deepika: [Deepika, She, The movie star], a dog: [a dog, him]]
# >>> 'Deepika has a dog. Deepika loves a dog. Deepika has always been fond of animals'
# >>> A lot better!
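
The effect of a conversion dictionary, swapping a rare word's embedding for the average of a few common words' embeddings, can be sketched with plain lists. This is a toy illustration of the averaging idea only: `apply_conv_dict` is a hypothetical helper, and the two-dimensional vectors are invented numbers, not values from any real model.

```python
def apply_conv_dict(embeddings, conv_dict):
    """Return a copy of `embeddings` in which each rare word in `conv_dict`
    gets the element-wise mean of the vectors of its listed common words."""
    updated = dict(embeddings)
    for rare_word, common_words in conv_dict.items():
        vectors = [embeddings[word] for word in common_words if word in embeddings]
        if not vectors:
            continue  # nothing to average; leave the rare word untouched
        dim = len(vectors[0])
        updated[rare_word] = [
            sum(vector[k] for vector in vectors) / len(vectors) for k in range(dim)
        ]
    return updated


# Invented 2-d vectors, for illustration only.
toy_embeddings = {"woman": [1.0, 0.0], "actress": [0.0, 1.0], "Deepika": [9.0, 9.0]}
converted = apply_conv_dict(toy_embeddings, {"Deepika": ["woman", "actress"]})
print(converted["Deepika"])  # [0.5, 0.5]
```

After the conversion the rare name sits between the common words in the toy embedding space, which is the intuition behind why the resolution improves in the example above.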

Using NeuralCoref as a server

A simple example of a server script for integrating NeuralCoref in a REST API is provided in examples/server.py.

To use it, you need to install falcon first:

pip install falcon

Then you can start the server as follows:

 cd examples
python ./server.py

And query the server like this:

curl --data-urlencode "text=My sister has a dog. She loves him." -G localhost:8000
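
The same GET request can be issued from Python with the standard library alone. This is a minimal sketch that assumes the server from examples/server.py is listening on localhost:8000, as in the curl command. The helper names `build_query_url` and `query_server` are made up here, and the body is returned as raw text because its exact format is not described in this document.

```python
import urllib.parse
import urllib.request


def build_query_url(text, base_url="http://localhost:8000"):
    """Build the GET URL carrying the text as a URL-encoded `text` parameter."""
    query = urllib.parse.urlencode({"text": text}, quote_via=urllib.parse.quote)
    return base_url + "/?" + query


def query_server(text, base_url="http://localhost:8000", timeout=10):
    """Send the request and return the raw response body as a string."""
    url = build_query_url(text, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only builds and prints the URL; call query_server(...) once the server is running.
print(build_query_url("My sister has a dog. She loves him."))
```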

There are many other ways you can manage and deploy NeuralCoref. Some examples can be found in spaCy Universe.

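Re-train the model / Extend to another language

If you want to retrain the model or train it on another language, see our training instructions as well as our blog post.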

Additional information

Type: Other source code
Updated: 2025-04-16
Size: 66.77MB
Source: GitHub
