Zshot

Zero- and few-shot named entity and relation recognition

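Documentation: https://ibm.github.io/zshot

Source code: https://github.com/ibm/zshot

Paper: https://aclanthology.org/2023.acl-demo.34/

Zshot is a highly customizable framework for performing zero- and few-shot named entity recognition.

It can be used to perform:

Mentions extraction: identify globally relevant mentions, or mentions relevant to a given domain
Wikification: the task of linking textual mentions to entities in Wikipedia
Zero- and few-shot named entity recognition: perform NER from natural-language descriptions so that it generalizes to unseen domains
Zero- and few-shot named relation recognition
Visualization: zero-shot NER and RE extraction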

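Requirements

Python 3.6+
spacy - Zshot relies on spaCy for pipelining and visualization.
torch - PyTorch is required to run PyTorch models.
transformers - Required for pre-trained language models.
evaluate - Required for evaluation.
datasets - Required to evaluate over datasets (e.g. OntoNotes).

Optional dependencies

flair - Required if you want to use the Flair mentions extractor, the TARS linker, or the TARS mentions extractor.
blink - Required if you want to use Blink for linking to Wikipedia pages.
gliner - Required if you want to use the GLiNER linker or the GLiNER mentions extractor.

Installation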

$ pip install zshot

---> 100%

Examples

Example	Notebook
Installation and visualization
Knowledge extractor
Wikification
Custom components
Evaluation

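Zshot approach

Zshot contains two different components: the mentions extractor and the linker.

Mentions extractor

The mentions extractor detects possible entities (a.k.a. mentions), which the linker then links to a data source (e.g. Wikidata).

Currently, 7 different mentions extractors are supported: SMXM, TARS, GLiNER, 2 based on spaCy, and 2 based on Flair. The two versions for spaCy and Flair are similar: one is based on Named Entity Recognition and Classification (NERC), and the other is based on linguistics (i.e. it uses Part-of-Speech (PoS) tagging and Dependency Parsing (DP)).

The NERC approach uses NERC models to detect all the entities that have to be linked. This approach depends on the model being used and on the entities the model was trained on. Depending on the use case and the target entities, it may not be the best approach, because entities that the NERC model does not recognize will not be linked.

The linguistic approach relies on the idea that a mention will usually be a syntagma or a noun. It therefore detects nouns that are part of a syntagma and act as objects, subjects, and so on. This approach does not depend on the entity types a model was trained on (although its performance does depend on the quality of the model): a noun in a text should always be a noun, regardless of the dataset the model was trained on.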
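The linguistic approach described above can be sketched in plain Python. This is only an illustration, not Zshot's implementation: the `Token` type, the `noun_mentions` helper, and the hand-written PoS tags are assumptions made for the example, and the sketch ignores dependency roles, which the real spaCy- and Flair-based extractors obtain from a parser.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical token record: surface form, character offset, coarse PoS tag."""
    text: str
    idx: int
    pos: str


def noun_mentions(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive NOUN/PROPN tokens into (start, end) character spans."""
    spans: List[Tuple[int, int]] = []
    current = None  # [start, end] of the run being built, if any
    for tok in tokens:
        if tok.pos in {"NOUN", "PROPN"}:
            end = tok.idx + len(tok.text)
            if current is None:
                current = [tok.idx, end]
            else:
                current[1] = end  # extend the current run
        elif current is not None:
            spans.append((current[0], current[1]))
            current = None
    if current is not None:  # close a run that reaches the end of the text
        spans.append((current[0], current[1]))
    return spans


text = "IBM is headquartered in Armonk, New York"
# Tags are written by hand here; a real pipeline would get them from a tagger.
tagged = [("IBM", "PROPN"), ("is", "AUX"), ("headquartered", "VERB"), ("in", "ADP"),
          ("Armonk", "PROPN"), (",", "PUNCT"), ("New", "PROPN"), ("York", "PROPN")]

tokens = []
cursor = 0
for word, pos in tagged:
    start = text.index(word, cursor)  # locate each token to recover its offset
    tokens.append(Token(word, start, pos))
    cursor = start + len(word)

mentions = [text[start:end] for start, end in noun_mentions(tokens)]
print(mentions)
```

Because the candidates come from grammatical categories rather than from a fixed label set, the same logic applies to domains whose entity types no NERC model has seen.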

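Linker

The linker links the detected entities to an existing set of labels. Some linkers, however, are end-to-end: they detect and link the entities at the same time, so they do not need a mentions extractor.

There are currently 5 linkers available: 3 are end-to-end and 2 are not.

Linker name	End-to-end	Source code	Paper
Blink	x	Source code	Paper
GENRE	x	Source code	Paper
SMXM	✓	Source code	Paper
TARS	✓	Source code	Paper
GLiNER	✓	Source code	Paper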
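To make the two-stage flow concrete (mention spans first, then linking against a label set), here is a deliberately naive sketch. The functions `toy_link` and `link_mentions` are hypothetical and not part of Zshot, and the word-overlap score is only a stand-in for the neural models the real linkers use; the point is merely that each label is described in natural language and each mention is matched against those descriptions.

```python
import re
from typing import Dict, List, Optional, Set, Tuple


def words(s: str) -> Set[str]:
    """Lowercased set of word tokens in a string."""
    return set(re.findall(r"\w+", s.lower()))


def toy_link(mention: str, entities: Dict[str, str]) -> Optional[str]:
    """Pick the label sharing the most words with the mention (name matches count double)."""
    best_label, best_score = None, 0
    mention_words = words(mention)
    for label, description in entities.items():
        score = 2 * len(mention_words & words(label)) + len(mention_words & words(description))
        if score > best_score:
            best_label, best_score = label, score
    return best_label  # None when nothing overlaps


def link_mentions(text: str, spans: List[Tuple[int, int]],
                  entities: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Second stage: link every span produced by a mentions extractor."""
    return [(text[s:e], toy_link(text[s:e], entities)) for s, e in spans]


entities = {
    "IBM": "International Business Machines Corporation (IBM) is an American "
           "multinational technology corporation",
    "New York": "New York is a city in U.S. state",
    "Florida": "southeasternmost U.S. state",
}
text = "IBM is headquartered in Armonk, New York"
spans = [(0, 3), (24, 30), (32, 40)]  # output of a (hypothetical) mentions extractor

linked = link_mentions(text, spans, entities)
print(linked)
```

An end-to-end linker would take only the text and the entity descriptions and produce both the spans and the labels in one step, which is why it needs no mentions extractor in the pipeline.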

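Relations extractor

The relations extractor extracts relations among the entities previously extracted by a linker.

Currently, there is only one relations extractor available:

ZS-BERT
- Paper
- Source code

Knowledge extractor

The knowledge extractor performs the extraction and classification of named entities and the extraction of relations among them at the same time. A pipeline with this component does not need a mentions extractor, linker, or relations extractor to work.

Currently, there is only one knowledge extractor available:

KnowGL
- Rossiello et al. (AAAI 2023)
- Mihindukulasooriya et al. (ISWC 2022)

How to use it

Install the requirements: pip install -r requirements.txt
Install a spaCy pipeline to use for mentions extraction: python -m spacy download en_core_web_sm
Create a file main.py with the pipeline configuration and entity definitions (Wikipedia abstracts are usually a good starting point for descriptions):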

import spacy

from zshot import PipelineConfig, displacy
from zshot.linker import LinkerRegen
from zshot.mentions_extractor import MentionsExtractorSpacy
from zshot.utils.data_models import Entity

# Load the spaCy pipeline that the mentions extractor builds on
nlp = spacy.load("en_core_web_sm")

# Each entity is defined by a name and a natural-language description
nlp_config = PipelineConfig(
    mentions_extractor=MentionsExtractorSpacy(),
    linker=LinkerRegen(),
    entities=[
        Entity(name="Paris",
               description="Paris is located in northern central France, in a north-bending arc of the river Seine"),
        Entity(name="IBM",
               description="International Business Machines Corporation (IBM) is an American multinational technology corporation headquartered in Armonk, New York"),
        Entity(name="New York", description="New York is a city in U.S. state"),
        Entity(name="Florida", description="southeasternmost U.S. state"),
        Entity(name="American",
               description="American, something of, from, or related to the United States of America, commonly known as the United States or America"),
        Entity(name="Chemical formula",
               description="In chemistry, a chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule"),
        Entity(name="Acetamide",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
        Entity(name="Armonk",
               description="Armonk is a hamlet and census-designated place (CDP) in the town of North Castle, located in Westchester County, New York, United States."),
        Entity(name="Acetic Acid",
               description="Acetic acid, systematically named ethanoic acid, is an acidic, colourless liquid and organic compound with the chemical formula CH3COOH"),
        Entity(name="Industrial solvent",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
    ]
)
# Add Zshot as the last component of the spaCy pipeline
nlp.add_pipe("zshot", config=nlp_config, last=True)

# Parentheses let the two adjacent string literals concatenate into one text
text = ("International Business Machines Corporation (IBM) is an American multinational technology corporation"
        " headquartered in Armonk, New York, with operations in over 171 countries.")

doc = nlp(text)
displacy.serve(doc, style="ent")

Run it

Run it with:

$ python main.py

Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

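The script annotates the text using Zshot and uses displaCy to visualize the annotations.

Check it

Open your browser at http://127.0.0.1:5000.

You will see the annotated sentence.

How to create a custom component

If you want to implement your own mentions_extractor or linker and use it with Zshot, you can. To make it easier to implement a new component, base classes are provided that you extend with your own code.

It is as simple as creating a new class that extends the base class (MentionsExtractor or Linker). You have to implement the predict method, which receives the spaCy documents and returns a list of zshot.utils.data_models.Span for each document.

This is a simple mentions_extractor that extracts as mentions all words that contain the letter s: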

from typing import Iterable
import spacy
from spacy.tokens import Doc
from zshot import PipelineConfig
from zshot.utils.data_models import Span
from zshot.mentions_extractor import MentionsExtractor


class SimpleMentionExtractor(MentionsExtractor):
    def predict(self, docs: Iterable[Doc], batch_size=None):
        # One list of spans per document; each span covers a token containing "s"
        spans = [[Span(tok.idx, tok.idx + len(tok)) for tok in doc if "s" in tok.text] for doc in docs]
        return spans


new_nlp = spacy.load("en_core_web_sm")

config = PipelineConfig(
    mentions_extractor=SimpleMentionExtractor()
)
new_nlp.add_pipe("zshot", config=config, last=True)
# Parentheses let the two adjacent string literals concatenate into one text
text_acetamide = ("CH2O2 is a chemical compound similar to Acetamide used in International Business "
                  "Machines Corporation (IBM).")

doc = new_nlp(text_acetamide)
print(doc._.mentions)

>>> [is, similar, used, Business, Machines]
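The offset arithmetic in predict (tok.idx as the start, tok.idx + len(tok) as the end) can be checked without spaCy. The sketch below is an approximation: it uses a regular expression as a stand-in for spaCy's tokenizer, so token boundaries may differ on other inputs.

```python
import re

text = ("CH2O2 is a chemical compound similar to Acetamide used in International Business "
        "Machines Corporation (IBM).")

# (start, end) character offsets of every word that contains a lowercase "s".
# m.start() plays the role of tok.idx, m.end() that of tok.idx + len(tok).
spans = [(m.start(), m.end()) for m in re.finditer(r"\w+", text) if "s" in m.group()]

# Slicing the text with the offsets recovers the mention strings.
mentions = [text[start:end] for start, end in spans]
print(mentions)
```

Returning character offsets rather than strings lets the pipeline map each mention back to its exact position in the document.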

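How to evaluate Zshot

Evaluation is an important process for continuing to improve the performance of the models. For that reason, Zshot allows components to be evaluated on two predefined datasets, in a zero-shot version in which the entities of the test and validation splits do not appear in the train set.

The evaluation package contains all the functionality needed to evaluate Zshot components. The main function is zshot.evaluation.zshot_evaluate.evaluate, which takes as input the spaCy nlp model and the dataset to evaluate. It returns a str containing a table with the results of the evaluation. For instance, the evaluation of the TARS linker in Zshot on the OntoNotes validation set would be: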

import spacy

from zshot import PipelineConfig
from zshot.linker import LinkerTARS
from zshot.evaluation.dataset import load_ontonotes_zs
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report
from zshot.evaluation.metrics.seqeval.seqeval import Seqeval

# Zero-shot version of the OntoNotes validation split
ontonotes_zs = load_ontonotes_zs('validation')

# TARS is an end-to-end linker, so a blank pipeline without a mentions extractor is enough
nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerTARS(),
    entities=ontonotes_zs.entities
)

nlp.add_pipe("zshot", config=nlp_config, last=True)

evaluation = evaluate(nlp, ontonotes_zs, metric=Seqeval())
prettify_evaluate_report(evaluation)
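For readers unfamiliar with entity-level scoring, the following is a simplified sketch of the kind of metric a seqeval-style evaluation reports: precision, recall, and F1 over exact-match entity spans decoded from BIO tags. The helper names `bio_to_spans` and `entity_prf` are made up for this example, and the details (such as how a stray I- tag is handled) may differ from what Zshot's Seqeval metric actually does.

```python
from typing import List, Set, Tuple


def bio_to_spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Decode BIO tags into (label, start, end) spans, with end exclusive."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith(("B-", "I-")):
                # Lenient choice: an I- tag with no matching open entity starts a new one
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    gold_spans, pred_spans = set(), set()
    for sent_id, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(sent_id, *span) for span in bio_to_spans(g)}
        pred_spans |= {(sent_id, *span) for span in bio_to_spans(p)}
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One sentence: the ORG entity is predicted exactly, the GPE entity is cut short
gold = [["B-ORG", "O", "O", "B-GPE", "I-GPE"]]
pred = [["B-ORG", "O", "O", "B-GPE", "O"]]

precision, recall, f1 = entity_prf(gold, pred)
print(precision, recall, f1)
```

Note that a partially matched entity counts as both a false positive and a false negative under exact-match scoring, which is why the truncated GPE span lowers precision and recall alike.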

Citation

@inproceedings{picco-etal-2023-zshot,
    title = "Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction",
    author = "Picco, Gabriele  and
      Martinez Galindo, Marcos  and
      Purpura, Alberto  and
      Fuchs, Leopold  and
      Lopez, Vanessa  and
      Hoang, Thanh Lam",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.34",
    doi = "10.18653/v1/2023.acl-demo.34",
    pages = "357--368",
    abstract = "The Zero-Shot Learning (ZSL) task pertains to the identification of entities or relations in texts that were not seen during training. ZSL has emerged as a critical research area due to the scarcity of labeled data in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, both in the research community and industry, for a comprehensive ZSL framework that facilitates the development and accessibility of the latest methods and pretrained models.In this study, we propose a novel ZSL framework called Zshot that aims to address the aforementioned challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods with standard benchmark datasets. Additionally, we have designed our framework to support the industry with readily available APIs for production under the standard SpaCy NLP pipeline. Our API is extendible and evaluable, moreover, we include numerous enhancements such as boosting the accuracy with pipeline ensembling and visualization utilities available as a SpaCy extension.",
}

Additional information

Version: v0.0.9
Type: Other source code
Updated: 2025-04-18
Size: 432.72KB
Source: GitHub
