Zshot

Zero and Few shot named entity & relationships recognition

Documentation: https://ibm.github.io/zshot

Source Code: https://github.com/ibm/zshot

Paper: https://aclanthology.org/2023.acl-demo.34/

Zshot is a highly customisable framework for performing Zero and Few shot named entity recognition.

It can be used to perform:

Mentions extraction: identify globally relevant mentions or mentions relevant for a given domain
Wikification: the task of linking textual mentions to entities in Wikipedia
Zero and Few Shot named entity recognition: use language descriptions to perform NER that generalizes to unseen domains
Zero and Few Shot named relationship recognition
Visualization: Zero-shot NER and RE extraction

Requirements

Python 3.6+
spacy - Zshot relies on SpaCy for pipelining and visualization.
torch - PyTorch is required to run PyTorch models.
transformers - Required for pre-trained language models.
evaluate - Required for evaluation.
datasets - Required to evaluate over datasets (e.g.: OntoNotes).

Optional Dependencies

flair - Required if you want to use the Flair mentions extractor, the TARS linker, or the TARS mentions extractor.
blink - Required if you want to use Blink for linking to Wikipedia pages.
gliner - Required if you want to use the GLiNER linker or the GLiNER mentions extractor.
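
Before configuring a component that relies on one of these optional packages, it can help to check whether the package is importable. The snippet below is a minimal sketch using only the Python standard library. The helper name is hypothetical, and it assumes that each import name matches the package name listed above, which may not hold for every distribution.

```python
from importlib.util import find_spec

# Optional extras named in the list above. ASSUMPTION: the import names
# match the package names; verify this for your own environment.
OPTIONAL_MODULES = ("flair", "blink", "gliner")


def available_optional_modules(names=OPTIONAL_MODULES):
    """Map each module name to True if it can be found on the import path."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_optional_modules().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

The check only looks the module up on the import path; it does not import it, so it stays fast even for heavy packages.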

Installation

$ pip install zshot

Examples

Example	Notebook
Installation and Visualization
Knowledge Extractor
Wikification
Custom Components
Evaluation

Zshot Approach

Zshot contains two different components, the mentions extractor and the linker.

Mentions Extractor

The mentions extractor detects the possible entities (a.k.a. mentions), which are then linked to a data source (e.g.: Wikidata) by the linker.

Currently, 7 different mentions extractors are supported: SMXM, TARS, GLiNER, 2 based on SpaCy, and 2 based on Flair. The two versions for SpaCy and for Flair are similar: one is based on Named Entity Recognition and Classification (NERC) and the other is based on linguistics (i.e.: using Part-of-Speech tagging (PoS) and Dependency Parsing (DP)).

The NERC approach uses NERC models to detect all the entities that have to be linked. This approach depends on the model being used and on the entities the model was trained on. Depending on the use case and the target entities, it may not be the best approach, because entities that the NERC model does not recognize will not be linked.

The linguistic approach relies on the idea that a mention is usually a syntagma or a noun. It therefore detects nouns that are part of a syntagma and that act as objects, subjects, etc. This approach does not depend on the entity types the model was trained on (although its accuracy still depends on the model): a noun in a text is always a noun, whatever dataset the model was trained on.
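
The difference between the two approaches can be sketched on a toy sentence. The example below does not use SpaCy, Flair, or Zshot; the annotations are written by hand and the function names are hypothetical. It only illustrates why a NERC-based extractor can miss mentions that a noun-based extractor still finds.

```python
# Toy, hand-annotated sentence "IBM uses acetamide as a solvent":
# (text, part of speech, dependency role, NER label).
# In a real pipeline these annotations would come from a SpaCy or Flair model.
TOKENS = [
    ("IBM", "PROPN", "nsubj", "ORG"),
    ("uses", "VERB", "ROOT", None),
    ("acetamide", "NOUN", "dobj", None),
    ("as", "ADP", "prep", None),
    ("a", "DET", "det", None),
    ("solvent", "NOUN", "pobj", None),
]

NOUN_TAGS = {"NOUN", "PROPN"}
ARGUMENT_ROLES = {"nsubj", "dobj", "pobj"}


def nerc_mentions(tokens):
    """Keep only tokens that the (toy) NER model labelled."""
    return [text for text, _pos, _dep, label in tokens if label is not None]


def linguistic_mentions(tokens):
    """Keep nouns that act as subjects or objects, whatever the NER labels say."""
    return [text for text, pos, dep, _label in tokens
            if pos in NOUN_TAGS and dep in ARGUMENT_ROLES]


print(nerc_mentions(TOKENS))        # ['IBM']
print(linguistic_mentions(TOKENS))  # ['IBM', 'acetamide', 'solvent']
```

Here the toy NER labels only cover organisations, so "acetamide" and "solvent" are never proposed as mentions by the NERC-style function, while the noun-based function proposes all three.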

Linker

The linker links the detected entities to an existing set of labels. Some of the linkers, however, are end-to-end, i.e. they do not need the mentions extractor, because they detect and link the entities at the same time.

There are currently 5 linkers available; 3 of them are end-to-end and 2 are not.

Linker Name	end-to-end	Source Code	Paper
Blink	x	Source Code	Paper
GENRE	x	Source Code	Paper
SMXM	✓	Source Code	Paper
TARS	✓	Source Code	Paper
GLINER	✓	Source Code	Paper
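
The practical consequence of the end-to-end column is whether a pipeline also needs a mentions extractor. The snippet below restates the table as data and derives that answer. It is an illustrative sketch only: `LINKERS` and `needs_mentions_extractor` are hypothetical names and are not part of the Zshot API.

```python
# Linker table from above as data: name -> whether the linker is end-to-end.
LINKERS = {
    "Blink": False,
    "GENRE": False,
    "SMXM": True,
    "TARS": True,
    "GLINER": True,
}


def needs_mentions_extractor(linker_name):
    """Return True when the linker must be paired with a mentions extractor."""
    try:
        return not LINKERS[linker_name]
    except KeyError:
        raise ValueError(f"Unknown linker: {linker_name!r}") from None


for name in LINKERS:
    verdict = "needs" if needs_mentions_extractor(name) else "does not need"
    print(f"{name} {verdict} a mentions extractor")
```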

Relations Extractor

The relations extractor extracts relations among the different entities previously extracted by a linker.

Currently, there is only one relation extractor available:

ZS-BERT
- Paper
- Source Code

Knowledge Extractor

The knowledge extractor performs, at the same time, the extraction and classification of named entities and the extraction of relations among them. A pipeline with this component does not need any mentions extractor, linker, or relation extractor to work.

Currently, there is only one knowledge extractor available:

KnowGL
- Rossiello et al. (AAAI 2023)
- Mihindukulasooriya et al. (ISWC 2022)

How to use it

Install requirements: pip install -r requirements.txt
Install a SpaCy pipeline to use it for mentions extraction: python -m spacy download en_core_web_sm
Create a file main.py with the pipeline configuration and entities definition (Wikipedia abstracts are usually a good starting point for descriptions):

import spacy

from zshot import PipelineConfig, displacy
from zshot.linker import LinkerRegen
from zshot.mentions_extractor import MentionsExtractorSpacy
from zshot.utils.data_models import Entity

# Load the SpaCy pipeline used for mentions extraction
nlp = spacy.load("en_core_web_sm")

# Configure Zshot: mentions extractor, linker, and the entities to link against
nlp_config = PipelineConfig(
    mentions_extractor=MentionsExtractorSpacy(),
    linker=LinkerRegen(),
    entities=[
        Entity(name="Paris",
               description="Paris is located in northern central France, in a north-bending arc of the river Seine"),
        Entity(name="IBM",
               description="International Business Machines Corporation (IBM) is an American multinational technology corporation headquartered in Armonk, New York"),
        Entity(name="New York", description="New York is a city in U.S. state"),
        Entity(name="Florida", description="southeasternmost U.S. state"),
        Entity(name="American",
               description="American, something of, from, or related to the United States of America, commonly known as the United States or America"),
        Entity(name="Chemical formula",
               description="In chemistry, a chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule"),
        Entity(name="Acetamide",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
        Entity(name="Armonk",
               description="Armonk is a hamlet and census-designated place (CDP) in the town of North Castle, located in Westchester County, New York, United States."),
        Entity(name="Acetic Acid",
               description="Acetic acid, systematically named ethanoic acid, is an acidic, colourless liquid and organic compound with the chemical formula CH3COOH"),
        Entity(name="Industrial solvent",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
    ]
)
# Add Zshot as the last component of the pipeline
nlp.add_pipe("zshot", config=nlp_config, last=True)

# Parentheses join the two string literals into a single text
text = ("International Business Machines Corporation (IBM) is an American multinational technology corporation"
        " headquartered in Armonk, New York, with operations in over 171 countries.")

# Annotate the text and serve the visualization
doc = nlp(text)
displacy.serve(doc, style="ent")

Run it

Run with

$ python main.py

Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

The script annotates the text using Zshot and uses displacy to visualise the annotations.

Check it

Open your browser at http://127.0.0.1:5000.

You will see the annotated sentence:

How to create a custom component

If you want to implement your own mentions_extractor or linker and use it with Zshot, you can do it. To make it easier to implement a new component, some base classes are provided that you extend with your own code.

It is as simple as creating a new class that extends the base class (MentionsExtractor or Linker). You have to implement the predict method, which receives the SpaCy documents and returns a list of zshot.utils.data_models.Span for each document.

This is a simple mentions_extractor that extracts as mentions all words that contain the letter s:

from typing import Iterable

import spacy
from spacy.tokens import Doc

from zshot import PipelineConfig
from zshot.mentions_extractor import MentionsExtractor
from zshot.utils.data_models import Span


class SimpleMentionExtractor(MentionsExtractor):
    def predict(self, docs: Iterable[Doc], batch_size=None):
        # One list of spans per document; each span covers the character
        # offsets of a token whose text contains the letter "s"
        spans = [[Span(tok.idx, tok.idx + len(tok)) for tok in doc if "s" in tok.text] for doc in docs]
        return spans


new_nlp = spacy.load("en_core_web_sm")

config = PipelineConfig(
    mentions_extractor=SimpleMentionExtractor()
)
new_nlp.add_pipe("zshot", config=config, last=True)

# Parentheses join the two string literals into a single text
text_acetamide = ("CH2O2 is a chemical compound similar to Acetamide used in International Business "
                  "Machines Corporation (IBM).")

doc = new_nlp(text_acetamide)
print(doc._.mentions)

>>> [is, similar, used, Business, Machines]
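
The spans in this example are plain character offsets: tok.idx is where the token starts and tok.idx + len(tok) is where it ends. The sketch below reproduces that arithmetic without SpaCy or Zshot, using whitespace tokenization instead of a SpaCy tokenizer. `CharSpan` is a hypothetical stand-in and is not zshot.utils.data_models.Span.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CharSpan:
    """Hypothetical stand-in for a mention span: start and end character offsets."""
    start: int
    end: int


def spans_containing(text, letter="s"):
    """Return one CharSpan per whitespace-delimited token that contains `letter`."""
    return [CharSpan(m.start(), m.end())
            for m in re.finditer(r"\S+", text)
            if letter in m.group()]


text = ("CH2O2 is a chemical compound similar to Acetamide used in "
        "International Business Machines Corporation (IBM).")

# Slicing the text with each span recovers the matching word
mentions = [text[s.start:s.end] for s in spans_containing(text)]
print(mentions)
```

For this text the script prints ['is', 'similar', 'used', 'Business', 'Machines']. A SpaCy tokenizer splits punctuation differently from whitespace splitting, so offsets can differ on other inputs.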

How to evaluate Zshot

Evaluation is an important step in improving the performance of the models. For that reason, Zshot allows evaluating a component with two predefined datasets, OntoNotes and MedMentions, in a zero-shot version in which the entities of the test and validation splits do not appear in the train set.
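
The zero-shot condition described above amounts to the entity labels of the training split and of the evaluation splits being disjoint. The following sketch checks that property for plain lists of label names. The function names are hypothetical and the labels are illustrative; they are not the actual split used by Zshot.

```python
def unseen_labels(train_labels, eval_labels):
    """Return the evaluation labels that never occur in the training labels."""
    return set(eval_labels) - set(train_labels)


def is_zero_shot_split(train_labels, eval_labels):
    """A split is zero-shot when train and evaluation label sets are disjoint."""
    return set(train_labels).isdisjoint(eval_labels)


# Illustrative label names only (ASSUMPTION, not the real dataset split)
train = ["PERSON", "ORG", "GPE"]
validation = ["EVENT", "LAW"]
test = ["PRODUCT", "ORG"]  # "ORG" also occurs in train, so this is not zero-shot

print(is_zero_shot_split(train, validation))  # True
print(is_zero_shot_split(train, test))        # False
print(unseen_labels(train, test))             # {'PRODUCT'}
```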

The evaluation package contains all the functionality needed to evaluate the Zshot components. The main function is zshot.evaluation.zshot_evaluate.evaluate, which takes as input the SpaCy nlp model and the dataset to evaluate. It returns a str containing a table with the results of the evaluation. For instance, the evaluation of the TARS linker in Zshot on the OntoNotes validation set would be:

import spacy

from zshot import PipelineConfig
from zshot.linker import LinkerTARS
from zshot.evaluation.dataset import load_ontonotes_zs
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report
from zshot.evaluation.metrics.seqeval.seqeval import Seqeval

ontonotes_zs = load_ontonotes_zs('validation')


nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerTARS(),
    entities=ontonotes_zs.entities
)

nlp.add_pipe("zshot", config=nlp_config, last=True)

evaluation = evaluate(nlp, ontonotes_zs, metric=Seqeval())
prettify_evaluate_report(evaluation)
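
To give an intuition for the kind of numbers such a report contains, here is a simplified sketch of exact-match precision, recall, and F1 over labelled spans. It is a stand-in for illustration, not the seqeval implementation and not Zshot code. The function name and the example spans are made up.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up spans: (start offset, end offset, label)
gold = {(0, 3, "ORG"), (25, 31, "GPE"), (33, 41, "GPE")}
pred = {(0, 3, "ORG"), (25, 31, "ORG")}  # second span has the wrong label

precision, recall, f1 = span_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A predicted span counts as correct only if both its boundaries and its label match a gold span, which is why the mislabelled second prediction lowers precision as well as recall.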

Citation

@inproceedings{picco-etal-2023-zshot,
    title = "Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction",
    author = "Picco, Gabriele  and
      Martinez Galindo, Marcos  and
      Purpura, Alberto  and
      Fuchs, Leopold  and
      Lopez, Vanessa  and
      Hoang, Thanh Lam",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.34",
    doi = "10.18653/v1/2023.acl-demo.34",
    pages = "357--368",
    abstract = "The Zero-Shot Learning (ZSL) task pertains to the identification of entities or relations in texts that were not seen during training. ZSL has emerged as a critical research area due to the scarcity of labeled data in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, both in the research community and industry, for a comprehensive ZSL framework that facilitates the development and accessibility of the latest methods and pretrained models. In this study, we propose a novel ZSL framework called Zshot that aims to address the aforementioned challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods with standard benchmark datasets. Additionally, we have designed our framework to support the industry with readily available APIs for production under the standard SpaCy NLP pipeline. Our API is extendible and evaluable, moreover, we include numerous enhancements such as boosting the accuracy with pipeline ensembling and visualization utilities available as a SpaCy extension.",
}

Additional information

Version v0.0.9
Type Other source code
Updated 2025-04-18
Size 432.72KB
From Github
