zshot

v0.0.9

Zero and Few shot named entity & relationships recognition

Documentation: https://ibm.github.io/zshot

Source Code: https://github.com/ibm/zshot

Paper: https://aclanthology.org/2023.acl-demo.34/

Zshot is a highly customisable framework for performing Zero and Few shot named entity recognition.

It can be used to perform:

Mentions extraction: Identify globally relevant mentions, or mentions relevant for a given domain
Wikification: The task of linking textual mentions to entities in Wikipedia
Zero and Few Shot named entity recognition: Use language descriptions to perform NER that generalizes to unseen domains
Zero and Few Shot named relationship recognition
Visualization: Zero-shot NER and RE extraction

Requirements

Python 3.6+
spacy - Zshot relies on spaCy for pipelining and visualization
torch - PyTorch is required to run PyTorch models
transformers - Required for pre-trained language models
evaluate - Required for evaluation
datasets - Required to evaluate over datasets (e.g. OntoNotes)

Optional Dependencies

flair - Required if you want to use the Flair mentions extractor, the TARS linker, or the TARS mentions extractor
blink - Required if you want to use Blink for linking to Wikipedia pages
gliner - Required if you want to use the GLiNER linker or the GLiNER mentions extractor

Installation

$ pip install zshot

Examples

The following examples are available as notebooks:

Installation and Visualization
Knowledge Extractor
Wikification
Custom Components
Evaluation

Zshot Approach

Zshot contains two main components: the mentions extractor and the linker.

Mentions Extractor

The mentions extractor detects the possible entities (a.k.a. mentions), which are then linked to a data source (e.g. Wikidata) by the linker.

Currently, 7 different mentions extractors are supported: SMXM, TARS, GLiNER, 2 based on spaCy, and 2 based on Flair. The two versions for spaCy and for Flair are similar: one is based on Named Entity Recognition and Classification (NERC) and the other is based on linguistics (i.e. Part-of-Speech (PoS) tagging and Dependency Parsing (DP)).

The NERC approach uses NERC models to detect all the entities that have to be linked. This approach depends on the model being used and on the entities that model was trained on. Depending on the use case and the target entities, it may not be the best approach: entities that the NERC model does not recognize will not be linked.

The linguistic approach relies on the idea that a mention is usually a syntagma or a noun. It therefore detects nouns that are part of a syntagma and that act as objects, subjects, etc. This approach does not depend on the entity types a model was trained on (although its performance does depend on the model): a noun in a text is always a noun, regardless of the training dataset.
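The linguistic approach can be illustrated with a minimal sketch. It is not Zshot's implementation: the `Token` structure, the `linguistic_mentions` function, and the set of dependency roles are assumptions made for illustration, and the tags are supplied by hand rather than produced by a spaCy or Flair model.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """A pre-analysed token (hypothetical structure, not a Zshot or spaCy type)."""
    text: str
    pos: str  # coarse part-of-speech tag, e.g. "NOUN", "PROPN", "VERB"
    dep: str  # dependency label, e.g. "nsubj", "dobj", "pobj"


# Dependency roles treated as "acts like a subject or object" (an assumption).
ARGUMENT_ROLES = {"nsubj", "nsubjpass", "dobj", "pobj", "iobj"}


def linguistic_mentions(tokens: List[Token]) -> List[str]:
    """Return nouns and proper nouns that fill a subject or object role."""
    return [
        tok.text
        for tok in tokens
        if tok.pos in {"NOUN", "PROPN"} and tok.dep in ARGUMENT_ROLES
    ]


# Hand-tagged sentence: "IBM opened an office in Paris"
sentence = [
    Token("IBM", "PROPN", "nsubj"),
    Token("opened", "VERB", "ROOT"),
    Token("an", "DET", "det"),
    Token("office", "NOUN", "dobj"),
    Token("in", "ADP", "prep"),
    Token("Paris", "PROPN", "pobj"),
]
print(linguistic_mentions(sentence))  # ['IBM', 'office', 'Paris']
```

Note that "office" is returned even though no NERC model would label it as an entity, which is the trade-off described above: the candidates do not depend on a fixed label set, but more of them have to be filtered by the linker.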

Linker

The linker links the detected entities to an existing set of labels. Some linkers, however, are end-to-end: they do not need a mentions extractor, because they detect and link the entities at the same time.

There are currently 5 linkers available; 3 of them are end-to-end and 2 are not.

Linker Name	End-to-end	Source Code	Paper
Blink	no	Source Code	Paper
GENRE	no	Source Code	Paper
SMXM	yes	Source Code	Paper
TARS	yes	Source Code	Paper
GLiNER	yes	Source Code	Paper
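The practical consequence of the table is that a pipeline built around a linker that is not end-to-end also needs a mentions extractor. The sketch below encodes that rule in plain Python. It reads the x marks in the table as "not end-to-end", which matches the stated count of 3 end-to-end linkers out of 5. The names `END_TO_END`, `needs_mentions_extractor`, and `check_pipeline` are hypothetical and are not part of the Zshot API.

```python
from typing import Optional

# End-to-end flags taken from the linker table (x read as "not end-to-end").
END_TO_END = {
    "Blink": False,
    "GENRE": False,
    "SMXM": True,
    "TARS": True,
    "GLiNER": True,
}


def needs_mentions_extractor(linker_name: str) -> bool:
    """A linker that is not end-to-end relies on a separate mentions extractor."""
    return not END_TO_END[linker_name]


def check_pipeline(linker_name: str, mentions_extractor: Optional[str]) -> str:
    """Return "ok" or a short message describing what the pipeline is missing."""
    if needs_mentions_extractor(linker_name) and mentions_extractor is None:
        return f"{linker_name} needs a mentions extractor"
    return "ok"


print(check_pipeline("TARS", None))     # ok
print(check_pipeline("Blink", None))    # Blink needs a mentions extractor
print(check_pipeline("Blink", "spacy")) # ok
```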

Relations Extractor

The relations extractor extracts relations among the entities previously extracted by a linker.

Currently, there is only one relations extractor available:

ZS-BERT
- Paper
- Source Code

Knowledge Extractor

The knowledge extractor performs the extraction and classification of named entities and the extraction of relations among them at the same time. A pipeline with this component does not need a mentions extractor, linker, or relations extractor to work.

Currently, there is only one knowledge extractor available:

KnowGL
- Rossiello et al. (AAAI 2023)
- Mihindukulasooriya et al. (ISWC 2022)

How to use it

Install the requirements: pip install -r requirements.txt
Install a spaCy pipeline to use for mentions extraction: python -m spacy download en_core_web_sm
Create a file main.py with the pipeline configuration and the entity definitions (Wikipedia abstracts are usually a good starting point for descriptions):

import spacy

from zshot import PipelineConfig, displacy
from zshot.linker import LinkerRegen
from zshot.mentions_extractor import MentionsExtractorSpacy
from zshot.utils.data_models import Entity

nlp = spacy.load("en_core_web_sm")
# Configure the mentions extractor, the linker, and the target entities.
nlp_config = PipelineConfig(
    mentions_extractor=MentionsExtractorSpacy(),
    linker=LinkerRegen(),
    entities=[
        Entity(name="Paris",
               description="Paris is located in northern central France, in a north-bending arc of the river Seine"),
        Entity(name="IBM",
               description="International Business Machines Corporation (IBM) is an American multinational technology corporation headquartered in Armonk, New York"),
        Entity(name="New York", description="New York is a city in U.S. state"),
        Entity(name="Florida", description="southeasternmost U.S. state"),
        Entity(name="American",
               description="American, something of, from, or related to the United States of America, commonly known as the United States or America"),
        Entity(name="Chemical formula",
               description="In chemistry, a chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule"),
        Entity(name="Acetamide",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
        Entity(name="Armonk",
               description="Armonk is a hamlet and census-designated place (CDP) in the town of North Castle, located in Westchester County, New York, United States."),
        Entity(name="Acetic Acid",
               description="Acetic acid, systematically named ethanoic acid, is an acidic, colourless liquid and organic compound with the chemical formula CH3COOH"),
        Entity(name="Industrial solvent",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
    ]
)
# Add Zshot as the last component of the spaCy pipeline.
nlp.add_pipe("zshot", config=nlp_config, last=True)

# Parentheses join the two string literals into a single sentence.
text = ("International Business Machines Corporation (IBM) is an American multinational technology corporation"
        " headquartered in Armonk, New York, with operations in over 171 countries.")

doc = nlp(text)
displacy.serve(doc, style="ent")

Run it

Run with

$ python main.py

Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

The script annotates the text using Zshot and uses displaCy to visualise the annotations.

Check it

Open your browser at http://127.0.0.1:5000

You will see the annotated sentence.

How to create a custom component

You can implement your own mentions_extractor or linker and use it with Zshot. To make it easier to implement a new component, some base classes are provided that you extend with your own code.

It is as simple as creating a new class that extends the base class (MentionsExtractor or Linker). You have to implement the predict method, which receives the spaCy documents and returns a list of zshot.utils.data_models.Span for each document.

This is a simple mentions_extractor that extracts as mentions all words that contain the letter s:

from typing import Iterable
import spacy
from spacy.tokens import Doc
from zshot import PipelineConfig
from zshot.utils.data_models import Span
from zshot.mentions_extractor import MentionsExtractor

class SimpleMentionExtractor(MentionsExtractor):
    def predict(self, docs: Iterable[Doc], batch_size=None):
        # One list of spans per document; each span covers a token containing "s".
        spans = [[Span(tok.idx, tok.idx + len(tok)) for tok in doc if "s" in tok.text] for doc in docs]
        return spans

new_nlp = spacy.load("en_core_web_sm")

config = PipelineConfig(
    mentions_extractor=SimpleMentionExtractor()
)
new_nlp.add_pipe("zshot", config=config, last=True)
# Parentheses join the two string literals into a single sentence.
text_acetamide = ("CH2O2 is a chemical compound similar to Acetamide used in International Business "
                  "Machines Corporation (IBM).")

doc = new_nlp(text_acetamide)
print(doc._.mentions)

>>> [is, similar, used, Business, Machines]
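The extractor above builds each span from character offsets (`tok.idx` and `tok.idx + len(tok)`). The sketch below reproduces the same offset convention without spaCy or Zshot, so the selection logic can be checked in isolation. It is an approximation: `spans_with_letter` is a hypothetical helper, and it splits words with a regular expression instead of a spaCy tokenizer, so the two can disagree on punctuation-heavy text.

```python
import re
from typing import List, Tuple


def spans_with_letter(text: str, letter: str = "s") -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of words containing `letter`."""
    return [
        (match.start(), match.end())
        for match in re.finditer(r"\w+", text)
        if letter in match.group()
    ]


text = ("CH2O2 is a chemical compound similar to Acetamide used in "
        "International Business Machines Corporation (IBM).")
spans = spans_with_letter(text)

# Slicing the text with each (start, end) pair recovers the mention surface form.
print([text[start:end] for start, end in spans])
# ['is', 'similar', 'used', 'Business', 'Machines']
```

The end offset is exclusive, as in Python slicing, so `text[start:end]` returns exactly the mention.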

How to evaluate Zshot

Evaluation is an important step in improving model performance. For this reason Zshot supports evaluating components on two predefined datasets, OntoNotes and MedMentions, in a zero-shot version in which the entities of the test and validation splits do not appear in the training set.
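The zero-shot property described above can be stated as a simple check: the set of entity labels in the evaluation split must be disjoint from the set of labels in the training split. The sketch below shows this on toy data. The functions `labels_of` and `is_zero_shot`, the label names, and the list-of-tags layout are assumptions for illustration and do not reflect how Zshot stores its datasets.

```python
from typing import List, Set


def labels_of(split: List[List[str]]) -> Set[str]:
    """Collect the entity labels used in a split, ignoring the 'O' (outside) tag."""
    return {tag for sentence in split for tag in sentence if tag != "O"}


def is_zero_shot(train: List[List[str]], evaluation: List[List[str]]) -> bool:
    """True when no entity label of the evaluation split was seen in training."""
    return labels_of(train).isdisjoint(labels_of(evaluation))


# Toy splits: one tag per token, one inner list per sentence.
train = [["PERSON", "O", "ORG"], ["O", "ORG"]]
validation = [["O", "GPE"], ["DATE", "O"]]
leaky = [["PERSON", "O"]]  # reuses a training label, so it is not zero-shot

print(is_zero_shot(train, validation))  # True
print(is_zero_shot(train, leaky))       # False
```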

The evaluation package contains all the functionality needed to evaluate the Zshot components. The main function is zshot.evaluation.zshot_evaluate.evaluate, which takes as input the spaCy nlp model and the dataset to evaluate on. It returns a str containing a table with the results of the evaluation. For instance, the evaluation of the TARS linker in Zshot on the OntoNotes validation set would be:

import spacy

from zshot import PipelineConfig
from zshot.linker import LinkerTARS
from zshot.evaluation.dataset import load_ontonotes_zs
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report
from zshot.evaluation.metrics.seqeval.seqeval import Seqeval

ontonotes_zs = load_ontonotes_zs('validation')


nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerTARS(),
    entities=ontonotes_zs.entities
)

nlp.add_pipe("zshot", config=nlp_config, last=True)

evaluation = evaluate(nlp, ontonotes_zs, metric=Seqeval())
prettify_evaluate_report(evaluation)
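To make the reported numbers easier to interpret, the sketch below computes entity-level precision, recall, and F1 from exact span matches, which is the general kind of score a seqeval-style metric reports. It is a simplified illustration, not the code behind the Seqeval metric used above: `LabeledSpan` and `span_prf` are hypothetical names, and a prediction counts as correct only when start, end, and label all match.

```python
from typing import Dict, Set, Tuple

# (start, end, label); a hypothetical layout chosen for this sketch.
LabeledSpan = Tuple[int, int, str]


def span_prf(gold: Set[LabeledSpan], predicted: Set[LabeledSpan]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over labelled spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = {(0, 3, "ORG"), (20, 25, "GPE"), (30, 38, "GPE")}
# One correct span, one span with the right boundaries but the wrong label.
predicted = {(0, 3, "ORG"), (20, 25, "PERSON")}

scores = span_prf(gold, predicted)
print({name: round(value, 2) for name, value in scores.items()})
```

In this toy case one of two predictions is correct (precision 0.5) and one of three gold spans is found (recall of one third). A span with the right boundaries but the wrong label counts as an error on both sides.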

Citation

@inproceedings{picco-etal-2023-zshot,
    title = "Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction",
    author = "Picco, Gabriele  and
      Martinez Galindo, Marcos  and
      Purpura, Alberto  and
      Fuchs, Leopold  and
      Lopez, Vanessa  and
      Hoang, Thanh Lam",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.34",
    doi = "10.18653/v1/2023.acl-demo.34",
    pages = "357--368",
    abstract = "The Zero-Shot Learning (ZSL) task pertains to the identification of entities or relations in texts that were not seen during training. ZSL has emerged as a critical research area due to the scarcity of labeled data in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, both in the research community and industry, for a comprehensive ZSL framework that facilitates the development and accessibility of the latest methods and pretrained models. In this study, we propose a novel ZSL framework called Zshot that aims to address the aforementioned challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods with standard benchmark datasets. Additionally, we have designed our framework to support the industry with readily available APIs for production under the standard SpaCy NLP pipeline. Our API is extendible and evaluable, moreover, we include numerous enhancements such as boosting the accuracy with pipeline ensembling and visualization utilities available as a SpaCy extension.",
}
