Zshot (v0.0.9)

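Zero and few-shot named entity and relationship recognition

Documentation: https://ibm.github.io/zshot

Source code: https://github.com/ibm/zshot

Paper: https://aclanthology.org/2023.acl-demo.34/

Zshot is a highly customisable framework for performing zero and few-shot named entity recognition.

It can be used to perform:

Mentions extraction: identify globally relevant mentions, or mentions relevant to a given domain
Wikification: the task of linking textual mentions to entities in Wikipedia
Zero and few-shot named entity recognition: use language descriptions to perform NER that generalises to unseen domains
Zero and few-shot named relationship recognition
Visualization: zero-shot NER and RE extraction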

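Requirements

Python 3.6+
spacy - Zshot relies on spaCy for pipelining and visualization.
torch - PyTorch is required to run PyTorch models.
transformers - Required for pre-trained language models.
evaluate - Required for evaluation.
datasets - Required to evaluate over datasets (e.g. OntoNotes).

Optional dependencies

flair - Required if you want to use the Flair mentions extractor, the TARS linker, or the TARS mentions extractor.
blink - Required if you want to use Blink for linking to Wikipedia pages.
gliner - Required if you want to use the GLiNER linker or the GLiNER mentions extractor.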

Installation

$ pip install zshot

---> 100%

Examples

Example	Notebook
Installation and Visualization
Knowledge Extractor
Wikification
Custom Components
Evaluation

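Zshot approach

Zshot contains two main components: the mentions extractor and the linker.

Mentions extractor

The mentions extractor detects the possible entities (a.k.a. mentions), which are then linked to a data source (e.g. Wikidata) by the linker.

Currently, 7 mentions extractors are supported: SMXM, TARS, GLiNER, two based on spaCy, and two based on Flair. The two versions for spaCy and Flair are similar: one is based on Named Entity Recognition and Classification (NERC), and the other is based on linguistics (i.e. Part-of-Speech (PoS) tagging and Dependency Parsing (DP)).

The NERC approach uses NERC models to detect all the entities that have to be linked. It depends on the model being used and on the entity types that model was trained on. Depending on the use case and the target entities, it may not be the best approach: entities that the NERC model does not recognise are never passed on, and so are never linked.

The linguistic approach relies on the idea that a mention is usually a syntagma or a noun. It therefore detects nouns that are part of a syntagma and act as objects, subjects, etc. This approach does not depend on the entity types a model was trained on (although its performance still depends on the quality of the tagger and parser): a noun in a text is a noun regardless of the training dataset.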
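The linguistic approach can be illustrated without any model. The sketch below is not Zshot's implementation. It assumes the tokens have already been tagged with a part of speech and a dependency label (the tag names follow the Universal Dependencies convention, and the `Token` type, the `ARGUMENT_ROLES` set, and the hand-tagged sentence are all hypothetical), and it keeps the nouns that act as subjects or objects.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    start: int  # character offset of the token in the sentence
    pos: str    # part-of-speech tag, e.g. "NOUN", "PROPN", "VERB"
    dep: str    # dependency label, e.g. "nsubj", "obj", "det"


# Dependency roles treated as "acting like a subject or object" (an assumption).
ARGUMENT_ROLES = {"nsubj", "obj", "iobj", "obl"}


def linguistic_mentions(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return (start, end) character spans of nouns in argument positions."""
    return [
        (tok.start, tok.start + len(tok.text))
        for tok in tokens
        if tok.pos in {"NOUN", "PROPN"} and tok.dep in ARGUMENT_ROLES
    ]


# Hand-tagged sample sentence; a real pipeline would get these tags from a parser.
sentence = "IBM opened an office"
tagged = [
    Token("IBM", 0, "PROPN", "nsubj"),
    Token("opened", 4, "VERB", "root"),
    Token("an", 11, "DET", "det"),
    Token("office", 14, "NOUN", "obj"),
]

spans = linguistic_mentions(tagged)
mentions = [sentence[start:end] for start, end in spans]
print(mentions)
```

Note that no entity type is involved at this stage: the extractor only proposes candidate spans, and assigning a label is left to the linker.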

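Linker

The linker links the detected entities to an existing set of labels. Some linkers, however, are end-to-end: they detect and link the entities at the same time, so they do not need a mentions extractor.

There are currently 5 linkers available: 3 of them are end-to-end and 2 are not.

Linker name	End-to-end	Source code	Paper
Blink	X	Source code	Paper
GENRE	X	Source code	Paper
SMXM	✓	Source code	Paper
TARS	✓	Source code	Paper
GLiNER	✓	Source code	Paper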
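To make the linking step concrete, here is a deliberately naive sketch of what "linking a mention to an existing set of labels" means. It is not how any of the linkers listed above work (they rely on pretrained models). The `words` and `link` helpers and the word-overlap score are hypothetical stand-ins, and the entity descriptions are shortened for the example.

```python
import re
from typing import Dict, Optional, Set


def words(text: str) -> Set[str]:
    """Lowercased word set used by the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def link(mention: str, entities: Dict[str, str]) -> Optional[str]:
    """Return the label whose name and description share the most words with the mention."""
    mention_words = words(mention)
    best_label, best_score = None, 0
    for label, description in entities.items():
        score = len(mention_words & words(label + " " + description))
        if score > best_score:
            best_label, best_score = label, score
    return best_label  # None when nothing overlaps


# Label set with textual descriptions (shortened for the example).
entities = {
    "IBM": "International Business Machines Corporation (IBM) is an American "
           "multinational technology corporation",
    "Armonk": "Armonk is a hamlet in the town of North Castle, New York",
    "Florida": "southeasternmost U.S. state",
}

# Mentions as they might come out of a mentions extractor.
mentions = ["International Business Machines Corporation", "Armonk", "Seine"]
linked = {mention: link(mention, entities) for mention in mentions}
print(linked)
```

The point of the sketch is the interface: a non-end-to-end linker receives candidate mentions plus the label descriptions and returns one label (or none) per mention, whereas an end-to-end linker would also find the mentions itself.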

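Relations extractor

The relations extractor extracts relations among the entities previously extracted by a linker.

Currently, there is only one relations extractor available:

ZS-BERT
- Paper
- Source code

Knowledge extractor

The knowledge extractor performs the extraction and classification of named entities and the extraction of the relations among them at the same time. A pipeline with this component does not need a mentions extractor, a linker, or a relations extractor.

Currently, there is only one knowledge extractor available:

KnowGL
- Rossiello et al. (AAAI 2023)
- Mihindukulasooriya et al. (ISWC 2022)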

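How to use it

Install the requirements: pip install -r requirements.txt
Install a spaCy pipeline to use for mentions extraction: python -m spacy download en_core_web_sm
Create a file main.py with the pipeline configuration and the entity definitions (Wikipedia abstracts are usually a good starting point for descriptions):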

import spacy

from zshot import PipelineConfig, displacy
from zshot.linker import LinkerRegen
from zshot.mentions_extractor import MentionsExtractorSpacy
from zshot.utils.data_models import Entity

# Base spaCy pipeline; Zshot is added to it as an extra component.
nlp = spacy.load("en_core_web_sm")
nlp_config = PipelineConfig(
    mentions_extractor=MentionsExtractorSpacy(),
    linker=LinkerRegen(),
    # Each target entity is defined by a name and a textual description.
    entities=[
        Entity(name="Paris",
               description="Paris is located in northern central France, in a north-bending arc of the river Seine"),
        Entity(name="IBM",
               description="International Business Machines Corporation (IBM) is an American multinational technology corporation headquartered in Armonk, New York"),
        Entity(name="New York", description="New York is a city in U.S. state"),
        Entity(name="Florida", description="southeasternmost U.S. state"),
        Entity(name="American",
               description="American, something of, from, or related to the United States of America, commonly known as the United States or America"),
        Entity(name="Chemical formula",
               description="In chemistry, a chemical formula is a way of presenting information about the chemical proportions of atoms that constitute a particular chemical compound or molecule"),
        Entity(name="Acetamide",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
        Entity(name="Armonk",
               description="Armonk is a hamlet and census-designated place (CDP) in the town of North Castle, located in Westchester County, New York, United States."),
        Entity(name="Acetic Acid",
               description="Acetic acid, systematically named ethanoic acid, is an acidic, colourless liquid and organic compound with the chemical formula CH3COOH"),
        Entity(name="Industrial solvent",
               description="Acetamide (systematic name: ethanamide) is an organic compound with the formula CH3CONH2. It is the simplest amide derived from acetic acid. It finds some use as a plasticizer and as an industrial solvent."),
    ]
)
nlp.add_pipe("zshot", config=nlp_config, last=True)

# The parentheses let the two string literals be joined across lines.
text = ("International Business Machines Corporation (IBM) is an American multinational technology corporation"
        " headquartered in Armonk, New York, with operations in over 171 countries.")

doc = nlp(text)
# Start a local web server that renders the annotated entities.
displacy.serve(doc, style="ent")

Run it

Run with

$ python main.py

Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

The script annotates the text using Zshot and uses displaCy to visualise the annotations.

Check it

Open your browser at http://127.0.0.1:5000.

You will see the annotated sentence.

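How to create a custom component

You can implement your own mentions extractor or linker and use it with Zshot. To make this easier, Zshot provides base classes that you extend with your own code.

It is as simple as creating a new class that extends the base class (MentionsExtractor or Linker). You have to implement the predict method, which receives the spaCy documents and returns a list of zshot.utils.data_models.Span for each document.

This is a simple mentions extractor that extracts as mentions all the words that contain the letter s: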

from typing import Iterable

import spacy
from spacy.tokens import Doc

from zshot import PipelineConfig
from zshot.utils.data_models import Span
from zshot.mentions_extractor import MentionsExtractor


class SimpleMentionExtractor(MentionsExtractor):
    def predict(self, docs: Iterable[Doc], batch_size=None):
        # One list of spans per document; each span holds character offsets.
        spans = [[Span(tok.idx, tok.idx + len(tok)) for tok in doc if "s" in tok.text] for doc in docs]
        return spans


new_nlp = spacy.load("en_core_web_sm")

config = PipelineConfig(
    mentions_extractor=SimpleMentionExtractor()
)
new_nlp.add_pipe("zshot", config=config, last=True)
# The parentheses let the two string literals be joined across lines.
text_acetamide = ("CH2O2 is a chemical compound similar to Acetamide used in International Business "
                  "Machines Corporation (IBM).")

doc = new_nlp(text_acetamide)
print(doc._.mentions)

>>> [is, similar, used, Business, Machines]
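The spans returned by predict are character offsets with an exclusive end, which is why the extractor above uses tok.idx + len(tok). The following standalone sketch shows the same offset arithmetic without spaCy or Zshot. The `spans_with_letter` helper and the regex-based word splitting are assumptions made for illustration and only approximate spaCy's tokenisation.

```python
import re
from typing import List, Tuple


def spans_with_letter(text: str, letter: str = "s") -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of the words containing `letter`."""
    return [
        (match.start(), match.end())  # end is exclusive, like tok.idx + len(tok)
        for match in re.finditer(r"\w+", text)
        if letter in match.group()
    ]


sample = "IBM has offices in Armonk"
spans = spans_with_letter(sample)
print(spans)
print([sample[start:end] for start, end in spans])
```

Because the end offset is exclusive, slicing the original text with a span returns exactly the mention text.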

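How to evaluate Zshot

Evaluation is an important part of improving model performance. Zshot therefore lets you evaluate its components on two predefined datasets, OntoNotes and MedMentions, in a zero-shot version in which the entities of the test and validation splits do not appear in the train set.

The evaluation package contains all the functionality needed to evaluate the Zshot components. The main function is zshot.evaluation.zshot_evaluate.evaluate, which takes as input the spaCy nlp model and the dataset to evaluate. It returns a str containing a table with the results of the evaluation. For instance, the evaluation of the TARS linker in Zshot on the OntoNotes validation set would be: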
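The zero-shot property of such a split can be stated as a simple set condition: no entity type seen in training may appear in the held-out splits. The sketch below checks that condition on toy BIO-tagged data. The splits, the labels, and the helper names are hypothetical and are not taken from the actual OntoNotes or MedMentions splits.

```python
from typing import List, Set


def entity_labels(examples: List[List[str]]) -> Set[str]:
    """Collect entity types from BIO tags, ignoring the 'O' (outside) tag."""
    return {tag.split("-", 1)[1] for tags in examples for tag in tags if tag != "O"}


def is_zero_shot(train: List[List[str]], *held_out: List[List[str]]) -> bool:
    """True when no held-out split shares an entity type with the train split."""
    seen = entity_labels(train)
    return all(seen.isdisjoint(entity_labels(split)) for split in held_out)


# Toy splits: one list of BIO tags per sentence.
train = [["B-PERSON", "I-PERSON", "O"], ["O", "B-GPE"]]
validation = [["B-DATE", "O"], ["O", "B-FAC", "I-FAC"]]
test = [["B-EVENT", "O"]]
leaky = [["B-PERSON", "O"]]  # reuses a training label, so it is not zero-shot

print(is_zero_shot(train, validation, test))
print(is_zero_shot(train, leaky))
```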

import spacy

from zshot import PipelineConfig
from zshot.linker import LinkerTARS
from zshot.evaluation.dataset import load_ontonotes_zs
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report
from zshot.evaluation.metrics.seqeval.seqeval import Seqeval

ontonotes_zs = load_ontonotes_zs('validation')

nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerTARS(),
    entities=ontonotes_zs.entities
)

nlp.add_pipe("zshot", config=nlp_config, last=True)

evaluation = evaluate(nlp, ontonotes_zs, metric=Seqeval())
prettify_evaluate_report(evaluation)
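Sequence-labelling metrics of the seqeval kind score whole entities rather than individual tokens: a prediction counts only if both its span and its type match the gold annotation. The sketch below is a simplified, standalone illustration of that idea and is not the Seqeval class used above. The helper names are hypothetical, and stray I- tags that do not continue an entity are simply ignored, which is a simplification.

```python
from typing import List, Set, Tuple

EntitySpan = Tuple[str, int, int]  # (type, start token index, exclusive end index)


def bio_to_entities(tags: List[str]) -> Set[EntitySpan]:
    """Convert one BIO tag sequence into a set of typed entity spans."""
    entities, start, label = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            entities.add((label, start, i))  # close the entity that just ended
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return entities


def precision_recall_f1(gold: Set[EntitySpan], pred: Set[EntitySpan]) -> Tuple[float, float, float]:
    """Entity-level scores: only exact (type, span) matches count as correct."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold_tags = ["B-ORG", "I-ORG", "O", "B-GPE", "O"]
pred_tags = ["B-ORG", "I-ORG", "O", "B-PERSON", "O"]  # right span, wrong type

scores = precision_recall_f1(bio_to_entities(gold_tags), bio_to_entities(pred_tags))
print(scores)
```

In this toy case the second prediction has the correct span but the wrong type, so it counts as both a false positive and a missed gold entity.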

Citation

@inproceedings{picco-etal-2023-zshot,
    title = "Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction",
    author = "Picco, Gabriele  and
      Martinez Galindo, Marcos  and
      Purpura, Alberto  and
      Fuchs, Leopold  and
      Lopez, Vanessa  and
      Hoang, Thanh Lam",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.34",
    doi = "10.18653/v1/2023.acl-demo.34",
    pages = "357--368",
    abstract = "The Zero-Shot Learning (ZSL) task pertains to the identification of entities or relations in texts that were not seen during training. ZSL has emerged as a critical research area due to the scarcity of labeled data in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, both in the research community and industry, for a comprehensive ZSL framework that facilitates the development and accessibility of the latest methods and pretrained models. In this study, we propose a novel ZSL framework called Zshot that aims to address the aforementioned challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods with standard benchmark datasets. Additionally, we have designed our framework to support the industry with readily available APIs for production under the standard SpaCy NLP pipeline. Our API is extendible and evaluable, moreover, we include numerous enhancements such as boosting the accuracy with pipeline ensembling and visualization utilities available as a SpaCy extension.",
}

Additional information

Version: v0.0.9
Type: Other source code
Last updated: 2025-04-18
Size: 432.72 KB
Source: GitHub
