GPTNERMED

Version 1.0.0

About

GPTNERMED is a novel open synthesized dataset and neural named-entity-recognition (NER) model for German texts in medical natural language processing (NLP).

Key features:

Supported labels: Medikation, Dosis, Diagnose
Open silver-standard German medical dataset: 245107 tokens with annotations for Dosis (#7547), Medikation (#9868) and Diagnose (#5996)
Synthesized dataset based on GPT NeoX
Transfer learning for NER parsing using gbert-large, GottBERT-base or German-MedBERT
Open, public access to the models

Online demo: A demo page is available: Demo, or use the Hugging Face links given below.

See the published paper at https://doi.org/10.1016/j.jbi.2023.104478.

The pre-print is available at https://arxiv.org/pdf/2208.14493.pdf.


Models

The pretrained models can be retrieved from the following URLs:

GBERT-based: Model link
GottBERT-based: Model link
German-MedBERT-based: Model link

The models are also available on the Hugging Face platform:

GBERT-based: Hugging Face link
GottBERT-based: Hugging Face link
German-MedBERT-based: Hugging Face link

Hugging Face dataset: The dataset is also available as a Hugging Face dataset.
You can load the dataset as follows:

# You need to install datasets first, using: pip install datasets
from datasets import load_dataset
dataset = load_dataset("jfrei/GPTNERMED")

Scores

Note: The metric scores are evaluated by character-wise classification.
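The README does not spell out the evaluation procedure, so the following is only a minimal sketch of one plausible reading of character-wise scoring: every character receives the label of the span covering it, and precision, recall and F1 are counted per character. The span format `(start_char, end_char, label)`, the helper names `char_labels` and `char_prf`, and the sample sentence are assumptions made for illustration, not taken from the project.

```python
from typing import Iterable, List, Tuple

# Hypothetical span format: (start_char, end_char, label), end exclusive.
Span = Tuple[int, int, str]


def char_labels(length: int, spans: Iterable[Span]) -> List[str]:
    """Expand spans into one label per character; 'O' marks unlabeled characters."""
    labels = ["O"] * length
    for start, end, label in spans:
        for i in range(max(start, 0), min(end, length)):
            labels[i] = label
    return labels


def char_prf(text: str, gold: Iterable[Span], pred: Iterable[Span], label: str):
    """Return (precision, recall, F1) for one label, counted per character."""
    g = char_labels(len(text), gold)
    p = char_labels(len(text), pred)
    tp = sum(1 for a, b in zip(g, p) if a == label and b == label)
    fp = sum(1 for a, b in zip(g, p) if a != label and b == label)
    fn = sum(1 for a, b in zip(g, p) if a == label and b != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative sentence (not from the dataset).
text = "Patient bekommt Ibuprofen 400 mg."
gold = [(16, 25, "Medikation"), (26, 32, "Dosis")]  # "Ibuprofen", "400 mg"
pred = [(16, 25, "Medikation"), (26, 29, "Dosis")]  # prediction covers only "400"

for name in ("Medikation", "Dosis"):
    print(name, char_prf(text, gold, pred, name))
```

Under this scheme a partially matched span still earns credit for the characters it covers, whereas exact-match entity scoring would count the truncated "400" prediction as a complete miss.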

Out-of-distribution dataset (provided in OoD-dataset_GoldStandard.jsonl):

Model	Metric	Drug=Medikation
gbert-large	Pr	0.707
	Re	0.979
	F1	0.821
GottBERT-base	Pr	0.800
	Re	0.899
	F1	0.847
German-MedBERT	Pr	0.727
	Re	0.818
	F1	0.770

Test set:

Model	Metric	Medikation	Diagnose	Dosis	Total
gbert-large	Pr	0.962	0.870	0.883	0.918
	Re	0.936	0.895	0.921	0.919
	F1	0.949	0.882	0.901	0.918
GottBERT-base	Pr	0.979	0.896	0.887	0.936
	Re	0.910	0.844	0.907	0.886
	F1	0.943	0.870	0.897	0.910
German-MedBERT	Pr	0.980	0.910	0.829	0.932
	Re	0.905	0.730	0.890	0.842
	F1	0.941	0.810	0.858	0.883
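The F1 values in the tables can be cross-checked against the precision (Pr) and recall (Re) columns, assuming F1 is the usual harmonic mean of the two. The sketch below recomputes F1 for a few rows copied from the test-set table; the helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, column, Pr, Re, reported F1), copied from the test-set table.
rows = [
    ("GottBERT-base", "Medikation", 0.979, 0.910, 0.943),
    ("GottBERT-base", "Total", 0.936, 0.886, 0.910),
    ("German-MedBERT", "Diagnose", 0.910, 0.730, 0.810),
]

for model, column, pr, re_, reported in rows:
    computed = f1_score(pr, re_)
    print(f"{model:15s} {column:10s} computed={computed:.3f} reported={reported:.3f}")
```

Small differences in the last digit are expected, since the published Pr and Re values are themselves rounded to three decimals.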

Setup and Usage

The models are based on spaCy. The sample code is written in Python.

model_link="https://myweb.rz.uni-augsburg.de/~freijoha/GPTNERMED/GPTNERMED_gbert.zip"

# [Optional] Create env
python3 -m venv env
source ./env/bin/activate

# Install dependencies
python3 -m pip install -r requirements.txt

# Download & extract model
wget -O model.zip "$model_link"
unzip model.zip -d "model"

# Run script
python3 GPTNERMED.py
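The contents of GPTNERMED.py are not reproduced here. As a rough sketch of what using the extracted model from Python might look like, the code below loads a spaCy pipeline from disk with `spacy.load` and lists the entities found in `doc.ents`. The directory name `model`, the assumption that the pipeline exposes its predictions through `doc.ents`, the helper names `load_model` and `extract_entities`, and the example sentence are all assumptions; the actual pipeline directory inside the archive may be nested differently.

```python
from typing import Any, List, Tuple


def load_model(model_dir: str = "model") -> Any:
    """Load a spaCy pipeline from disk (spaCy is imported lazily)."""
    import spacy  # installed via requirements.txt in the steps above

    return spacy.load(model_dir)


def extract_entities(nlp: Any, text: str) -> List[Tuple[str, str, int, int]]:
    """Run the pipeline and return (text, label, start_char, end_char) per entity."""
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


# Usage (needs spaCy and the extracted model on disk; the path is an assumption):
# nlp = load_model("model")
# for entity in extract_entities(nlp, "Der Patient bekommt 400 mg Ibuprofen."):
#     print(entity)
```

Keeping the loading step separate from the extraction step means the pipeline is read from disk once and can then be reused for many texts.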

Citation

Cite the work with the BibTeX entry below, or use the citation tools on the paper's page.

@article{FREI2023104478,
title = {Annotated dataset creation through large language models for non-english medical NLP},
journal = {Journal of Biomedical Informatics},
volume = {145},
pages = {104478},
year = {2023},
issn = {1532-0464},
doi = {https://doi.org/10.1016/j.jbi.2023.104478},
url = {https://www.sciencedirect.com/science/article/pii/S1532046423001995},
author = {Johann Frei and Frank Kramer},
keywords = {Natural language processing, Information extraction, Named entity recognition, Data augmentation, Knowledge distillation, Medication detection},
abstract = {Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts for tasks often requires custom-designed datasets to address NLP tasks in a supervised machine learning fashion. When operating in non-English languages for medical data processing, this exposes several minor and major, interconnected problems such as the lack of task-matching datasets as well as task-specific pre-trained models. In our work, we suggest to leverage pre-trained large language models for training data acquisition in order to retrieve sufficiently large datasets for training smaller and more efficient models for use-case-specific tasks. To demonstrate the effectiveness of our approach, we create a custom dataset that we use to train a medical NER model for German texts, GPTNERMED, yet our method remains language-independent in principle. Our obtained dataset as well as our pre-trained models are publicly available at https://github.com/frankkramer-lab/GPTNERMED.}
}
