GPTNERMED

About

GPTNERMED is a novel open synthesized dataset and neural named-entity-recognition (NER) model for German texts in medical natural language processing (NLP).

Key features:

Supported labels: Medikation, Dosis, Diagnose
Open silver-standard German medical dataset: 245107 tokens with annotations for Dosis (#7547), Medikation (#9868) and Diagnose (#5996)
Synthetic dataset based on GPT NeoX
Transfer learning for NER parsing using gbert-large, GottBERT-base or German-MedBERT
Open, public access to the models

Online demo: A demo page is available (Demo), or use the HuggingFace links listed below.

See our published paper at https://doi.org/10.1016/j.jbi.2023.104478

Our pre-print paper is available at https://arxiv.org/pdf/2208.14493.pdf

[Figure: NER demonstration]

Models

The pretrained models can be retrieved from the following URLs:

GBERT-based: model link
GottBERT-based: model link
German-MedBERT-based: model link

The models are also available on the HuggingFace platform:

GBERT-based: HuggingFace link
GottBERT-based: HuggingFace link
German-MedBERT-based: HuggingFace link

HuggingFace dataset: The dataset is also available as a HuggingFace dataset.
You can load the dataset as follows:

# You need to install datasets first, using: pip install datasets
from datasets import load_dataset
dataset = load_dataset("jfrei/GPTNERMED")

Scores

Note: The scores were evaluated by character-wise classification.
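As an illustration of what character-wise scoring can look like, the following is a minimal sketch, not the authors' evaluation script. It assumes that gold and predicted annotations are given as (start_char, end_char, label) spans with an exclusive end offset, and that every character covered by a span of a label counts as one positive for that label. The helper names `labeled_chars` and `char_scores` and the example sentence are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed span format: (start_char, end_char, label), end offset exclusive.
Span = Tuple[int, int, str]


def labeled_chars(spans: Iterable[Span], label: str) -> Set[int]:
    """Return the character offsets covered by spans carrying the given label."""
    covered: Set[int] = set()
    for start, end, span_label in spans:
        if span_label == label:
            covered.update(range(start, end))
    return covered


def char_scores(gold: Iterable[Span], pred: Iterable[Span], label: str) -> Tuple[float, float, float]:
    """Compute character-level precision, recall and F1 for one label."""
    gold_chars = labeled_chars(gold, label)
    pred_chars = labeled_chars(pred, label)
    true_positives = len(gold_chars & pred_chars)
    precision = true_positives / len(pred_chars) if pred_chars else 0.0
    recall = true_positives / len(gold_chars) if gold_chars else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the prediction finds the drug but only part of the dose.
text = "Patient erhielt Ibuprofen 400 mg."
gold = [(16, 25, "Medikation"), (26, 32, "Dosis")]
pred = [(16, 25, "Medikation"), (26, 29, "Dosis")]

for label in ("Medikation", "Dosis"):
    p, r, f = char_scores(gold, pred, label)
    print(f"{label}: Pr={p:.3f} Re={r:.3f} F1={f:.3f}")
```

In this sketch a partially matched span still earns partial credit, which is the main difference from exact-span matching.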

Out-of-distribution dataset (provided in OoD-dataset_GoldStandard.jsonl):

Model	Metric	Drug=Medikation
gbert-large	Pr	0.707
	Re	0.979
	F1	0.821
GottBERT-base	Pr	0.800
	Re	0.899
	F1	0.847
German-MedBERT	Pr	0.727
	Re	0.818
	F1	0.770
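The Pr (precision), Re (recall) and F1 rows are related by the usual harmonic-mean definition of F1. The short sketch below recomputes F1 from the rounded Pr and Re values in the out-of-distribution table; the function name `f1_score` and the dictionary layout are illustrative choices, not part of the project.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Pr, Re) for the Medikation label, copied from the out-of-distribution table.
ood_medikation = {
    "gbert-large": (0.707, 0.979),
    "GottBERT-base": (0.800, 0.899),
    "German-MedBERT": (0.727, 0.818),
}

for model, (precision, recall) in ood_medikation.items():
    print(f"{model}: F1 = {f1_score(precision, recall):.3f}")
# prints:
# gbert-large: F1 = 0.821
# GottBERT-base: F1 = 0.847
# German-MedBERT: F1 = 0.770
```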

Test set:

Model	Metric	Medikation	Diagnose	Dosis	Total
gbert-large	Pr	0.870	0.870	0.883	0.918
	Re	0.936	0.895	0.921	0.919
	F1	0.949	0.882	0.901	0.918
GottBERT-base	Pr	0.979	0.896	0.887	0.936
	Re	0.910	0.844	0.907	0.886
	F1	0.943	0.870	0.897	0.910
German-MedBERT	Pr	0.980	0.910	0.829	0.932
	Re	0.905	0.730	0.890	0.842
	F1	0.941	0.810	0.858	0.883

Setup and Usage

The models are based on spaCy. The sample code is written in Python.

model_link="https://myweb.rz.uni-augsburg.de/~freijoha/GPTNERMED/GPTNERMED_gbert.zip"

# [Optional] Create env
python3 -m venv env
source ./env/bin/activate

# Install dependencies
python3 -m pip install -r requirements.txt

# Download & extract model
wget -O model.zip "$model_link"
unzip model.zip -d "model"

# Run script
python3 GPTNERMED.py
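For orientation, here is a minimal sketch of how an extracted spaCy pipeline could be loaded and queried from Python. It is not a copy of GPTNERMED.py, which remains the reference script. It assumes that the `model` directory created above is itself a loadable spaCy pipeline; if the archive unpacks into a nested folder, pass that folder instead. The helper names `extract_entities` and `run_ner` and the example sentence are hypothetical.

```python
from typing import Any, List, Tuple


def extract_entities(doc: Any) -> List[Tuple[str, str, int, int]]:
    """Collect (text, label, start_char, end_char) for every entity in a spaCy Doc."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


def run_ner(model_dir: str, text: str) -> List[Tuple[str, str, int, int]]:
    """Load a spaCy pipeline from disk and return the entities found in the text."""
    # Imported here so the helper above stays usable without spaCy installed.
    import spacy

    # Assumption: model_dir points at a directory that spacy.load can read directly.
    nlp = spacy.load(model_dir)
    return extract_entities(nlp(text))


# Example call (requires the dependencies and the downloaded model):
# for ent_text, label, start, end in run_ner(
#     "model", "Der Patient erhielt 400 mg Ibuprofen wegen Kopfschmerzen."
# ):
#     print(f"{label}: {ent_text} [{start}:{end}]")
```

According to the feature list, the labels to expect are Medikation, Dosis and Diagnose.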

Citation

Cite our work with BibTeX as written below, or use the citation tools from the paper.

@article{FREI2023104478,
title = {Annotated dataset creation through large language models for non-english medical NLP},
journal = {Journal of Biomedical Informatics},
volume = {145},
pages = {104478},
year = {2023},
issn = {1532-0464},
doi = {https://doi.org/10.1016/j.jbi.2023.104478},
url = {https://www.sciencedirect.com/science/article/pii/S1532046423001995},
author = {Johann Frei and Frank Kramer},
keywords = {Natural language processing, Information extraction, Named entity recognition, Data augmentation, Knowledge distillation, Medication detection},
abstract = {Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts for tasks often requires custom-designed datasets to address NLP tasks in a supervised machine learning fashion. When operating in non-English languages for medical data processing, this exposes several minor and major, interconnected problems such as the lack of task-matching datasets as well as task-specific pre-trained models. In our work, we suggest to leverage pre-trained large language models for training data acquisition in order to retrieve sufficiently large datasets for training smaller and more efficient models for use-case-specific tasks. To demonstrate the effectiveness of your approach, we create a custom dataset that we use to train a medical NER model for German texts, GPTNERMED, yet our method remains language-independent in principle. Our obtained dataset as well as our pre-trained models are publicly available at https://github.com/frankkramer-lab/GPTNERMED.}
}
