GPTNERMED

GPTNERMED is a novel open synthesized dataset and neural named-entity recognition (NER) model for German texts in medical natural language processing (NLP).

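Key features:

Supported labels: Medikation, Dosis, Diagnose
Open silver-standard German medical dataset: 245107 tokens with annotations for Dosis (#7547), Medikation (#9868) and Diagnose (#5996)
Synthesized dataset based on GPT NeoX
Transfer learning for NER parsing using gbert-large, GottBERT-base or German-MedBERT
Open, public access to the models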
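
To put the annotation counts above in proportion, the following small sketch computes each label's share of all annotations and the annotation density per 1000 tokens. It uses only the figures listed in the key features; the variable names are illustrative, and the counts are assumed to be annotation counts rather than token counts.

```python
# Figures taken from the key features list; names are illustrative.
annotation_counts = {"Dosis": 7547, "Medikation": 9868, "Diagnose": 5996}
total_tokens = 245107

total_annotations = sum(annotation_counts.values())

# Share of each label among all annotations.
shares = {
    label: count / total_annotations
    for label, count in annotation_counts.items()
}

for label, share in shares.items():
    print(f"{label}: {annotation_counts[label]} annotations ({share:.1%})")

# Rough annotation density, assuming the counts are annotations, not tokens.
density = 1000 * total_annotations / total_tokens
print(f"Annotations per 1000 tokens: {density:.1f}")
```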

Online demo: A demo page is available; alternatively, use the HuggingFace links given below.

See our published paper at https://doi.org/10.1016/j.jbi.2023.104478.

Our pre-print is available at https://arxiv.org/pdf/2208.14493.pdf.

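Models

The pretrained models can be retrieved from the following URLs:

GBERT-based: model link
GottBERT-based: model link
German-MedBERT-based: model link

The models are also available on the HuggingFace platform:

GBERT-based: HuggingFace link
GottBERT-based: HuggingFace link
German-MedBERT-based: HuggingFace link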

HuggingFace dataset: The dataset is also available as a HuggingFace Dataset.
You can load the dataset as follows:

# You need to install datasets first, using: pip install datasets
from datasets import load_dataset
dataset = load_dataset("jfrei/GPTNERMED")

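Scores

Note: The metric scores are evaluated by character-wise classification.

Out-of-distribution dataset (provided in OoD-dataset_GoldStandard.jsonl):

Model	Metric	Drug=Medikation
gbert-large	Pr	0.707
	Re	0.979
	F1	0.821
GottBERT-base	Pr	0.800
	Re	0.899
	F1	0.847
German-MedBERT	Pr	0.727
	Re	0.818
	F1	0.770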

Test set:

Model	Metric	Medikation	Diagnose	Dosis	Total
gbert-large	Pr	0.870	0.870	0.883	0.918
	Re	0.936	0.895	0.921	0.919
	F1	0.949	0.882	0.901	0.918
GottBERT-base	Pr	0.979	0.896	0.887	0.936
	Re	0.910	0.844	0.907	0.886
	F1	0.943	0.870	0.897	0.910
German-MedBERT	Pr	0.980	0.910	0.829	0.932
	Re	0.905	0.730	0.890	0.842
	F1	0.941	0.810	0.858	0.883

Setup and Usage

The models are based on spaCy. The sample code is written in Python.

model_link="https://myweb.rz.uni-augsburg.de/~freijoha/GPTNERMED/GPTNERMED_gbert.zip"

# [Optional] Create env
python3 -m venv env
source ./env/bin/activate

# Install dependencies
python3 -m pip install -r requirements.txt

# Download & extract model
wget -O model.zip "$model_link"
unzip model.zip -d "model"

# Run script
python3 GPTNERMED.py
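
Since the models are spaCy pipelines, an extracted model can presumably also be loaded directly with spaCy's standard `spacy.load` and queried through `doc.ents`. The following is a minimal sketch, not the repository's `GPTNERMED.py`: `MODEL_DIR`, `load_pipeline` and `extract_entities` are hypothetical names, and the directory layout inside the archive is an assumption that should be checked after unzipping.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical location: point this at the directory that actually contains
# the pipeline's config.cfg after unzipping model.zip.
MODEL_DIR = Path("model")


def load_pipeline(model_dir: Path = MODEL_DIR):
    """Load a spaCy pipeline from disk."""
    # Imported lazily; spaCy is assumed to be installed by the setup steps.
    import spacy

    return spacy.load(model_dir)


def extract_entities(nlp: Callable, text: str) -> List[Tuple[str, str, int, int]]:
    """Return (entity text, label, start_char, end_char) for each entity."""
    doc = nlp(text)
    return [
        (ent.text, ent.label_, ent.start_char, ent.end_char)
        for ent in doc.ents
    ]


# Example usage (requires the downloaded model and its dependencies):
# nlp = load_pipeline()
# sentence = "Der Patient erhält Ibuprofen 400 mg bei Kopfschmerzen."
# for entity in extract_entities(nlp, sentence):
#     print(entity)
```

Returning character offsets alongside the labels keeps the output directly comparable with character-wise scores such as those reported above.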

Citation

Cite our work with the BibTeX entry below, or use the citation tools provided at the paper link.

@article{FREI2023104478,
title = {Annotated dataset creation through large language models for non-english medical NLP},
journal = {Journal of Biomedical Informatics},
volume = {145},
pages = {104478},
year = {2023},
issn = {1532-0464},
doi = {https://doi.org/10.1016/j.jbi.2023.104478},
url = {https://www.sciencedirect.com/science/article/pii/S1532046423001995},
author = {Johann Frei and Frank Kramer},
keywords = {Natural language processing, Information extraction, Named entity recognition, Data augmentation, Knowledge distillation, Medication detection},
abstract = {Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts for tasks often requires custom-designed datasets to address NLP tasks in a supervised machine learning fashion. When operating in non-English languages for medical data processing, this exposes several minor and major, interconnected problems such as the lack of task-matching datasets as well as task-specific pre-trained models. In our work, we suggest to leverage pre-trained large language models for training data acquisition in order to retrieve sufficiently large datasets for training smaller and more efficient models for use-case-specific tasks. To demonstrate the effectiveness of your approach, we create a custom dataset that we use to train a medical NER model for German texts, GPTNERMED, yet our method remains language-independent in principle. Our obtained dataset as well as our pre-trained models are publicly available at https://github.com/frankkramer-lab/GPTNERMED.}
}
