Code for the paper Schema-adaptable Knowledge Graph Construction.
Our work has been accepted to Findings of EMNLP 2023.
To run the code, you need to install the following requirements:
conda create -n adakgc python=3.8
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
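After installation, a quick check can confirm that PyTorch is importable and whether it can see a GPU. This is a minimal sketch: the helper name `check_torch` is hypothetical and not part of the repository, and the expected version prefix is taken from the pip command above.

```python
import importlib


def check_torch(expected_prefix="1.8.0"):
    """Return (installed, version, cuda_available) without raising if torch is missing."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return (False, None, False)
    version = torch.__version__
    if not version.startswith(expected_prefix):
        # Only a warning: other versions may still work, but they are untested here.
        print(f"Warning: found torch {version}, expected {expected_prefix}*")
    return (True, version, torch.cuda.is_available())


if __name__ == "__main__":
    print(check_torch())
```

If the first element is False, the environment created above is probably not the active one.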
The tokenizer of our model comes from UIE and the other parts come from T5, so the model is a mixed file. A download link is provided here; please make sure to use this model: hf_models/mix
For more information about dataset construction, see Data Construction.
You can find the dataset via the following Google Drive link.
Datasets: ACE05, Few-NERD, NYT
mkdir hf_models
cd hf_models
git lfs install
git clone https://huggingface.co/google/t5-v1_1-base
cd ..
mkdir output # */AdaKGC/output # Current path: */AdaKGC
mode=H
data_name=Few-NERD
task=entity
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/Few-NERD.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True
model: The name or path of the pretrained model.
data: The path to the dataset.
output: The path where the fine-tuning checkpoint is saved. The final output path is generated automatically, e.g. `AdaKGC/output/ace05_event_H_e30_lr1e-4_b14_n0`.
config: The default configuration file, located in the config/prompt_conf directory. Each task has its own configuration.
mode: Dataset mode (H, V, M, or R).
device: The GPU index, passed as CUDA_VISIBLE_DEVICES.
batch: Batch size.
(See bash scripts and Python files for detailed command line parameters)
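The shell variables above are combined into the --data, --output, --config, and --record2 arguments. The sketch below reproduces that composition in Python so the resulting paths can be inspected before launching a run. `build_train_args` is a hypothetical helper, not part of the repository; it only mirrors the command shown above and assumes the config file is named after `data_name`, as in the examples in this README.

```python
def build_train_args(data_name, mode, ratio, device=0, model="hf_models/mix"):
    """Assemble the fine-tuning command line as a list of arguments."""
    prefix = f"data/{data_name}_{mode}"  # e.g. data/Few-NERD_H
    return [
        "bash", "scripts/fine_prompt.bash",
        f"--model={model}",
        f"--data={prefix}/iter_1",
        f"--output=output/{data_name}_{mode}_{ratio}",
        # Assumption: one .ini file per dataset, named after data_name.
        f"--config=config/prompt_conf/{data_name}.ini",
        f"--device={device}",
        f"--negative_ratio={ratio}",
        f"--record2={prefix}/iter_7/record.schema",
        "--use_prompt=True",
        "--init_prompt=True",
    ]


if __name__ == "__main__":
    print(" ".join(build_train_args("Few-NERD", "H", 0.8)))
```

The returned list can be passed to `subprocess.run(args, check=True)` from the repository root.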
mode=H
data_name=NYT
task=relation
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/NYT.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True
mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/ace05_event.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True
Inference on a single dataset (e.g. data/ace05_event_H/iter_1):
mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
python3 inference.py --dataname=data/${data_name}/${data_name}_${mode}/iter_2 --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512
dataname: The path to the dataset to be predicted (ace05_event, NYT, or Few-NERD).
model: The path of the model obtained from the previous training (output in the training stage).
t5_path: The base T5 model (model in the training stage).
task: Task type (entity, relation, or event).
cuda: CUDA_VISIBLE_DEVICES.
mode: Dataset mode (H, V, M, or R).
use_ssi, use_prompt, prompt_len, and prompt_dim must match the values used during training. You can view and set them in the corresponding configuration file, e.g. config/prompt_conf/ace05_event.ini.
Inference on all iteration datasets (data/iter_1/ace05_event_H ~ data/iter_7/ace05_event_H):
mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
python3 inference_mul.py --dataname=data/${data_name}/${data_name}_${mode} --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512
use_ssi, use_prompt, prompt_len, and prompt_dim must match the values used during training.
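Running inference.py once per iteration directory is one way to approximate a sweep over iter_1 to iter_7 by hand, for example to rerun a single failed iteration. The sketch below only builds the per-iteration command lines. It assumes the directory layout used in the single-dataset command above, and it does not reflect how inference_mul.py is actually implemented. `build_inference_commands` is a hypothetical helper.

```python
def build_inference_commands(data_name, mode, task, ratio, device=0,
                             first_iter=1, last_iter=7):
    """Return one inference.py argument list per iteration directory."""
    model_dir = f"output/{data_name}_{mode}_{ratio}"
    commands = []
    for i in range(first_iter, last_iter + 1):
        commands.append([
            "python3", "inference.py",
            # Assumed layout: data/<data_name>/<data_name>_<mode>/iter_<i>
            f"--dataname=data/{data_name}/{data_name}_{mode}/iter_{i}",
            "--t5_path=hf_models/mix",
            f"--model={model_dir}",
            f"--task={task}",
            f"--cuda={device}",
            f"--mode={mode}",
            # These four must match the values used during training.
            "--use_prompt", "--use_ssi",
            "--prompt_len=80", "--prompt_dim=512",
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_inference_commands("ace05_event", "H", "event", 0.8):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.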
The complete process, including fine-tuning and inference (in "scripts/run.bash"):
mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
bash scripts/run_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/ace05_event.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True
python3 inference_mul.py --dataname=data/${data_name}/${data_name}_${mode} --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512

| index | definition | F1 |
|---|---|---|
| ent-(P/R/F1) | Entity Micro-F1 score (Entity Type, Entity Span) | spot-F1 |
| rel-strict-(P/R/F1) | Micro-F1 score in strict relation mode (Relation Type, Arg1 Span, Arg1 Type, Arg2 Span, Arg2 Type) | asoc-F1 for relations, spot-F1 for entities |
| evt-trigger-(P/R/F1) | Micro-F1 score of event trigger word (Event Type, Trigger Span) | spot-F1 |
| evt-role-(P/R/F1) | Micro-F1 score for event roles (Event Type, Arg Role, Arg Span) | asoc-F1 |
overall-F1 refers to the sum of spot-F1 and asoc-F1, which may exceed 100.
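To make the metric definitions concrete, the following sketch computes micro precision, recall, and F1 from sets of predicted and gold tuples, then sums spot-F1 and asoc-F1 into overall-F1. It is a simplified illustration, not the repository's scorer; the function name and the toy tuples are hypothetical.

```python
def micro_prf(gold, pred):
    """Micro P/R/F1 in percent over parallel lists of per-sentence tuple sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # exact tuple matches
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = 100.0 * tp / n_pred if n_pred else 0.0
    recall = 100.0 * tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy event example: triggers are (Event Type, Trigger Span),
# roles are (Event Type, Arg Role, Arg Span).
gold_spots = [{("attack", "fired")}, {("transport", "moved")}]
pred_spots = [{("attack", "fired")}, {("meet", "moved")}]
gold_asocs = [{("attack", "target", "base")}, set()]
pred_asocs = [{("attack", "target", "base")}, set()]

_, _, spot_f1 = micro_prf(gold_spots, pred_spots)
_, _, asoc_f1 = micro_prf(gold_asocs, pred_asocs)
overall_f1 = spot_f1 + asoc_f1  # a sum of two percentages, so it can exceed 100
```

In this toy case one of two triggers is correct and the single role is correct, so the sum of the two F1 values is above 100.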
Part of our code is borrowed from UIE and UnifiedSKG; many thanks to their authors.
If you use or extend our work, please cite the paper as follows:
@article{DBLP:journals/corr/abs-2305-08703,
  author     = {Hongbin Ye and
                Honghao Gui and
                Xin Xu and
                Huajun Chen and
                Ningyu Zhang},
  title      = {Schema-adaptable Knowledge Graph Construction},
  journal    = {CoRR},
  volume     = {abs/2305.08703},
  year       = {2023},
  url        = {https://doi.org/10.48550/arXiv.2305.08703},
  doi        = {10.48550/arXiv.2305.08703},
  eprinttype = {arXiv},
  eprint     = {2305.08703},
  timestamp  = {Wed, 17 May 2023 15:47:36 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2305-08703.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}