AdaKGC

News

Code for the paper "Schema-adaptable Knowledge Graph Construction".
Our work has been accepted to the Findings of EMNLP 2023.

Environment dependencies

To run the code, create a conda environment and install the following requirements:

conda create -n adakgc python=3.8
conda activate adakgc
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt

Model

The tokenizer of our model comes from UIE and the remaining parts come from T5, so the checkpoint is a mixed model. The download link is provided here; please make sure to use this model: hf_models/mix

Dataset

For more information about dataset construction, see Data Construction.

The datasets are available via the following Google Drive link.

Datasets: ACE05, Few-NERD, NYT

⚾ Run

mkdir hf_models
cd hf_models
git lfs install
git clone https://huggingface.co/google/t5-v1_1-base
cd ..

mkdir output           # */AdaKGC/output

Entity recognition task

# Current path: */AdaKGC
mode=H
data_name=Few-NERD
task=entity
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/Few-NERD.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True

model: The name or path of the pretrained model.

data: The path to the dataset.

output: The path where the fine-tuned checkpoint is saved. The final output path is generated automatically, e.g. AdaKGC/output/ace05_event_H_e30_lr1e-4_b14_n0.

config: The default configuration file, located in the config/prompt_conf directory. Each task has its own configuration.

mode: Dataset mode (H, V, M, or R).

device: CUDA_VISIBLE_DEVICES.

batch: Batch size.

(See the bash scripts and Python files for the full list of command-line parameters.)
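The way the fine-tuning command composes its paths from the shell variables can be sketched in Python. This is only an illustration of the path layout seen in the command above: the helper name `build_finetune_args` is hypothetical and not part of the repository, and it assumes the config file is named after the dataset, as in the examples in this README.

```python
from typing import List


def build_finetune_args(data_name: str, mode: str, ratio: float, device: int) -> List[str]:
    """Assemble the argument list for scripts/fine_prompt.bash (hypothetical helper)."""
    split = f"{data_name}_{mode}"  # e.g. "Few-NERD_H"
    return [
        "bash", "scripts/fine_prompt.bash",
        "--model=hf_models/mix",
        f"--data=data/{split}/iter_1",                    # training data: first iteration
        f"--output=output/{split}_{ratio}",               # checkpoint directory
        f"--config=config/prompt_conf/{data_name}.ini",   # assumed: config named after dataset
        f"--device={device}",
        f"--negative_ratio={ratio}",
        f"--record2=data/{split}/iter_7/record.schema",   # schema of the last iteration
        "--use_prompt=True",
        "--init_prompt=True",
    ]


if __name__ == "__main__":
    print(" ".join(build_finetune_args("Few-NERD", "H", 0.8, 0)))
```

Printing the joined list gives a command line that can be compared against the one shown above before running it.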

Relation extraction task

mode=H
data_name=NYT
task=relation
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/NYT.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True

Event extraction task

mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
bash scripts/fine_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/ace05_event.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True

Inference

Inference on a single dataset only (e.g. data/ace05_event_H/iter_1):

mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
python3 inference.py --dataname=data/${data_name}/${data_name}_${mode}/iter_2 --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512

dataname: The path to the dataset to predict on (ace05_event, NYT, or Few-NERD).

model: The path to the model produced by the previous training step (the output of the training stage).

t5_path: The base T5 model (the model argument of the training stage).

task: Task type (entity, relation, or event).

cuda: CUDA_VISIBLE_DEVICES.

mode: Dataset mode (H, V, M, or R).

use_ssi, use_prompt, prompt_len, and prompt_dim must match the values used during training. You can view and set them in the corresponding configuration file, e.g. config/prompt_conf/ace05_event.ini.
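A mismatch between training and inference settings is easy to miss, so a small check before launching inference may help. The sketch below is not part of the repository: the function name `find_mismatches` and the dictionaries are hypothetical, and it assumes you have already collected the four settings named above from your training configuration and from your inference command.

```python
from typing import Dict, List

# The four settings that must agree between training and inference.
SHARED_KEYS = ("use_ssi", "use_prompt", "prompt_len", "prompt_dim")


def find_mismatches(train_cfg: Dict[str, object], infer_cfg: Dict[str, object]) -> List[str]:
    """Return the shared keys whose values differ, or that are set on one side only."""
    return [key for key in SHARED_KEYS if train_cfg.get(key) != infer_cfg.get(key)]


if __name__ == "__main__":
    # Illustrative values only; the real ones come from your own config and command line.
    train = {"use_ssi": True, "use_prompt": True, "prompt_len": 80, "prompt_dim": 512}
    infer = {"use_ssi": True, "use_prompt": True, "prompt_len": 80, "prompt_dim": 256}
    print(find_mismatches(train, infer))
```

An empty list means the four settings agree; any returned key should be corrected before running inference.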

Automatic inference on all iterative datasets (i.e. data/iter_1/ace05_event_H ~ data/iter_7/ace05_event_H):

mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
python3 inference_mul.py --dataname=data/${data_name}/${data_name}_${mode} --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512

use_ssi, use_prompt, prompt_len, and prompt_dim must match the values used during training.
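To make the iteration range concrete, the following sketch lists the per-iteration directories that a run over iter_1 through iter_7 would cover. It is an illustration only: how inference_mul.py enumerates the directories is not shown in this README, the helper name `iteration_dirs` is hypothetical, and the layout (iter_N folders directly under the path given to --dataname) is an assumption based on the single-dataset command.

```python
from typing import List


def iteration_dirs(base: str, first: int = 1, last: int = 7) -> List[str]:
    """List the directories iter_<first> .. iter_<last> under base (both ends included)."""
    return [f"{base}/iter_{i}" for i in range(first, last + 1)]


if __name__ == "__main__":
    # Assumed base path, matching the --dataname value in the command above.
    for path in iteration_dirs("data/ace05_event/ace05_event_H"):
        print(path)
```

Checking that each listed directory exists before starting a long run can avoid a failure partway through.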

The complete process, including fine-tuning and inference (in "scripts/run.bash"):

mode=H
data_name=ace05_event
task=event
device=0
ratio=0.8
bash scripts/run_prompt.bash --model=hf_models/mix --data=data/${data_name}_${mode}/iter_1 --output=output/${data_name}_${mode}_${ratio} --config=config/prompt_conf/ace05_event.ini --device=${device} --negative_ratio=${ratio} --record2=data/${data_name}_${mode}/iter_7/record.schema --use_prompt=True --init_prompt=True
python3 inference_mul.py --dataname=data/${data_name}/${data_name}_${mode} --t5_path=hf_models/mix --model=output/${data_name}_${mode}_${ratio} --task=${task} --cuda=${device} --mode=${mode} --use_prompt --use_ssi --prompt_len=80 --prompt_dim=512

Metric	Definition	F1
ent-(P/R/F1)	Micro-F1 score of entities (Entity Type, Entity Span)	spot-F1
rel-strict-(P/R/F1)	Micro-F1 score of relations in strict mode (Relation Type, Arg1 Span, Arg1 Type, Arg2 Span, Arg2 Type)	asoc-F1 is used for relations, spot-F1 is used for entities
evt-trigger-(P/R/F1)	Micro-F1 score of event triggers (Event Type, Trigger Span)	spot-F1
evt-role-(P/R/F1)	Micro-F1 score of event roles (Event Type, Arg Role, Arg Span)	asoc-F1

overall-F1 refers to the sum of spot-F1 and asoc-F1, which may exceed 100.
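A short worked example shows why the sum can exceed 100. The sketch below uses the standard micro precision, recall, and F1 definitions on a 0 to 100 scale; it is not the repository's scoring code, the function name `micro_prf` is hypothetical, and the counts are made up purely for illustration.

```python
from typing import Tuple


def micro_prf(num_correct: int, num_pred: int, num_gold: int) -> Tuple[float, float, float]:
    """Return micro precision, recall and F1 on a 0-100 scale."""
    precision = 100.0 * num_correct / num_pred if num_pred else 0.0
    recall = 100.0 * num_correct / num_gold if num_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts: spot = entities or triggers, asoc = relations or roles.
    _, _, spot_f1 = micro_prf(num_correct=80, num_pred=100, num_gold=100)
    _, _, asoc_f1 = micro_prf(num_correct=30, num_pred=50, num_gold=60)
    overall_f1 = spot_f1 + asoc_f1  # a sum of two scores, so it can exceed 100
    print(round(spot_f1, 2), round(asoc_f1, 2), round(overall_f1, 2))
```

With these counts the two F1 values are 80 and roughly 54.5, so the overall score lands near 134.5 even though each component stays below 100.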

Acknowledgment

Parts of our code are borrowed from UIE and UnifiedSKG; many thanks.

Papers for the Project & How to Cite

If you use or extend our work, please cite the paper as follows:

@article{DBLP:journals/corr/abs-2305-08703,
  author       = {Hongbin Ye and
                  Honghao Gui and
                  Xin Xu and
                  Huajun Chen and
                  Ningyu Zhang},
  title        = {Schema-adaptable Knowledge Graph Construction},
  journal      = {CoRR},
  volume       = {abs/2305.08703},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2305.08703},
  doi          = {10.48550/arXiv.2305.08703},
  eprinttype   = {arXiv},
  eprint       = {2305.08703},
  timestamp    = {Wed, 17 May 2023 15:47:36 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2305-08703.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}