PyTorch implementation of the models in the paper "Matching the Blanks: Distributional Similarity for Relation Learning", published in ACL 2019.
Note: this is not the official repo for the paper.
Additional models for relation extraction, implemented here based on the paper's methodology:
- ALBERT
- BioBERT
For more conceptual details on the implementation, please see https://towardsdatascience.com/bert-s-for-relation-traction-nlp-2c7c3ab487c4
If you like my work, please consider sponsoring by clicking the Sponsor button at the top.
Requirements: Python 3.8+
```bash
python3 -m pip install -r requirements.txt
python3 -m spacy download en_core_web_lg
```

Pre-trained BERT models (BERT, ALBERT) are provided courtesy of HuggingFace.co (https://huggingface.co).
Pre-trained BioBERT models are provided by https://github.com/dmis-lab/biobert
To use BioBERT (biobert_v1.1_pubmed), download & unzip the model from here.
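As a rough sketch (this is an assumption, not the repo's actual loading code), the backbone weights referenced by `--model_size` can be fetched with the HuggingFace `transformers` library:

```python
# Rough sketch (assumption, not the repo's loading code): fetch the
# BERT/ALBERT backbones referenced by --model_size from HuggingFace.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # model_no 0
model = AutoModel.from_pretrained("bert-base-uncased")
# ALBERT (model_no 1): AutoModel.from_pretrained("albert-base-v2")
# BioBERT (model_no 2): loaded from the unzipped biobert_v1.1_pubmed
# directory, after converting its TensorFlow checkpoint to PyTorch.
```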
Run main_pretraining.py with the arguments below. The pre-training data can be any .txt file of continuous text.
We use spaCy NLP to grab pairwise entities (within a window of 40 tokens) from the text to form relation statements for pre-training. Entity recognition is based on dependency-tree parsing of objects/subjects.
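A minimal sketch of this pairing idea, assuming plain spaCy NER entities (the `relation_statements` helper below is hypothetical and illustrative, not the repo's actual preprocessing code):

```python
# Hypothetical illustration (not the repo's actual code): pair entities
# that spaCy finds within a 40-token window to form relation statements.
from itertools import combinations

import spacy

nlp = spacy.load("en_core_web_lg")  # installed via the requirements step above

def relation_statements(text, window=40):
    """Yield (entity1, entity2, context) triples for entity pairs whose
    combined span fits within `window` tokens."""
    doc = nlp(text)
    for e1, e2 in combinations(doc.ents, 2):
        if e2.end - e1.start <= window:
            yield e1.text, e2.text, doc[e1.start:e2.end].text

for e1, e2, ctx in relation_statements("Barack Obama was born in Hawaii in 1961."):
    print(e1, "|", e2, "|", ctx)
```

The actual preprocessing additionally filters candidates via dependency-tree parsing of subjects/objects, as described above.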
The pre-training data I used, taken from the CNN dataset (cnn.txt), can be downloaded here.
Download it and save as ./data/cnn.txt
Note, however, that the paper uses Wiki dump data for MTB pre-training, which is much larger than the CNN dataset.
Note: pre-training can take a long time, depending on the available GPU. It is possible to fine-tune directly on the relation-extraction task, following the sections below, and still get reasonable results.
```
main_pretraining.py [-h]
[--pretrain_data TRAIN_PATH]
[--batch_size BATCH_SIZE]
[--freeze FREEZE]
[--gradient_acc_steps GRADIENT_ACC_STEPS]
[--max_norm MAX_NORM]
[--fp16 FP_16]
[--num_epochs NUM_EPOCHS]
[--lr LR]
[--model_no MODEL_NO (0: BERT ; 1: ALBERT ; 2: BioBERT)]
[--model_size MODEL_SIZE (BERT: 'bert-base-uncased', 'bert-large-uncased';
                          ALBERT: 'albert-base-v2', 'albert-large-v2';
                          BioBERT: 'bert-base-uncased' (biobert_v1.1_pubmed))]
```
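For example, one plausible invocation on the CNN data (the flag values below are illustrative assumptions; check `main_pretraining.py` for the actual defaults):

```bash
python3 main_pretraining.py --pretrain_data ./data/cnn.txt \
    --batch_size 32 --gradient_acc_steps 2 --num_epochs 18 \
    --lr 0.0001 --model_no 0 --model_size bert-base-uncased
```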
Run main_task.py with the arguments below. This requires the SemEval2010 Task 8 dataset, available here. Download & unzip it into the ./data/ folder.

```
main_task.py [-h]
[--train_data TRAIN_DATA]
[--test_data TEST_DATA]
[--use_pretrained_blanks USE_PRETRAINED_BLANKS]
[--num_classes NUM_CLASSES]
[--batch_size BATCH_SIZE]
[--gradient_acc_steps GRADIENT_ACC_STEPS]
[--max_norm MAX_NORM]
[--fp16 FP_16]
[--num_epochs NUM_EPOCHS]
[--lr LR]
[--model_no MODEL_NO (0: BERT ; 1: ALBERT ; 2: BioBERT)]
[--model_size MODEL_SIZE (BERT: 'bert-base-uncased', 'bert-large-uncased';
                          ALBERT: 'albert-base-v2', 'albert-large-v2';
                          BioBERT: 'bert-base-uncased' (biobert_v1.1_pubmed))]
[--train TRAIN]
[--infer INFER]
```
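For example, one plausible fine-tuning-plus-inference run on SemEval2010 Task 8, which has 19 relation classes (the flag values below are illustrative assumptions; check `main_task.py` for the actual defaults):

```bash
python3 main_task.py --num_classes 19 --batch_size 32 --num_epochs 11 \
    --lr 0.00007 --model_no 0 --model_size bert-base-uncased \
    --use_pretrained_blanks 0 --train 1 --infer 1
```

Setting `--use_pretrained_blanks 1` would instead initialize from MTB-pretrained weights, if pre-training was done.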
To infer a sentence, annotate entity1 & entity2 of interest within the sentence with their respective entity tags [E1] and [E2]. Example:

```
Type input sentence ('quit' or 'exit' to terminate):
The surprise [E1]visit[/E1] caused a [E2]frenzy[/E2] on the already chaotic trading floor.
Sentence: The surprise [E1]visit[/E1] caused a [E2]frenzy[/E2] on the already chaotic trading floor.
Predicted: Cause-Effect(e1,e2)
```

The same inference can be run programmatically:

```python
from src.tasks.infer import infer_from_trained

inferer = infer_from_trained(args, detect_entities=False)
test = "The surprise [E1]visit[/E1] caused a [E2]frenzy[/E2] on the already chaotic trading floor."
inferer.infer_sentence(test, detect_entities=False)
```

```
Sentence: The surprise [E1]visit[/E1] caused a [E2]frenzy[/E2] on the already chaotic trading floor.
Predicted: Cause-Effect(e1,e2)
```

The script can also automatically detect potential entities in an input sentence, in which case all possible relation combinations are inferred:
```python
inferer = infer_from_trained(args, detect_entities=True)
test2 = "After eating the chicken, he developed a sore throat the next morning."
inferer.infer_sentence(test2, detect_entities=True)
```

```
Sentence: [E2]After eating the chicken[/E2] , [E1]he[/E1] developed a sore throat the next morning .
Predicted: Other
Sentence: After eating the chicken , [E1]he[/E1] developed [E2]a sore throat[/E2] the next morning .
Predicted: Other
Sentence: [E1]After eating the chicken[/E1] , [E2]he[/E2] developed a sore throat the next morning .
Predicted: Other
Sentence: [E1]After eating the chicken[/E1] , he developed [E2]a sore throat[/E2] the next morning .
Predicted: Other
Sentence: After eating the chicken , [E2]he[/E2] developed [E1]a sore throat[/E1] the next morning .
Predicted: Other
Sentence: [E2]After eating the chicken[/E2] , he developed [E1]a sore throat[/E1] the next morning .
Predicted: Cause-Effect(e2,e1)
```

Download the FewRel 1.0 dataset here, and unzip it into the ./data/ folder.
Run main_task.py with the argument `--task` set to `fewrel`:

```bash
python main_task.py --task fewrel
```

Results (5-way 1-shot), BERT-EM without MTB pre-training and without training on any FewRel data:
| Model size | Accuracy (41,646 samples) |
|---|---|
| bert-base-uncased | 62.229 % |
| bert-large-uncased | 72.766 % |
Without MTB pre-training: F1 results when trained on 100% of the training data: