TextGenerationEvaluationMetrics
1.0.0
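This is an implementation of the metrics for measuring diversity and quality that are introduced in this paper. Some other metrics are provided as well.
For BLEU and Self-BLEU, this high-performance implementation is used.
The following is an example of computing the MS-Jaccard distance. The input to this metric is a list of tokenized sentences.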
from multiset_distances import MultisetDistances
ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', 'ensures', 'that', 'the', 'military', 'will', 'forever', 'heed', 'Party', 'commands']
ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which', 'guarantees', 'the', 'military', 'forces', 'always', 'being', 'under', 'the', 'command', 'of', 'the', 'Party']
ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', 'army', 'always', 'to', 'heed', 'the', 'directions', 'of', 'the', 'party']
sen1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', 'ensures', 'that', 'the', 'military', 'always', 'obeys', 'the', 'commands', 'of', 'the', 'party']
sen2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was', 'interested', 'in', 'world', 'history']
references = [ref1, ref2, ref3]
sentences = [sen1, sen2]
msd = MultisetDistances(references=references)
msj_distance = msd.get_jaccard_score(sentences=sentences)

The value of msj_distance is {3: 0.17, 4: 0.13, 5: 0.09}, which gives the MS-Jaccard score for 3-grams, 4-grams, and 5-grams, respectively.
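For intuition, the idea behind a multiset Jaccard score on n-grams can be sketched as follows. This is a simplified illustration, not the code in multiset_distances: the helper names ngram_counts and multiset_jaccard are hypothetical, normalizing counts by the number of sentences is an assumption, and the package may aggregate across n-gram orders differently, so this sketch is not expected to reproduce the values above.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count every n-gram (as a tuple of tokens) over a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def multiset_jaccard(references, sentences, n):
    """Multiset Jaccard index: sum of min counts divided by sum of max counts.

    Assumption: counts are divided by the number of sentences in each set,
    so that sets of different sizes stay comparable.
    """
    ref_size = max(len(references), 1)
    gen_size = max(len(sentences), 1)
    ref = {g: c / ref_size for g, c in ngram_counts(references, n).items()}
    gen = {g: c / gen_size for g, c in ngram_counts(sentences, n).items()}

    all_ngrams = set(ref) | set(gen)
    intersection = sum(min(ref.get(g, 0.0), gen.get(g, 0.0)) for g in all_ngrams)
    union = sum(max(ref.get(g, 0.0), gen.get(g, 0.0)) for g in all_ngrams)
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy input: one reference and one generated sentence, compared on bigrams.
    toy_refs = [["a", "b", "c"]]
    toy_sens = [["a", "b", "d"]]
    print(multiset_jaccard(toy_refs, toy_sens, n=2))
```

A score of 1 means the two sets have identical normalized n-gram counts, and a score of 0 means they share no n-grams at all.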
The following is an example of computing the FBD and EMBD distances. The input to these metrics is a list of strings, and the BERT tokenizer is used inside the code.
from bert_distances import FBD, EMBD
references = ["that is very good", "it is great"]
sentences1 = ["this is nice", "that is good"]
sentences2 = ["it is bad", "this is very bad"]
fbd = FBD(references=references, model_name="bert-base-uncased", bert_model_dir="/tmp/Bert/")
fbd_distance_sentences1 = fbd.get_score(sentences=sentences1)
fbd_distance_sentences2 = fbd.get_score(sentences=sentences2)
# fbd_distance_sentences1 = 17.8, fbd_distance_sentences2 = 22.0
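As background on the FBD score above, and assuming it follows the usual Fréchet-distance recipe (fit a Gaussian to each set of feature vectors and compare the two Gaussians, as FID does for images), the distance step can be sketched with NumPy alone. The function name frechet_distance is hypothetical, and the sketch takes feature matrices as given; how bert_distances extracts BERT features and computes its score may differ.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature matrices.

    Each input has shape (num_samples, feature_dim).
    d^2 = ||mu_a - mu_b||^2 + Tr(C_a) + Tr(C_b) - 2 * Tr((C_a C_b)^(1/2))
    """
    feats_a = np.asarray(feats_a, dtype=float)
    feats_b = np.asarray(feats_b, dtype=float)

    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    # atleast_2d keeps the covariance a matrix even when feature_dim == 1.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((C_a C_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of C_a @ C_b; tiny negative or imaginary parts are numerical noise.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    # Stand-in features; in practice these would be sentence embeddings.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 4))
    fake = rng.normal(loc=0.5, size=(200, 4))
    print(frechet_distance(real, fake))
```

A smaller value means the two feature distributions are closer, which matches the example above, where sentences1 scores lower than sentences2 against the same references.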
embd = EMBD(references=references, model_name="bert-base-uncased", bert_model_dir="/tmp/Bert/")
embd_distance_sentences1 = embd.get_score(sentences=sentences1)
embd_distance_sentences2 = embd.get_score(sentences=sentences2)
# embd_distance_sentences1 = 10.9, embd_distance_sentences2 = 20.4

If this work helps your research, please cite the paper.
@misc{montahaei2019jointly,
title={Jointly Measuring Diversity and Quality in Text Generation Models},
author={Ehsan Montahaei and Danial Alihosseini and Mahdieh Soleymani Baghshah},
year={2019},
eprint={1904.03971},
archivePrefix={arXiv},
primaryClass={cs.LG}
}