BERT as Language Model
1.0.0
Demo | cases-en | cases-zh
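For a sentence $S = w_1, w_2, \ldots, w_k$, we have

$p(S) = \prod_{i=1}^{k} p(w_i \mid \text{context})$

In a traditional language model, such as an RNN, $\text{context} = w_1, \ldots, w_{i-1}$, so

$p(S) = \prod_{i=1}^{k} p(w_i \mid w_1, \ldots, w_{i-1})$

In a bidirectional language model, the context is larger: $\text{context} = w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_k$.

In this implementation, we simply adopt the following approximation:

$p(S) \approx \prod_{i=1}^{k} p(w_i \mid w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_k)$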
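One plausible way to obtain each conditional in the approximation above from a masked language model such as BERT is to replace one token at a time with the mask token and read off the model's probability for the original token at that position. The sketch below only builds those masked inputs; it does not call BERT. The helper name `masked_variants` and the whitespace tokenization are illustrative assumptions (BERT itself uses WordPiece tokens), not code taken from this repository.

```python
from typing import List

MASK = "[MASK]"  # mask token used by BERT vocabularies


def masked_variants(tokens: List[str], mask: str = MASK) -> List[List[str]]:
    """Return one copy of the sentence per token, with that token masked.

    The i-th variant keeps every token except position i, which is replaced
    by the mask token, so a masked LM can score w_i given all other words.
    """
    return [tokens[:i] + [mask] + tokens[i + 1:] for i in range(len(tokens))]


# Whitespace splitting is a simplification for illustration only.
tokens = "there is a book on the desk".split()
for variant in masked_variants(tokens):
    print(" ".join(variant))
```

A sentence of k tokens therefore needs k masked inputs, which can be scored as a single batch.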
Try the web demo.
More cases: 中文
export BERT_BASE_DIR=model/uncased_L-12_H-768_A-12
export INPUT_FILE=data/lm/test.en.tsv
python run_lm_predict.py \
  --input_file=$INPUT_FILE \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --output_dir=/tmp/lm_output/
For the following test case:
$ cat data/lm/test.en.tsv
there is a book on the desk
there is a plane on the desk
there is a book in the desk
$ cat /tmp/lm_output/test_result.json
output:
# prob: probability
# ppl: perplexity
[
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.9988962411880493
      },
      {
        "token": "is",
        "prob": 0.013578361831605434
      },
      {
        "token": "a",
        "prob": 0.9420605897903442
      },
      {
        "token": "book",
        "prob": 0.07452250272035599
      },
      {
        "token": "on",
        "prob": 0.9607976675033569
      },
      {
        "token": "the",
        "prob": 0.4983428418636322
      },
      {
        "token": "desk",
        "prob": 4.040586190967588e-06
      }
    ],
    "ppl": 17.69329728285426
  },
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.996775209903717
      },
      {
        "token": "is",
        "prob": 0.03194097802042961
      },
      {
        "token": "a",
        "prob": 0.8877727389335632
      },
      {
        "token": "plane",
        "prob": 3.4907534427475184e-05  # low probability
      },
      {
        "token": "on",
        "prob": 0.1902322769165039
      },
      {
        "token": "the",
        "prob": 0.5981084704399109
      },
      {
        "token": "desk",
        "prob": 3.3164762953674654e-06
      }
    ],
    "ppl": 59.646456254851806
  },
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.9969795942306519
      },
      {
        "token": "is",
        "prob": 0.03379646688699722
      },
      {
        "token": "a",
        "prob": 0.9095568060874939
      },
      {
        "token": "book",
        "prob": 0.013939591124653816
      },
      {
        "token": "in",
        "prob": 0.000823647016659379  # low probability
      },
      {
        "token": "the",
        "prob": 0.5844194293022156
      },
      {
        "token": "desk",
        "prob": 3.3361218356731115e-06
      }
    ],
    "ppl": 54.65941516205144
  }
]
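The reported `ppl` values appear to be consistent with the usual definition of perplexity, the exponential of the negative mean natural-log token probability. This is inferred from the numbers above rather than confirmed from the repository's code. A minimal sketch that recomputes it for the first sentence follows; the function names `sentence_log_prob` and `perplexity` are illustrative, not part of `run_lm_predict.py`.

```python
import math
from typing import Sequence


def sentence_log_prob(probs: Sequence[float]) -> float:
    """Sum of natural-log token probabilities, i.e. log of their product."""
    return sum(math.log(p) for p in probs)


def perplexity(probs: Sequence[float]) -> float:
    """exp(-(1/k) * sum(log p_i)) over the k token probabilities."""
    return math.exp(-sentence_log_prob(probs) / len(probs))


# Token probabilities copied from the first entry of the output above
# ("there is a book on the desk").
book_on_desk = [
    0.9988962411880493,
    0.013578361831605434,
    0.9420605897903442,
    0.07452250272035599,
    0.9607976675033569,
    0.4983428418636322,
    4.040586190967588e-06,
]

print(round(perplexity(book_on_desk), 2))  # about 17.69, matching the reported ppl
```

Summing logs instead of multiplying probabilities avoids underflow on longer sentences. A lower perplexity means the model finds the sentence more plausible, which is why the "plane" and "in" variants score worse than the first sentence.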