RoFormer and RoFormerV2 models
Added the training results of the paddle version of RoFormerV2 on the classification tasks.
Thanks to a reminder from Jianlin Su (苏神), a note has been added: RoFormerV2* denotes the RoFormerV2 model that was not trained with multi-task learning.
Added the code and dev-set results for the CLUE classification tasks. The code is in the examples/clue folder. Install whatever dependencies are missing, e.g. pip install -U accelerate .
One detail should be noted. In Jianlin Su's fine-tuning, whether the input is of the text type or the text pair type, token_type_ids is set to 0. If you want to stay consistent with his setup, you can pass return_token_type_ids=False to the tokenizer, so that token_type_ids is handled internally by the model (as all zeros). Otherwise, for the text pair type, token_type_ids containing both 0 and 1 will be returned.
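For illustration, here is a minimal sketch of the two tokenizer behaviours described above, using the roformer_v2_chinese_char_base checkpoint that appears later in this document (the sentences are arbitrary examples):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("junnyu/roformer_v2_chinese_char_base")

# default behaviour for a text pair: token_type_ids marks the first segment with 0 and the second with 1
default_inputs = tokenizer("句子一", "句子二")
print(default_inputs["token_type_ids"])  # e.g. [0, 0, 0, 0, 0, 1, 1, 1, 1]

# consistent with Jianlin Su's fine-tuning: drop token_type_ids so the model
# builds an all-zero tensor internally
su_style_inputs = tokenizer("句子一", "句子二", return_token_type_ids=False)
print("token_type_ids" in su_style_inputs)  # False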
(1) Modified RoFormerForCausalLM to support roformer-sim and provided a corresponding example, see examples/test_sim.py .
(2) Modified the implementation of apply_rotary , which now looks simpler.
def apply_rotary(x, sinusoidal_pos=None):
    if sinusoidal_pos is None:
        return x
    sin, cos = sinusoidal_pos
    # x.shape [batch, seq_len, 2]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # [cos_nθ, -sin_nθ] [x1]
    # [sin_nθ,  cos_nθ] [x2]
    # => [x1 * cos_nθ - x2 * sin_nθ, x1 * sin_nθ + x2 * cos_nθ]
    # Su Jianlin's rotary implementation uses the computation below (odd/even interleaving):
    # return torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).flatten(-2, -1)
    # Since the attention scores are computed with torch.einsum("bhmd,bhnd->bhmn", q, k),
    # the two halves can simply be concatenated along the last dimension (no interleaving needed).
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

roformer-v2
Note: you must use the code from this repository; the code from the Transformers repository cannot be used!!!

# v2 version
pip install roformer>=0.4.3
# v1 version (the code has already been merged into the huggingface transformers repository, please use a recent version of transformers)
pip install -U transformers

| model | iflytek | tnews | afqmc | cmnli | ocnli | WSC | CSL | AVG |
|---|---|---|---|---|---|---|---|---|
| BERT | 60.06 | 56.80 | 72.41 | 79.56 | 73.93 | 78.62 | 83.93 | 72.19 |
| RoBERTa | 60.64 | 58.06 | 74.05 | 81.24 | 76.00 | 87.50 | 84.50 | 74.57 |
| RoFormer | 60.91 | 57.54 | 73.52 | 80.92 | 76.07 | 86.84 | 84.63 | 74.35 |
| RoFormerV2* | 60.87 | 56.54 | 72.75 | 80.34 | 75.36 | 80.92 | 84.67 | 73.06 |
| GAU-α | 61.41 | 57.76 | 74.17 | 81.82 | 75.86 | 79.93 | 85.67 | 73.8 |
| RoFormer-pytorch (this repository's code) | 60.60 | 57.51 | 74.44 | 80.79 | 75.67 | 86.84 | 84.77 | 74.37 |
| RoFormerV2-pytorch (this repository's code) | 62.87 | 59.03 | 76.20 | 80.85 | 79.73 | 87.82 | 91.87 | 76.91 |
| GAU-α-pytorch (Adafactor) | 61.18 | 57.52 | 73.42 | 80.91 | 75.69 | 80.59 | 85.5 | 73.54 |
| GAU-α-pytorch (AdamW wd0.01 warmup0.1) | 60.68 | 57.95 | 73.08 | 81.02 | 75.36 | 81.25 | 83.93 | 73.32 |
| RoFormerV2-large-pytorch (this repository's code) | 61.75 | 59.21 | 76.14 | 82.35 | 81.73 | 91.45 | 91.5 | 77.73 |
| ChineseBERT-pytorch | 61.25 | 58.67 | 74.70 | 82.65 | 79.63 | 87.83 | 84.97 | 75.67 |
| RoFormerV2-base-paddle | 63.76 | 59.53 | 77.06 | 81.58 | 81.56 | 87.83 | 86.73 | 76.87 |
| RoFormerV2-large-paddle | 64.02 | 60.08 | 77.92 | 82.87 | 83.9 | 92.43 | 86.87 | 78.30 |

| model | iflytek | tnews | afqmc | cmnli | ocnli | WSC | CSL | AVG |
|---|---|---|---|---|---|---|---|---|
| RoFormer-pytorch (this repository's code) | 59.54 | 57.34 | 74.46 | 80.23 | 73.67 | 80.69 | 84.57 | 72.93 |
| RoFormerV2-pytorch (this repository's code) | 63.15 | 58.24 | 75.42 | 80.59 | 74.17 | 83.79 | 83.73 | 74.16 |
| GAU-α-pytorch (Adafactor) | 61.38 | 57.08 | 74.05 | 80.37 | 73.53 | 74.83 | 85.6 | 72.41 |
| GAU-α-pytorch (AdamW wd0.01 warmup0.1) | 60.54 | 57.67 | 72.44 | 80.32 | 72.97 | 76.55 | 84.13 | 72.09 |
| RoFormerV2-large-pytorch (this repository's code) | 61.85 | 59.13 | 76.38 | 80.97 | 76.23 | 85.86 | 84.33 | 74.96 |
| ChineseBERT-pytorch | 61.54 | 58.57 | 74.8 | 81.94 | 76.93 | 79.66 | 85.1 | 74.08 |
| RoFormerV2-large-paddle | 64.23 | 59.99 | 76.85 | 81.97 | 76.57 | 84.48 | 83.37 | 75.35 |
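
The CLUE fine-tuning above uses the following classification head (for RoFormerV2, config.hidden_act resolves to relu):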
import torch.nn as nn
from transformers.activations import ACT2FN


class RoFormerClassificationHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)
        self.config = config

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take the first token (equivalent to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = ACT2FN[self.config.hidden_act](x)  # relu for RoFormerV2
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
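
As a rough illustration of how this head is typically wired to the encoder, here is a minimal sketch; it assumes the roformer package exposes RoFormerModel with transformers-style outputs, and RoFormerForClueClassification is a hypothetical name used only for this example, not a module of the repository:

import torch.nn as nn
from roformer import RoFormerModel


class RoFormerForClueClassification(nn.Module):  # hypothetical wrapper, for illustration only
    def __init__(self, config):
        super().__init__()
        self.roformer = RoFormerModel(config)
        self.classifier = RoFormerClassificationHead(config)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        # the first output of RoFormerModel is the sequence of hidden states [batch, seq_len, hidden_size]
        sequence_output = self.roformer(
            input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids
        )[0]
        # the head pools the first token and projects it to num_labels logits
        return self.classifier(sequence_output)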



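The roformer-sim example mentioned in the change notes above (examples/test_sim.py) generates paraphrases with RoFormerForCausalLM and ranks them with the encoder's pooler output: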
import torch
import numpy as np
from roformer import RoFormerForCausalLM, RoFormerConfig
from transformers import BertTokenizer

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Available checkpoints:
# junnyu/roformer_chinese_sim_char_small, junnyu/roformer_chinese_sim_char_base
# junnyu/roformer_chinese_sim_char_ft_small, junnyu/roformer_chinese_sim_char_ft_base
pretrained_model = "junnyu/roformer_chinese_sim_char_base"
tokenizer = BertTokenizer.from_pretrained(pretrained_model)
config = RoFormerConfig.from_pretrained(pretrained_model)
config.is_decoder = True
config.eos_token_id = tokenizer.sep_token_id
config.pooler_activation = "linear"
model = RoFormerForCausalLM.from_pretrained(pretrained_model, config=config)
model.to(device)
model.eval()


def gen_synonyms(text, n=100, k=20):
    '''Purpose: generate n sentences similar to `text`, then return the k most similar ones.
    Method: generate candidates with seq2seq, then score and rank them with the encoder.
    '''
    # generate candidate paraphrases
    r = []
    inputs1 = tokenizer(text, return_tensors="pt")
    for _ in range(n):
        inputs1.to(device)
        output = tokenizer.batch_decode(
            model.generate(**inputs1, top_p=0.95, do_sample=True, max_length=128),
            skip_special_tokens=True,
        )[0].replace(" ", "").replace(text, "")  # strip spaces and drop the original text
        r.append(output)

    # rank the candidates by similarity to the original sentence
    r = [i for i in set(r) if i != text and len(i) > 0]
    r = [text] + r
    inputs2 = tokenizer(r, padding=True, return_tensors="pt")
    with torch.no_grad():
        inputs2.to(device)
        outputs = model(**inputs2)
        Z = outputs.pooler_output.cpu().numpy()
    # L2-normalize the pooler outputs, then sort by cosine similarity
    Z /= (Z ** 2).sum(axis=1, keepdims=True) ** 0.5
    argsort = np.dot(Z[1:], -Z[0]).argsort()
    return [r[i + 1] for i in argsort[:k]]


out = gen_synonyms("广州和深圳哪个好?")
print(out)
# ['深圳和广州哪个好?',
# '广州和深圳哪个好',
# '深圳和广州哪个好',
# '深圳和广州哪个比较好。',
# '深圳和广州哪个最好?',
# '深圳和广州哪个比较好',
# '广州和深圳那个比较好',
# '深圳和广州哪个更好?',
# '深圳与广州哪个好',
# '深圳和广州,哪个比较好',
# '广州与深圳比较哪个好',
# '深圳和广州哪里比较好',
# '深圳还是广州比较好?',
# '广州和深圳哪个地方好一些?',
# '广州好还是深圳好?',
# '广州好还是深圳好呢?',
# '广州与深圳哪个地方好点?',
# '深圳好还是广州好',
# '广州好还是深圳好',
# '广州和深圳哪个城市好?']

| huggingface.co | bert4keras |
|---|---|
| roformer_v2_chinese_char_small | chinese_roformer-v2-char_L-6_H-384_A-6.zip (download code: ttn4) |
| roformer_v2_chinese_char_base | chinese_roformer-v2-char_L-12_H-768_A-12.zip (download code: pfoh) |
| roformer_v2_chinese_char_large | chinese_roformer-v2-char_L-24_H-1024_A-16.zip (download code: npfv) |

| huggingface.co | bert4keras |
|---|---|
| roformer_chinese_base | chinese_roformer_L-12_H-768_A-12.zip (download code: xy9x) |
| roformer_chinese_small | chinese_roformer_L-6_H-384_A-6.zip (download code: gy97) |
| roformer_chinese_char_base | chinese_roformer-char_L-12_H-768_A-12.zip (download code: bt94) |
| roformer_chinese_char_small | chinese_roformer-char_L-6_H-384_A-6.zip (download code: a44c) |
| roformer_chinese_sim_char_base | chinese_roformer-sim-char_L-12_H-768_A-12.zip (download code: 2cgz) |
| roformer_chinese_sim_char_small | chinese_roformer-sim-char_L-6_H-384_A-6.zip (download code: h68q) |
| roformer_chinese_sim_char_ft_base | chinese_roformer-sim-char-ft_L-12_H-768_A-12.zip (download code: w15n) |
| roformer_chinese_sim_char_ft_small | chinese_roformer-sim-char-ft_L-6_H-384_A-6.zip (download code: gty5) |

| huggingface.co |
|---|
| roformer_small_generator |
| roformer_small_discriminator |
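
The MLM tests below compare the PyTorch and TensorFlow implementations. The first uses roformer_v2_chinese_char_base with BertTokenizer (remember the note above: the v2 models require this repository's code); the second uses roformer_chinese_base with RoFormerTokenizer from transformers. The expected top-5 predictions for each [MASK] position are shown in the trailing comments.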
import torch
import tensorflow as tf
from transformers import BertTokenizer
from roformer import RoFormerForMaskedLM, TFRoFormerForMaskedLM

text = "今天[MASK]很好,我[MASK]去公园玩。"
tokenizer = BertTokenizer.from_pretrained("junnyu/roformer_v2_chinese_char_base")
pt_model = RoFormerForMaskedLM.from_pretrained("junnyu/roformer_v2_chinese_char_base")
tf_model = TFRoFormerForMaskedLM.from_pretrained(
    "junnyu/roformer_v2_chinese_char_base", from_pt=True
)
pt_inputs = tokenizer(text, return_tensors="pt")
tf_inputs = tokenizer(text, return_tensors="tf")

# pytorch
with torch.no_grad():
    pt_outputs = pt_model(**pt_inputs).logits[0]
pt_outputs_sentence = "pytorch: "
for i, id in enumerate(tokenizer.encode(text)):
    if id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(pt_outputs[i].topk(k=5)[1])
        pt_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        pt_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)
        )
print(pt_outputs_sentence)

# tf
tf_outputs = tf_model(**tf_inputs, training=False).logits[0]
tf_outputs_sentence = "tf: "
for i, id in enumerate(tokenizer.encode(text)):
    if id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(tf.math.top_k(tf_outputs[i], k=5)[1])
        tf_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        tf_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)
        )
print(tf_outputs_sentence)

# small
# pytorch: 今天[的||,||是||很||也]很好,我[要||会||是||想||在]去公园玩。
# tf: 今天[的||,||是||很||也]很好,我[要||会||是||想||在]去公园玩。
# base
# pytorch: 今天[我||天||晴||园||玩]很好,我[想||要||会||就||带]去公园玩。
# tf: 今天[我||天||晴||园||玩]很好,我[想||要||会||就||带]去公园玩。
# large
# pytorch: 今天[天||气||我||空||阳]很好,我[又||想||会||就||爱]去公园玩。
# tf: 今天[天||气||我||空||阳]很好,我[又||想||会||就||爱]去公园玩。


import torch
import tensorflow as tf
from transformers import RoFormerForMaskedLM, RoFormerTokenizer, TFRoFormerForMaskedLM

text = "今天[MASK]很好,我[MASK]去公园玩。"
tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
pt_model = RoFormerForMaskedLM.from_pretrained("junnyu/roformer_chinese_base")
tf_model = TFRoFormerForMaskedLM.from_pretrained(
    "junnyu/roformer_chinese_base", from_pt=True
)
pt_inputs = tokenizer(text, return_tensors="pt")
tf_inputs = tokenizer(text, return_tensors="tf")

# pytorch
with torch.no_grad():
    pt_outputs = pt_model(**pt_inputs).logits[0]
pt_outputs_sentence = "pytorch: "
for i, id in enumerate(tokenizer.encode(text)):
    if id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(pt_outputs[i].topk(k=5)[1])
        pt_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        pt_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)
        )
print(pt_outputs_sentence)

# tf
tf_outputs = tf_model(**tf_inputs, training=False).logits[0]
tf_outputs_sentence = "tf: "
for i, id in enumerate(tokenizer.encode(text)):
    if id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(tf.math.top_k(tf_outputs[i], k=5)[1])
        tf_outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        tf_outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True)
        )
print(tf_outputs_sentence)

# pytorch: 今天[天气||天||心情||阳光||空气]很好,我[想||要||打算||准备||喜欢]去公园玩。
# tf: 今天[天气||天||心情||阳光||空气]很好,我[想||要||打算||准备||喜欢]去公园玩。

python convert_roformer_original_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path=xxxxxx/chinese_roformer_L-12_H-768_A-12/bert_model.ckpt \
    --bert_config_file=pretrained_models/chinese_roformer_base/config.json \
    --pytorch_dump_path=pretrained_models/chinese_roformer_base/pytorch_model.bin

small version
bert4keras vs pytorch
mean diff : tensor(5.9108e-07)
max diff : tensor(5.7220e-06)
bert4keras vs tf2.0
mean diff : tensor(4.5976e-07)
max diff : tensor(3.5763e-06)
base version
python compare_model.py
bert4keras vs pytorch
mean diff : tensor(4.3340e-07)
max diff : tensor(5.7220e-06)
bert4keras vs tf2.0
mean diff : tensor(3.4319e-07)
max diff : tensor(5.2452e-06)

References:
https://github.com/Pengming617/bert_classification
https://github.com/bojone/bert4keras
https://github.com/zhuiyitechnology/roformer
https://github.com/lonepatient/nezha_chinese_pytorch
https://github.com/lonepatient/torchblocks
https://github.com/huggingface/transformers
Bibtex:
@misc{su2021roformer,
title={RoFormer: Enhanced Transformer with Rotary Position Embedding},
author={Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
year={2021},
eprint={2104.09864},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@techreport{roformerv2,
title={RoFormerV2: A Faster and Better RoFormer - ZhuiyiAI},
author={Jianlin Su and Shengfeng Pan and Bo Wen and Yunfeng Liu},
year={2022},
url={https://github.com/ZhuiyiTechnology/roformer-v2},
}