This is the repository for the code used in the paper:
Please cite the paper as:
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023},
}

To produce the plots for gpt2 and Wikipedia sentences, run the following scripts in the order written:
get_wikipedia_sentences.py
(produces ./experiment/sentences/wikipedia_20K-sentences.pickle containing 20K sentences from Wikipedia)
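For orientation, here is a minimal sketch of how such a sentence file could be assembled with the datasets and spacy packages; the dataset name, the sentence filter, and the sampling are assumptions, not a description of the script's internals:

import pickle
import random

import spacy
from datasets import load_dataset

# English Wikipedia dump and an English sentence segmenter (requires
# `python -m spacy download en_core_web_sm`); exact choices are assumptions.
wiki = load_dataset("wikipedia", "20220301.en", split="train")
nlp = spacy.load("en_core_web_sm")

random.seed(0)
sentences = []
while len(sentences) < 20_000:
    article = wiki[random.randrange(len(wiki))]["text"]
    # Keep the first moderately sized sentence of a randomly sampled article.
    for sent in nlp(article).sents:
        if 5 <= len(sent) <= 50:
            sentences.append(sent.text.strip())
            break

with open("./experiment/sentences/wikipedia_20K-sentences.pickle", "wb") as f:
    pickle.dump(sentences, f)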
add_tokenization.py
(produces ./experiment/gpt2/wikipedia_tokenized_train.pickle containing the tokenizations and random token positions for the first 9000 sentences from the file produced by the previous script, and ./experiment/gpt2/wikipedia_tokenized_val.pickle containing the tokenizations and random token positions for the next 3000 sentences)
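A rough sketch of the kind of preprocessing this step performs, assuming a GPT-2 tokenizer and one uniformly random token position per sentence (the record fields below are illustrative, not the script's actual format):

import pickle
import random

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
random.seed(0)

with open("./experiment/sentences/wikipedia_20K-sentences.pickle", "rb") as f:
    sentences = pickle.load(f)

def tokenize_split(sents):
    examples = []
    for text in sents:
        token_ids = tokenizer(text)["input_ids"]
        # One random position per sentence, whose representations will be studied.
        position = random.randrange(len(token_ids))
        examples.append({"text": text, "input_ids": token_ids, "position": position})
    return examples

with open("./experiment/gpt2/wikipedia_tokenized_train.pickle", "wb") as f:
    pickle.dump(tokenize_split(sentences[:9000]), f)
with open("./experiment/gpt2/wikipedia_tokenized_val.pickle", "wb") as f:
    pickle.dump(tokenize_split(sentences[9000:12000]), f)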
add_linreg.py
(produces ./linreg/gpt2/wikipedia/i_j.pickle, where i and j are layer indices, containing the fitted linear mapping from layer i representations to layer j representations)
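Conceptually, each such file holds an ordinary least-squares map from layer-i hidden states to layer-j hidden states. A minimal sketch with scikit-learn, reusing the illustrative record fields from the sketch above (the subset size and file layout are assumptions):

import pickle

import numpy as np
import torch
from sklearn.linear_model import LinearRegression
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

with open("./experiment/gpt2/wikipedia_tokenized_train.pickle", "rb") as f:
    train = pickle.load(f)

def hidden_pair(example, i, j):
    """Hidden states of the chosen token position at layers i and j."""
    ids = torch.tensor([example["input_ids"]])
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    pos = example["position"]
    # hidden_states[0] is the embedding output; hidden_states[k] is after block k.
    return (out.hidden_states[i][0, pos].numpy(), out.hidden_states[j][0, pos].numpy())

i, j = 2, 8  # example source and target layers
pairs = [hidden_pair(ex, i, j) for ex in train[:1000]]
X = np.stack([p[0] for p in pairs])
Y = np.stack([p[1] for p in pairs])

# Fit the linear shortcut from layer i to layer j.
reg = LinearRegression().fit(X, Y)
with open(f"./linreg/gpt2/wikipedia/{i}_{j}.pickle", "wb") as f:
    pickle.dump(reg, f)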
add_plot_r2.py
(produces ./experiment/gpt2/wikipedia_r2_scores.pickle containing the r^2 scores, and ./experiments/gpt2/plots/wikipedia/r2_scores_12.pdf containing the heatmap plot of these scores)
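A minimal sketch of how such a heatmap could be assembled from the fitted regressions, assuming matplotlib and a hypothetical file of held-out per-layer hidden states (neither is guaranteed to match the script):

import pickle

import matplotlib.pyplot as plt
import numpy as np

n_layers = 12  # gpt2 has 12 transformer blocks
r2 = np.full((n_layers + 1, n_layers + 1), np.nan)

# Hypothetical file: {layer: array of shape (n_examples, hidden_dim)} for the validation split.
with open("./experiment/gpt2/wikipedia_val_hidden.pickle", "rb") as f:
    val_hidden = pickle.load(f)

for i in range(n_layers + 1):
    for j in range(i + 1, n_layers + 1):
        with open(f"./linreg/gpt2/wikipedia/{i}_{j}.pickle", "rb") as f:
            reg = pickle.load(f)
        # sklearn's score() returns the coefficient of determination r^2.
        r2[i, j] = reg.score(val_hidden[i], val_hidden[j])

plt.imshow(r2, vmin=0.0, vmax=1.0)
plt.colorbar(label="$r^2$")
plt.xlabel("target layer")
plt.ylabel("source layer")
plt.savefig("./experiments/gpt2/plots/wikipedia/r2_scores_12.pdf")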
add_linreg_submodules.py
(produces ./linreg/gpt2/wikipedia/pi_a_b.pickle files containing the fitted linear mappings for the sub-module experiments)
add_results.py
(produces ./experiment/gpt2/wikipedia_results.pickle containing, for each validation-set example, the top-10 tokens and the surprisal of the model's top-1 token according to each of the five mappings at each layer, as well as the top-10 tokens and the number of layers used when applying early exit)
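A minimal sketch of the readout this step relies on: a (possibly linearly mapped) hidden state is pushed through the model's final layer norm and unembedding to obtain top-10 tokens, and the surprisal of the token the full model ranks first is read off the same distribution. The mapping used below (just taking an intermediate layer) and all other details are simplifications, not the script's actual logic:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.eval()

def readout(hidden):
    """Logit-lens style readout: final layer norm + unembedding, then log-softmax."""
    h = torch.as_tensor(hidden, dtype=torch.float32)
    return torch.log_softmax(model.lm_head(model.transformer.ln_f(h)), dim=-1)

ids = tokenizer("The capital of France is", return_tensors="pt")["input_ids"]
pos = ids.shape[1] - 1  # read predictions at the last position
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

# Stand-in for a hidden state produced by one of the five mappings
# (e.g. a fitted linear shortcut applied to an early layer).
mapped = out.hidden_states[6][0, pos]

log_probs = readout(mapped)
top10 = [tokenizer.decode([int(t)]) for t in torch.topk(log_probs, 10).indices]

# Surprisal (in nats), under this readout, of the token the full model ranks first.
model_top1 = int(torch.argmax(out.logits[0, pos]))
surprisal = -log_probs[model_top1].item()
print(top10, surprisal)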
plot_results.py
(produces several plots in ./experiment/gpt2/plots/wikipedia/, based on the results in the output of the previous script)
To produce the plots for bert-base-uncased and Wikipedia sentences, run the following scripts in the order written:
get_wikipedia_sentences.py
(same as for gpt2 above, no need to re-run)
bert_add_reps.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_train.pickle containing the tokenizations, random token positions and representations of the masked random token at all layers for the first 9000 sentences from the file produced by the previous script, and ./experiment/bert-base-uncased_mask/wikipedia_val.pickle containing the tokenizations, random token positions and representations of the masked random token at all layers for the next 3000 sentences)
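A minimal sketch of extracting the masked token's representation at every layer of bert-base-uncased; the choice of masked position and the storage format are assumptions:

import random

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_reps(sentence):
    enc = tokenizer(sentence, return_tensors="pt")
    ids = enc["input_ids"][0].clone()
    # Replace a random non-special token (i.e. not [CLS]/[SEP]) with [MASK].
    position = random.randrange(1, len(ids) - 1)
    ids[position] = tokenizer.mask_token_id
    with torch.no_grad():
        out = model(ids.unsqueeze(0), output_hidden_states=True)
    # One vector per layer for the masked position: embeddings plus the 12 blocks.
    return position, [h[0, position] for h in out.hidden_states]

random.seed(0)
position, reps = masked_reps("The quick brown fox jumps over the lazy dog.")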
bert_add_linreg.py
(produces ./linreg/bert-base-uncased_mask/wikipedia/i_j.pickle, where i and j are layer indices, containing the fitted linear mapping from layer i representations to layer j representations)
bert_add_plot_r2.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_r2_scores.pickle containing the r^2 scores, and ./experiments/bert-base-uncased_mask/plots/wikipedia/r2_scores_12.pdf containing the heatmap plot of these scores)
bert_add_results.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_results.pickle containing, for each validation-set example, the top-10 tokens and the surprisal of the model's top-1 token, analogously to the gpt2 case above)
plot_results.py (with model_folder_name = 'bert-base-uncased_mask' and plot_parts = False)
(produces several plots in ./experiment/bert-base-uncased_mask/plots/wikipedia/, based on the results in the output of the previous script)
We also produced plots for gpt2-medium, gpt2-large, gpt2-xl, and bert-large-uncased. For this, the variables at the top of each script in the sequence should be modified accordingly.
The code was run with Python 3.10.4 and the following package versions:
torch.__version__ = 1.13.1+cu117
transformers.__version__ = 4.20.1
sklearn.__version__ = 1.2.0
pickle.format_version = 4.0
datasets.__version__ = 2.5.2 # used only to fetch Wikipedia sentences
spacy.__version__ = 3.5.0 # used only to fetch Wikipedia sentences
Some of the trained matrices can be found at https://huggingface.co/sashay/linear-shortcut.
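A minimal sketch of fetching one of these files with huggingface_hub (a dependency of transformers); the filename below is hypothetical, see the repository page for the actual file names:

import pickle

from huggingface_hub import hf_hub_download

# Hypothetical filename; check https://huggingface.co/sashay/linear-shortcut for the real ones.
path = hf_hub_download(repo_id="sashay/linear-shortcut", filename="gpt2/wikipedia/2_8.pickle")
with open(path, "rb") as f:
    reg = pickle.load(f)  # e.g. a fitted mapping from layer 2 to layer 8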