This is the repository of the code used in the paper:

Please cite the paper as:
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023},
}

To produce the plots for gpt2 and Wikipedia sentences, run the following, in the written order:
get_wikipedia_sentences.py
(produces ./experiment/sentences/wikipedia_20k-sentences.pickle, containing 20K sentences from Wikipedia)
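For reference, fetching the sentences relies only on the datasets and spacy packages (see the version list at the end). A minimal sketch of what this step might look like, assuming the 20220301.en Wikipedia dump and simple sequential collection (the actual selection logic in get_wikipedia_sentences.py may differ):

```python
# A rough sketch of this step, not the script itself: the Wikipedia dump version,
# the sentence-selection criteria, and the output format are assumptions.
import os
import pickle
import spacy
from datasets import load_dataset

nlp = spacy.load("en_core_web_sm")  # assumes the small English spaCy pipeline is installed
wiki = load_dataset("wikipedia", "20220301.en", split="train")

sentences = []
for article in wiki:
    # spaCy sentence segmentation; truncate very long articles to stay under nlp.max_length
    for sent in nlp(article["text"][: nlp.max_length - 1]).sents:
        sentences.append(sent.text.strip())
        if len(sentences) >= 20_000:
            break
    if len(sentences) >= 20_000:
        break

os.makedirs("./experiment/sentences", exist_ok=True)
with open("./experiment/sentences/wikipedia_20k-sentences.pickle", "wb") as f:
    pickle.dump(sentences, f)
```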
add_tokenization.py
(produces ./experiment/gpt2/wikipedia_tokenized_train.pickle containing the tokenizations and random token positions for the first 9000 sentences from the file produced by the previous script, and ./experiment/gpt2/wikipedia_tokenized_val.pickle containing the tokenizations and random token positions for the next 3000 sentences)
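A minimal sketch of what the stored records might look like, assuming a GPT-2 fast tokenizer and a uniformly random token position per sentence (the field names below are illustrative, not the script's actual format):

```python
# Sketch of the tokenization step; the real script's record layout and
# random-position policy may differ.
import os
import pickle
import random
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

with open("./experiment/sentences/wikipedia_20k-sentences.pickle", "rb") as f:
    sentences = pickle.load(f)

def tokenize_split(split_sentences):
    records = []
    for sent in split_sentences:
        ids = tokenizer(sent)["input_ids"]
        if not ids:
            continue
        pos = random.randrange(len(ids))  # random token position to inspect later
        records.append({"sentence": sent, "input_ids": ids, "position": pos})
    return records

os.makedirs("./experiment/gpt2", exist_ok=True)
with open("./experiment/gpt2/wikipedia_tokenized_train.pickle", "wb") as f:
    pickle.dump(tokenize_split(sentences[:9000]), f)
with open("./experiment/gpt2/wikipedia_tokenized_val.pickle", "wb") as f:
    pickle.dump(tokenize_split(sentences[9000:12000]), f)
```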
add_linreg.py
(produces ./linreg/gpt2/wikipedia/i_j.pickle files, where i and j are layer indices, containing the fitted linear mappings from layer-i to layer-j representations)
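The mappings are ordinary least-squares fits. A minimal sketch of fitting and saving one layer pair with scikit-learn, assuming X_i and X_j are (n_samples, hidden_size) arrays of layer-i and layer-j representations of the inspected tokens (the real script's storage format may differ):

```python
# Sketch only: one linear regression per (i, j) layer pair.
import pickle
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_layer_mapping(X_i: np.ndarray, X_j: np.ndarray, i: int, j: int) -> LinearRegression:
    reg = LinearRegression().fit(X_i, X_j)  # least-squares map from layer i to layer j
    with open(f"./linreg/gpt2/wikipedia/{i}_{j}.pickle", "wb") as f:
        pickle.dump(reg, f)
    return reg
```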
add_plot_r2.py
(produces ./experiment/gpt2/wikipedia_r2_scores.pickle containing the r² scores of the fitted mappings, and ./experiment/gpt2/plots/wikipedia/r2_scores_12.pdf containing a heatmap plot of these scores)
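A minimal sketch of the r² scoring and of a heatmap over layer pairs, assuming matplotlib for plotting (the real script's aggregation and styling may differ):

```python
# Sketch only: score a fitted mapping on validation representations and plot a heatmap.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

def r2_for_pair(reg, X_i_val, X_j_val) -> float:
    # r² of predicting layer-j validation representations from layer-i ones
    return r2_score(X_j_val, reg.predict(X_i_val))

def plot_r2_heatmap(scores: np.ndarray, path: str) -> None:
    # scores[i, j] holds the r² for mapping layer i to layer j
    plt.imshow(scores, vmin=0.0, vmax=1.0, cmap="viridis")
    plt.xlabel("target layer j")
    plt.ylabel("source layer i")
    plt.colorbar(label="r²")
    plt.savefig(path, bbox_inches="tight")
```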
add_linreg_submodules.py
(produces ./linreg/gpt2/wikipedia/pi_a_b.pickle files, the analogues of i_j.pickle above for the submodule mappings)
add_results.py
(produces ./experiment/gpt2/wikipedia_results.pickle containing, for each validation-set example, the top-10 tokens and the surprisal of the model's top-1 token according to each of the five mappings at every layer, as well as the top-10 tokens and the number of layers used when applying early exit)
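A minimal sketch of how such a readout from an intermediate layer can be obtained: apply the fitted mapping to the layer's representation of the inspected token and push the result through the final layer norm and LM head. Whether add_results.py applies ln_f in exactly this way, and its exact set of five mappings, are not reproduced here.

```python
# Sketch only: `reg` is assumed to map representations at `layer` to final-layer ones.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def shortcut_prediction(input_ids, pos, layer, reg):
    with torch.no_grad():
        hidden = model(torch.tensor([input_ids]), output_hidden_states=True).hidden_states
        h = hidden[layer][0, pos]                                  # representation of the inspected token
        h_hat = torch.tensor(reg.predict(h.numpy()[None, :])[0], dtype=torch.float32)
        logits = model.lm_head(model.transformer.ln_f(h_hat))      # final layer norm, then LM head
        probs = torch.softmax(logits, dim=-1)
        top10 = torch.topk(probs, 10).indices.tolist()
        surprisal = -torch.log(probs[top10[0]]).item()             # surprisal of the top-1 token
    return tokenizer.convert_ids_to_tokens(top10), surprisal
```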
plot_results.py
(produces some plots in ./experiment/gpt2/plots/wikipedia/, based on the results in the output of the previous script)
To produce the plots for bert-base-uncased and Wikipedia sentences, run the following, in the written order:
get_wikipedia_sentences.py
(same as for gpt2 above; no need to re-run)
bert_add_reps.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_train.pickle containing the tokenizations, random token positions, and representations of the masked random token at all layers for the first 9000 sentences from the file produced by the previous script, and ./experiment/bert-base-uncased_mask/wikipedia_val.pickle containing the same for the next 3000 sentences)
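A minimal sketch of collecting the masked token's per-layer representations with transformers (the record format actually stored by bert_add_reps.py is an assumption):

```python
# Sketch only: mask the randomly chosen position and keep its hidden state at every layer.
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_representations(sentence: str, pos: int):
    enc = tokenizer(sentence, return_tensors="pt")
    enc["input_ids"][0, pos] = tokenizer.mask_token_id   # mask the randomly chosen token
    with torch.no_grad():
        out = model(**enc, output_hidden_states=True)
    # one (hidden_size,) vector per layer: embeddings plus each transformer layer
    return [h[0, pos] for h in out.hidden_states]
```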
bert_add_linreg.py
(produces ./linreg/bert-base-uncased_mask/wikipedia/i_j.pickle files, as for gpt2 above)
bert_add_plot_r2.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_r2_scores.pickle containing the r² scores, and ./experiment/bert-base-uncased_mask/plots/wikipedia/r2_scores_12.pdf containing a heatmap plot of these scores)
bert_add_results.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_results.pickle containing, for each validation-set example, the top-10 tokens and the surprisal of the model's top-1 token, as for gpt2 above)
plot_results.py (with model_folder_name='bert-base-uncased_mask' and plot_parts=False)
(produces some plots in ./experiment/bert-base-uncased_mask/plots/wikipedia/, based on the results in the output of the previous script)
We also produced plots for gpt2-medium, gpt2-large, gpt2-xl, and bert-large-uncased. For this, the variables at the head of each script in the sequence should be modified accordingly.
The code was run using Python 3.10.4 and the following package versions:
torch.__version__ = 1.13.1+cu117
transformers.__version__ = 4.20.1
sklearn.__version__ = 1.2.0
pickle.format_version = 4.0
datasets.__version__ = 2.5.2 # used only to fetch Wikipedia sentences
spacy.__version__ = 3.5.0 # used only to fetch Wikipedia sentences
Some trained matrices can be found at https://huggingface.co/sashay/linear-shortcut.
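A minimal sketch of downloading one of them with huggingface_hub; the filename below is hypothetical, so check the repository's file listing for the actual names:

```python
# Sketch only: the filename (and possibly the repo_type) must match the actual repository contents.
import pickle
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="sashay/linear-shortcut",
    filename="linreg/gpt2/wikipedia/2_12.pickle",  # hypothetical path inside the repo
)
with open(path, "rb") as f:
    mapping = pickle.load(f)
```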