This is the repository of the code used in the paper:

Please cite the paper as:
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023},
}

To produce the plots for gpt2 and Wikipedia sentences, run the following, in the written order:
get_wikipedia_sentences.py
(produces ./experiment/sentences/wikipedia_20k-sentences.pickle, containing 20K sentences from Wikipedia)
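For reference, fetching the sentences relies only on the datasets and spacy packages (see the version list at the end). A minimal sketch of what this step might look like, assuming the 20220301.en Wikipedia dump and simple sequential collection (the actual selection logic in get_wikipedia_sentences.py may differ):

```python
# A rough sketch of this step, not the script itself: the Wikipedia dump version,
# the sentence-selection criteria, and the output format are assumptions.
import os
import pickle
import spacy
from datasets import load_dataset

nlp = spacy.load("en_core_web_sm")  # assumes the small English spaCy pipeline is installed
wiki = load_dataset("wikipedia", "20220301.en", split="train")

sentences = []
for article in wiki:
    # spaCy sentence segmentation; truncate very long articles to stay under nlp.max_length
    for sent in nlp(article["text"][: nlp.max_length - 1]).sents:
        sentences.append(sent.text.strip())
        if len(sentences) >= 20_000:
            break
    if len(sentences) >= 20_000:
        break

os.makedirs("./experiment/sentences", exist_ok=True)
with open("./experiment/sentences/wikipedia_20k-sentences.pickle", "wb") as f:
    pickle.dump(sentences, f)
```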
add_tokenization.py
(produces ./experiment/gpt2/wikipedia_tokenized_train.pickle containing the tokenizations and random token positions for the first 9000 sentences from the file produced by the previous script, and ./experiment/gpt2/wikipedia_tokenized_val.pickle containing the tokenizations and random token positions for the next 3000 sentences)
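A minimal sketch of what the stored records might look like, assuming a GPT-2 fast tokenizer and a uniformly random token position per sentence (the field names below are illustrative, not the script's actual format):

```python
# Sketch of the tokenization step; the real script's record layout and
# random-position policy may differ.
import os
import pickle
import random
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

with open("./experiment/sentences/wikipedia_20k-sentences.pickle", "rb") as f:
    sentences = pickle.load(f)

def tokenize_split(split_sentences):
    records = []
    for sent in split_sentences:
        ids = tokenizer(sent)["input_ids"]
        if not ids:
            continue
        pos = random.randrange(len(ids))  # random token position to inspect later
        records.append({"sentence": sent, "input_ids": ids, "position": pos})
    return records

os.makedirs("./experiment/gpt2", exist_ok=True)
with open("./experiment/gpt2/wikipedia_tokenized_train.pickle", "wb") as f:
    pickle.dump(tokenize_split(sentences[:9000]), f)
with open("./experiment/gpt2/wikipedia_tokenized_val.pickle", "wb") as f:
    pickle.dump(tokenize_split(sentences[9000:12000]), f)
```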
add_linreg.py
(produces ./linreg/gpt2/wikipedia/i_j.pickle files, where i and j are layer indices, containing the fitted linear mappings from layer-i to layer-j representations)
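The mappings are ordinary least-squares fits. A minimal sketch of fitting and saving one layer pair with scikit-learn, assuming X_i and X_j are (n_samples, hidden_size) arrays of layer-i and layer-j representations of the inspected tokens (the real script's storage format may differ):

```python
# Sketch only: one linear regression per (i, j) layer pair.
import pickle
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_layer_mapping(X_i: np.ndarray, X_j: np.ndarray, i: int, j: int) -> LinearRegression:
    reg = LinearRegression().fit(X_i, X_j)  # least-squares map from layer i to layer j
    with open(f"./linreg/gpt2/wikipedia/{i}_{j}.pickle", "wb") as f:
        pickle.dump(reg, f)
    return reg
```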
add_plot_r2.py
(produces ./experiment/gpt2/wikipedia_r2_scores.pickle containing the r² scores of the fitted mappings, and ./experiment/gpt2/plots/wikipedia/r2_scores_12.pdf containing a heatmap plot of these scores)
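A minimal sketch of the r² scoring and of a heatmap over layer pairs, assuming matplotlib for plotting (the real script's aggregation and styling may differ):

```python
# Sketch only: score a fitted mapping on validation representations and plot a heatmap.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

def r2_for_pair(reg, X_i_val, X_j_val) -> float:
    # r² of predicting layer-j validation representations from layer-i ones
    return r2_score(X_j_val, reg.predict(X_i_val))

def plot_r2_heatmap(scores: np.ndarray, path: str) -> None:
    # scores[i, j] holds the r² for mapping layer i to layer j
    plt.imshow(scores, vmin=0.0, vmax=1.0, cmap="viridis")
    plt.xlabel("target layer j")
    plt.ylabel("source layer i")
    plt.colorbar(label="r²")
    plt.savefig(path, bbox_inches="tight")
```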
add_linreg_submodules.py
(produces ./linreg/gpt2/wikipedia/pi_a_b.pickle files, the analogues of i_j.pickle above for the submodule mappings)
add_results.py
(produces ./experiment/gpt2/wikipedia_results.pickle containing, for each validation-set example, the top-10 tokens and the surprisal of the model's top-1 token according to each of the five mappings at every layer, as well as the top-10 tokens and the number of layers used when applying early exit)
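A minimal sketch of how such a readout from an intermediate layer can be obtained: apply the fitted mapping to the layer's representation of the inspected token and push the result through the final layer norm and LM head. Whether add_results.py applies ln_f in exactly this way, and its exact set of five mappings, are not reproduced here.

```python
# Sketch only: `reg` is assumed to map representations at `layer` to final-layer ones.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def shortcut_prediction(input_ids, pos, layer, reg):
    with torch.no_grad():
        hidden = model(torch.tensor([input_ids]), output_hidden_states=True).hidden_states
        h = hidden[layer][0, pos]                                  # representation of the inspected token
        h_hat = torch.tensor(reg.predict(h.numpy()[None, :])[0], dtype=torch.float32)
        logits = model.lm_head(model.transformer.ln_f(h_hat))      # final layer norm, then LM head
        probs = torch.softmax(logits, dim=-1)
        top10 = torch.topk(probs, 10).indices.tolist()
        surprisal = -torch.log(probs[top10[0]]).item()             # surprisal of the top-1 token
    return tokenizer.convert_ids_to_tokens(top10), surprisal
```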
plot_results.py
(produces some plots in ./experiment/gpt2/plots/wikipedia/, based on the results in the output of the previous script)
To produce the plots for bert-base-uncased and Wikipedia sentences, run the following, in the written order:
get_wikipedia_sentences.py
(same as for gpt2 above; no need to re-run)
bert_add_reps.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_train.pickle containing the tokenizations, random token positions, and representations of the masked random token at all layers for the first 9000 sentences from the file produced by the previous script, and ./experiment/bert-base-uncased_mask/wikipedia_val.pickle containing the same for the next 3000 sentences)
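A minimal sketch of collecting the masked token's per-layer representations with transformers (the record format actually stored by bert_add_reps.py is an assumption):

```python
# Sketch only: mask the randomly chosen position and keep its hidden state at every layer.
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_representations(sentence: str, pos: int):
    enc = tokenizer(sentence, return_tensors="pt")
    enc["input_ids"][0, pos] = tokenizer.mask_token_id   # mask the randomly chosen token
    with torch.no_grad():
        out = model(**enc, output_hidden_states=True)
    # one (hidden_size,) vector per layer: embeddings plus each transformer layer
    return [h[0, pos] for h in out.hidden_states]
```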
bert_add_linreg.py
(produces ./linreg/bert-base-uncased_mask/wikipedia/i_j.pickle files, as for gpt2 above)
bert_add_plot_r2.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_r2_scores.pickle containing the r² scores, and ./experiment/bert-base-uncased_mask/plots/wikipedia/r2_scores_12.pdf containing a heatmap plot of these scores)
bert_add_results.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_results.pickle containing, for each validation-set example, the top-10 tokens and the surprisal of the model's top-1 token, as for gpt2 above)
plot_results.py (with model_folder_name='bert-base-uncased_mask' and plot_parts=False)
(produces some plots in ./experiment/bert-base-uncased_mask/plots/wikipedia/, based on the results in the output of the previous script)
We also produced plots for gpt2-medium, gpt2-large, gpt2-xl, and bert-large-uncased. For this, the variables at the head of each script in the sequence should be modified accordingly.
The code was run using Python 3.10.4 and the following package versions:
torch.__version__ = 1.13.1+cu117
transformers.__version__ = 4.20.1
sklearn.__version__ = 1.2.0
pickle.format_version = 4.0
datasets.__version__ = 2.5.2 # used only to fetch Wikipedia sentences
spacy.__version__ = 3.5.0 # used only to fetch Wikipedia sentences
Some trained matrices can be found at https://huggingface.co/sashay/linear-shortcut.
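A minimal sketch of downloading one of them with huggingface_hub; the filename below is hypothetical, so check the repository's file listing for the actual names:

```python
# Sketch only: the filename (and possibly the repo_type) must match the actual repository contents.
import pickle
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="sashay/linear-shortcut",
    filename="linreg/gpt2/wikipedia/2_12.pickle",  # hypothetical path inside the repo
)
with open(path, "rb") as f:
    mapping = pickle.load(f)
```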