This is the repository for the code used in the paper:
Please cite the paper as:
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023},
}

To produce the plots for gpt2 and Wikipedia sentences, run the following scripts in the order written:
get_wikipedia_sentences.py
(produces ./experiment/sentences/wikipedia_20K-sentences.pickle containing 20K sentences from Wikipedia)
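For orientation, here is a minimal sketch of how such a sentence file could be assembled with the datasets and spacy packages; the dataset name, the sentence filter, and the sampling are assumptions, not a description of the script's internals:

import pickle
import random

import spacy
from datasets import load_dataset

# English Wikipedia dump and an English sentence segmenter (requires
# `python -m spacy download en_core_web_sm`); exact choices are assumptions.
wiki = load_dataset("wikipedia", "20220301.en", split="train")
nlp = spacy.load("en_core_web_sm")

random.seed(0)
sentences = []
while len(sentences) < 20_000:
    article = wiki[random.randrange(len(wiki))]["text"]
    # Keep the first moderately sized sentence of a randomly sampled article.
    for sent in nlp(article).sents:
        if 5 <= len(sent) <= 50:
            sentences.append(sent.text.strip())
            break

with open("./experiment/sentences/wikipedia_20K-sentences.pickle", "wb") as f:
    pickle.dump(sentences, f)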
add_tokenization.py
(produces ./experiment/gpt2/wikipedia_tokenized_train.pickle containing the tokenizations and random token positions for the first 9000 sentences from the file produced by the previous script, and ./experiment/gpt2/wikipedia_tokenized_val.pickle containing the tokenizations and random token positions for the next 3000 sentences)
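A rough sketch of the kind of preprocessing this step performs, assuming a GPT-2 tokenizer and one uniformly random token position per sentence (the record fields below are illustrative, not the script's actual format):

import pickle
import random

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
random.seed(0)

with open("./experiment/sentences/wikipedia_20K-sentences.pickle", "rb") as f:
    sentences = pickle.load(f)

def tokenize_split(sents):
    examples = []
    for text in sents:
        token_ids = tokenizer(text)["input_ids"]
        # One random position per sentence, whose representations will be studied.
        position = random.randrange(len(token_ids))
        examples.append({"text": text, "input_ids": token_ids, "position": position})
    return examples

with open("./experiment/gpt2/wikipedia_tokenized_train.pickle", "wb") as f:
    pickle.dump(tokenize_split(sentences[:9000]), f)
with open("./experiment/gpt2/wikipedia_tokenized_val.pickle", "wb") as f:
    pickle.dump(tokenize_split(sentences[9000:12000]), f)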
add_linreg.py
(produces ./linreg/gpt2/wikipedia/i_j.pickle, where i and j are layer indices, containing the fitted linear mapping from layer i representations to layer j representations)
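Conceptually, each such file holds an ordinary least-squares map from layer-i hidden states to layer-j hidden states. A minimal sketch with scikit-learn, reusing the illustrative record fields from the sketch above (the subset size and file layout are assumptions):

import pickle

import numpy as np
import torch
from sklearn.linear_model import LinearRegression
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

with open("./experiment/gpt2/wikipedia_tokenized_train.pickle", "rb") as f:
    train = pickle.load(f)

def hidden_pair(example, i, j):
    """Hidden states of the chosen token position at layers i and j."""
    ids = torch.tensor([example["input_ids"]])
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    pos = example["position"]
    # hidden_states[0] is the embedding output; hidden_states[k] is after block k.
    return (out.hidden_states[i][0, pos].numpy(), out.hidden_states[j][0, pos].numpy())

i, j = 2, 8  # example source and target layers
pairs = [hidden_pair(ex, i, j) for ex in train[:1000]]
X = np.stack([p[0] for p in pairs])
Y = np.stack([p[1] for p in pairs])

# Fit the linear shortcut from layer i to layer j.
reg = LinearRegression().fit(X, Y)
with open(f"./linreg/gpt2/wikipedia/{i}_{j}.pickle", "wb") as f:
    pickle.dump(reg, f)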
add_plot_r2.py
(produces ./experiment/gpt2/wikipedia_r2_scores.pickle containing the r^2 scores, and ./experiments/gpt2/plots/wikipedia/r2_scores_12.pdf containing the heatmap plot of these scores)
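A minimal sketch of how such a heatmap could be assembled from the fitted regressions, assuming matplotlib and a hypothetical file of held-out per-layer hidden states (neither is guaranteed to match the script):

import pickle

import matplotlib.pyplot as plt
import numpy as np

n_layers = 12  # gpt2 has 12 transformer blocks
r2 = np.full((n_layers + 1, n_layers + 1), np.nan)

# Hypothetical file: {layer: array of shape (n_examples, hidden_dim)} for the validation split.
with open("./experiment/gpt2/wikipedia_val_hidden.pickle", "rb") as f:
    val_hidden = pickle.load(f)

for i in range(n_layers + 1):
    for j in range(i + 1, n_layers + 1):
        with open(f"./linreg/gpt2/wikipedia/{i}_{j}.pickle", "rb") as f:
            reg = pickle.load(f)
        # sklearn's score() returns the coefficient of determination r^2.
        r2[i, j] = reg.score(val_hidden[i], val_hidden[j])

plt.imshow(r2, vmin=0.0, vmax=1.0)
plt.colorbar(label="$r^2$")
plt.xlabel("target layer")
plt.ylabel("source layer")
plt.savefig("./experiments/gpt2/plots/wikipedia/r2_scores_12.pdf")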
add_linreg_submodules.py
(produces ./linreg/gpt2/wikipedia/pi_a_b.pickle files containing the fitted linear mappings for the sub-module experiments)
add_results.py
(produces ./experiment/gpt2/wikipedia_results.pickle containing, for each validation-set example, the top-10 tokens and the surprisal of the model's top-1 token according to each of the five mappings at each layer, as well as the top-10 tokens and the number of layers used when applying early exit)
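A minimal sketch of the readout this step relies on: a (possibly linearly mapped) hidden state is pushed through the model's final layer norm and unembedding to obtain top-10 tokens, and the surprisal of the token the full model ranks first is read off the same distribution. The mapping used below (just taking an intermediate layer) and all other details are simplifications, not the script's actual logic:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.eval()

def readout(hidden):
    """Logit-lens style readout: final layer norm + unembedding, then log-softmax."""
    h = torch.as_tensor(hidden, dtype=torch.float32)
    return torch.log_softmax(model.lm_head(model.transformer.ln_f(h)), dim=-1)

ids = tokenizer("The capital of France is", return_tensors="pt")["input_ids"]
pos = ids.shape[1] - 1  # read predictions at the last position
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

# Stand-in for a hidden state produced by one of the five mappings
# (e.g. a fitted linear shortcut applied to an early layer).
mapped = out.hidden_states[6][0, pos]

log_probs = readout(mapped)
top10 = [tokenizer.decode([int(t)]) for t in torch.topk(log_probs, 10).indices]

# Surprisal (in nats), under this readout, of the token the full model ranks first.
model_top1 = int(torch.argmax(out.logits[0, pos]))
surprisal = -log_probs[model_top1].item()
print(top10, surprisal)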
plot_results.py
(produces several plots in ./experiment/gpt2/plots/wikipedia/, based on the results in the output of the previous script)
To produce the plots for bert-base-uncased and Wikipedia sentences, run the following scripts in the order written:
get_wikipedia_sentences.py
(same as for gpt2 above, no need to re-run)
bert_add_reps.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_train.pickle containing the tokenizations, random token positions and representations of the masked random token at all layers for the first 9000 sentences from the file produced by the previous script, and ./experiment/bert-base-uncased_mask/wikipedia_val.pickle containing the tokenizations, random token positions and representations of the masked random token at all layers for the next 3000 sentences)
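A minimal sketch of extracting the masked token's representation at every layer of bert-base-uncased; the choice of masked position and the storage format are assumptions:

import random

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_reps(sentence):
    enc = tokenizer(sentence, return_tensors="pt")
    ids = enc["input_ids"][0].clone()
    # Replace a random non-special token (i.e. not [CLS]/[SEP]) with [MASK].
    position = random.randrange(1, len(ids) - 1)
    ids[position] = tokenizer.mask_token_id
    with torch.no_grad():
        out = model(ids.unsqueeze(0), output_hidden_states=True)
    # One vector per layer for the masked position: embeddings plus the 12 blocks.
    return position, [h[0, position] for h in out.hidden_states]

random.seed(0)
position, reps = masked_reps("The quick brown fox jumps over the lazy dog.")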
bert_add_linreg.py
(produces ./linreg/bert-base-uncased_mask/wikipedia/i_j.pickle, where i and j are layer indices, containing the fitted linear mapping from layer i representations to layer j representations)
bert_add_plot_r2.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_r2_scores.pickle containing the r^2 scores, and ./experiments/bert-base-uncased_mask/plots/wikipedia/r2_scores_12.pdf containing the heatmap plot of these scores)
bert_add_results.py
(produces ./experiment/bert-base-uncased_mask/wikipedia_results.pickle containing, for each validation-set example, the top-10 tokens and the surprisal of the model's top-1 token, analogously to the gpt2 case above)
plot_results.py (with model_folder_name = 'bert-base-uncased_mask' and plot_parts = False)
(produces several plots in ./experiment/bert-base-uncased_mask/plots/wikipedia/, based on the results in the output of the previous script)
We also produced plots for gpt2-medium, gpt2-large, gpt2-xl, and bert-large-uncased. For this, the variables at the top of each script in the sequence should be modified accordingly.
The code was run with Python 3.10.4 and the following package versions:
torch.__version__ = 1.13.1+cu117
transformers.__version__ = 4.20.1
sklearn.__version__ = 1.2.0
pickle.format_version = 4.0
datasets.__version__ = 2.5.2 # used only to fetch Wikipedia sentences
spacy.__version__ = 3.5.0 # used only to fetch Wikipedia sentences
Some of the trained matrices can be found at https://huggingface.co/sashay/linear-shortcut.
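A minimal sketch of fetching one of these files with huggingface_hub (a dependency of transformers); the filename below is hypothetical, see the repository page for the actual file names:

import pickle

from huggingface_hub import hf_hub_download

# Hypothetical filename; check https://huggingface.co/sashay/linear-shortcut for the real ones.
path = hf_hub_download(repo_id="sashay/linear-shortcut", filename="gpt2/wikipedia/2_8.pickle")
with open(path, "rb") as f:
    reg = pickle.load(f)  # e.g. a fitted mapping from layer 2 to layer 8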