Source code for the ACL 2022 paper "Coherence Boosting: When Your Pretrained Language Model Is Not Paying Enough Attention" (arXiv, ACL Anthology)
Long-range semantic coherence remains a challenge in automatic language generation and understanding. We demonstrate that large language models have insufficiently learned the effect of distant words on next-token prediction. We present coherence boosting, an inference procedure that increases a LM's focus on a long context. We show the benefits of coherence boosting with pretrained models by distributional analyses of generated ordinary text and dialog responses. It is also found that coherence boosting with state-of-the-art models for various zero-shot NLP tasks yields performance gains with no additional training.
If you find the paper and code useful, please kindly star this repo and cite the paper. Thanks very much!
```bibtex
@inproceedings{malkin-etal-2022-coherence,
    title = "Coherence boosting: When your pretrained language model is not paying enough attention",
    author = "Malkin, Nikolay and Wang, Zhen and Jojic, Nebojsa",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.565",
    doi = "10.18653/v1/2022.acl-long.565",
    pages = "8214--8236"
}
```

We present a demo to show the lack of coherence in existing pretrained LMs, i.e., failures to predict the next token in a given context that clearly require the understanding of distant words. Our proposed coherence boosting resolves such errors: it predicts the next token by log-linearly contrasting two distributions, one conditioned on the full context and one on a partial (short) context.
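For intuition, here is a minimal sketch of that log-linear contrast. It is not the repo's `cb_demo` implementation; it assumes the demo's single `alpha` corresponds to the weight pair `(1 + alpha, -alpha)`, which matches the `alpha_long=1.5, alpha_short=-0.5` used in the generation example later in this README:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def boosted_logprobs(context: str, partial_length: int = 8, alpha: float = 0.5) -> torch.Tensor:
    """Contrast next-token log-probs from the full context with those from
    only the last `partial_length` tokens of it."""
    ids = tokenizer.encode(context, return_tensors="pt")
    with torch.no_grad():
        full = model(ids).logits[0, -1].log_softmax(-1)
        partial = model(ids[:, -partial_length:]).logits[0, -1].log_softmax(-1)
    # Log-linear contrast with weights (1 + alpha, -alpha); alpha = 0.5
    # gives the (1.5, -0.5) pair used in the generation example below.
    return ((1 + alpha) * full - alpha * partial).log_softmax(-1)

scores = boosted_logprobs(' Ballad metre is "less regular and more conversational" than common')
print(tokenizer.decode([int(scores.argmax())]))  # boosted top-1: " metre" (cf. the demo output below)
```

The repo's demo exposes this contrast through `cb_demo.contrasting`, as the following session shows.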
```
>>> from cb_demo import contrasting
>>> contrasting(model_name='gpt2',
...             context=' Ballad metre is "less regular and more conversational" than common metre',
...             partial_length=8,
...             alpha=0.5)
[out]
Top tokens based on full context: Ballad metre is "less regular and more conversational" than common

Rank   Tokens        Logprobs   Probs
--------------------------------------
1      Ġsense        -2.405     9.03%
2      Ġin           -3.900     2.02%
3      .             -3.978     1.87%
4      ,             -4.097     1.66%
5      Ġpractice     -4.287     1.37%
...    ...           ...        ...
13     Ġmetre **     -5.098     0.610609%
** Target Token

Top tokens based on partial context: regular and more conversational" than common

Rank   Tokens          Logprobs   Probs
----------------------------------------
1      Ġsense          -2.547     7.83%
2      ĠEnglish        -3.352     3.50%
3      .               -3.427     3.25%
4      Ġconversation   -3.445     3.19%
5      ,               -3.634     2.64%
...    ...             ...        ...
14103  Ġmetre **       -13.450    0.000144%
** Target Token

Contrastive next token prediction:

Rank   Tokens       Logprobs   Probs
-------------------------------------
1      Ġmetre **    -0.923     39.74%
2      Ġsense       -2.334     9.69%
3      Ġmeter       -2.785     6.17%
4      Ġin          -3.210     4.03%
5      Ġfoot        -3.220     3.99%
** Target Token
```

To reproduce the results of some of the examples shown in Figure 1 of the paper:
```
python cb_demo.py --context=' Ballad metre is "less regular and more conversational" than common metre' --model_name='gpt2' --partial_length=8 --alpha=0.5
python cb_demo.py --context=' Isley Brewing Company: Going Mintal — a minty milk chocolate stout' --model_name='gpt2' --partial_length=8 --alpha=0.5
python cb_demo.py --context=' Other times anxiety is not as easy to see, but can still be just as debilitating' --model_name='gpt2' --partial_length=8 --alpha=0.5
```

Prediction of the final word in the LAMBADA task is similar to the examples shown above: models are expected to predict the final word of a passage of several sentences. This dataset is a standard benchmark for evaluating modern language models (example).
More importantly, this task explicitly requires reasoning over a broad context: humans can reliably guess the last word when given the whole passage, but not when given only the last sentence. This property makes the benchmark a perfect testbed for evaluating the effectiveness of our proposed coherence boosting.
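For concreteness, here is a quick look at how a LAMBADA instance decomposes into a long passage and a target final word. This is a hypothetical snippet, not the repo's evaluation code; it assumes the `lambada` dataset on the HuggingFace Hub, whose `text` field ends with the target word:

```python
# Hypothetical peek at the LAMBADA task format (assumes the 'lambada'
# dataset on the HuggingFace Hub with a 'text' field).
from datasets import load_dataset

sample = load_dataset("lambada", split="test")[0]["text"]
passage, target = sample.rsplit(" ", 1)  # the model must predict `target` from `passage`
print(repr(passage[-80:]), "->", repr(target))
```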
To run the LAMBADA experiments, simply run the following command:
```
python main.py --tasks='lambada' --models='gpt2-small' --use_val=False --alpha_start=1 --alpha_end=1 --alpha_step=0.1 --slen_start=10 --slen_end=10
```

Some important parameters are listed below; for the full list, please run `python main.py --help`.
- `--models`: names of the pretrained language models; multiple models can be run at once by separating them with semicolons, e.g., `'gpt2-small;gpt2-medium'`; if you want to use GPT-3 models, see the note on GPT-3 below.
- `--use_val`: whether to use a validation set to select the two hyperparameters, `alpha` and `slen`, representing the boosting coefficient and the length of the partial context.
- `--alpha_start`, `--alpha_end`, `--alpha_step`: grid-search parameters for the `alpha` hyperparameter.
- `--slen_start`, `--slen_end`, `--slen_step`: grid-search parameters for the `slen` hyperparameter; note that both hyperparameter settings affect the inference speed on the LAMBADA task.

We evaluate coherence boosting on the following NLU tasks.
| Task | Cloze Tasks | Question Answering | Text Classification | NLI | Factual Knowledge Retrieval |
|---|---|---|---|---|---|
| Datasets | StoryCloze, HellaSwag, COPA | CommonsenseQA, OpenBookQA, ARC Easy/Challenge, PiQA | SST-2/5, TREC, AGNews | RTE, CB, BoolQ | LAMA |
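To make the zero-shot recipe concrete, here is a minimal, hypothetical sketch of boosted multiple-choice scoring. The premise-free context below is purely illustrative; the repo defines the actual contrast context per task via `get_contrast_ctx`:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def answer_logprob(prompt: str, answer: str) -> float:
    """Sum the log-probs of the answer tokens conditioned on the prompt."""
    prompt_ids = tokenizer.encode(prompt)
    answer_ids = tokenizer.encode(answer)
    ids = torch.tensor([prompt_ids + answer_ids])
    with torch.no_grad():
        logprobs = model(ids).logits[0].log_softmax(-1)
    # The token at position i is predicted by the logits at position i - 1.
    return sum(logprobs[len(prompt_ids) - 1 + j, t].item()
               for j, t in enumerate(answer_ids))

# A COPA-style example (illustrative, not loaded from the dataset).
full_prompt = "The man broke his toe. What was the cause?"
short_prompt = "What was the cause?"  # premise-free contrast context (assumed form)
choices = [" He got a hole in his sock.", " He dropped a hammer on his foot."]
alpha = 0.5
# Boosted score: up-weight the full prompt, penalize what the premise-free
# context alone already makes likely.
scores = [(1 + alpha) * answer_logprob(full_prompt, c)
          - alpha * answer_logprob(short_prompt, c) for c in choices]
print(choices[scores.index(max(scores))])  # prints the higher-scoring choice
```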
Most of the datasets can be loaded by HuggingFace's `datasets` library; only a few of them require manual downloading, and instructions are prompted when running the code.
To run the NLU experiments, simply run the following command:
```
python main.py --tasks='storycloze;csqa;openbookqa' --models='gpt2-small;gpt2-medium;gpt2-large' --alpha_start=2 --alpha_end=-3 --alpha_step=0.01
```

Some important parameters are listed below; for the full list, please run `python main.py --help`.
- `--models`: names of the pretrained language models; multiple models can be run at once, e.g., `'gpt2-small;gpt2-medium'`.
- `--use_val`: whether to use a validation set to select the two hyperparameters, `alpha` and `slen`, representing the boosting coefficient and the length of the partial context.
- `--alpha_start`, `--alpha_end`, `--alpha_step`: grid-search parameters for the `alpha` hyperparameter; note that the code caches intermediate results, so the grid search over `alpha` is fast after the first run.

**Note on GPT-3**: to use GPT-3 models, place your OpenAI API key in `api_key.txt`.

Our codebase is also flexible enough to incorporate any new multiple-choice dataset with minimal effort (inspired by the open-source project lm-evaluation-harness). There are roughly three steps:
1. Register the new dataset in `__init__.py` in the `tasks` folder.
2. Create a task class that inherits from the `MultipleChoiceTask` class and implement its data-processing functions (`load_data`, `standardize`).
3. Implement `get_contrast_ctx`, where you define your own premise-free prompt for boosting; see the skeleton sketched below.

Please feel free to let us know if you run into any issues when adapting our code for other task classes.
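As a rough illustration of these three steps, here is a hypothetical skeleton; the names follow the description above, but the exact signatures in the repo's `tasks` module may differ:

```python
# Hypothetical skeleton of a new task class (exact base-class signatures
# may differ from the repo's MultipleChoiceTask).
from tasks import MultipleChoiceTask  # assumed import path

class MyNewTask(MultipleChoiceTask):
    def load_data(self):
        # Step 2a: load the raw examples, e.g. with HuggingFace datasets.
        ...

    def standardize(self, example):
        # Step 2b: map a raw example to the shared multiple-choice format
        # (premise, list of answer choices, index of the gold choice).
        ...

    def get_contrast_ctx(self, example):
        # Step 3: return the premise-free (short) context that the full
        # prompt is contrasted against when boosting.
        ...

# Step 1: register MyNewTask in tasks/__init__.py so main.py can find it.
```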
We provide a generation model wrapper compatible with the HuggingFace `transformers` library in `generation/generation.py`. You can create a coherence-boosted variant of any autoregressive LM using the classes there, as in this example script:
```
>>> boosted_model = generation.BoostedModel(base_model, k=8, alpha_long=1.5, alpha_short=-0.5)
>>> ins = T.LongTensor([tokenizer.encode('Once upon a midnight dreary,')])
>>> outputs = boosted_model.generate(input_ids=ins, do_sample=True, max_length=100, top_p=0.95)
>>> tokenizer.decode(outputs[0])
"Once upon a midnight dreary, while I pondered over these things, I suddenly became aware of a strange and terrible noise. I turned round, and saw that the old man was standing near me. He was wearing a black suit, with a black tie, and a black hat. He had a long, thin, black beard, and his eyes were black. His hair was of a dark brown colour, and was very long. His face was rather large, and his lips were somewhat"
```

`generate` can also be used flexibly with `boosted_model` when the short context is the currently generated text minus a prefix of a given length (e.g., the previous turn in a dialogue); this is achieved by dynamically setting `boosted_model.k` to the negative of the prefix length.
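A minimal, hypothetical sketch of that pattern for dialogue; it assumes the `BoostedModel` interface shown above, an import matching `generation/generation.py`, and that `k` can be reassigned between calls:

```python
# Hypothetical sketch: boost a dialogue response against its history by
# excluding the history (the prefix) from the short context.
import torch as T
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from generation import generation  # assumed import path for generation/generation.py

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
base_model = GPT2LMHeadModel.from_pretrained("gpt2")
boosted_model = generation.BoostedModel(base_model, k=8, alpha_long=1.5, alpha_short=-0.5)

history = "A: Where were you last night?\nB:"
prefix_len = len(tokenizer.encode(history))
# Short context = currently generated text minus the dialogue history,
# set dynamically as the negative prefix length (assumed attribute access).
boosted_model.k = -prefix_len
ins = T.LongTensor([tokenizer.encode(history)])
outputs = boosted_model.generate(input_ids=ins, do_sample=True, max_length=60, top_p=0.95)
print(tokenizer.decode(outputs[0]))
```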
We present some conditional generation outputs here. The evaluation metrics shown in Table 1 can be computed with code from this repository for the first four columns, or with the code here for the new long-range coherence metric that we introduce.
If you have any questions, please feel free to contact Kolya (nikolay.malkin at mila.quebec) and Zhen (wang.9215 at osu.edu).