CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior

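What is CEBaB?

✅ An English-language benchmark for evaluating causal explanation methods.
✅ A human-validated aspect-based sentiment analysis (ABSA) benchmark.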

Contents

Citation
Dataset files
Datasheet
Quick start
Data format
Code
License

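Citation

Eldar David Abraham, Karel D'Oosterlinck, Amir Feder, Yair Gat, Atticus Geiger, Christopher Potts, Roi Reichart, Zhengxuan Wu. 2022. CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior. Stanford University, Technion - Israel Institute of Technology, and Ghent University.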

 @unpublished{abraham-etal-2022-cebab,
    title={{CEBaB}: Estimating the Causal Effects of Real-World Concepts on {NLP} Model Behavior},
    author={Abraham, Eldar David and D'Oosterlinck, Karel and Feder, Amir and Gat, Yair Ori and Geiger, Atticus and Potts, Christopher and Reichart, Roi and Wu, Zhengxuan},
    note={arXiv:2205.14140},
    url={https://arxiv.org/abs/2205.14140},
    year={2022}}

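Dataset files

The dataset files can be downloaded from cebab-v1.1.zip. v1.1 differs from v1.0 only in that it assigns proper unique IDs to the examples, correcting a bug that produced some non-unique IDs in the previous version. No other key fields have changed.

Note that we recommend using the HuggingFace Datasets library to work with our dataset. See below for one-line data loading.

The dataset consists of train_exclusive/train_inclusive/train_observational/dev/test splits: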

train_exclusive.json
train_inclusive.json
train_observational.json
dev.json
test.json

Datasheet

The datasheet for our dataset:

cebab_datasheet.md

Quick start

Hugging Face (recommended)

CEBaB is mainly maintained through the HuggingFace Datasets library:

# Make sure you install the Datasets library using:
#   pip install datasets
from datasets import load_dataset

CEBaB = load_dataset("CEBaB/CEBaB")

Local files (not recommended)

This function can be used to load any subset of the raw *.json files:

import json

def load_split(splitname):
    # splitname is the path to one of the raw *.json files
    with open(splitname) as f:
        data = json.load(f)
    return data
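
As a usage sketch for the function above, the helper below loads every split file it finds in a local directory. The `load_all_splits` name and the assumption that all files sit in one directory are illustrative only; the file names are the ones listed under the dataset files.

```python
import json
from pathlib import Path

# Split file names as listed in the dataset files section.
SPLIT_FILES = [
    "train_exclusive.json",
    "train_inclusive.json",
    "train_observational.json",
    "dev.json",
    "test.json",
]

def load_split(splitname):
    """Load one raw CEBaB *.json file (same logic as the function above)."""
    with open(splitname, encoding="utf-8") as f:
        return json.load(f)

def load_all_splits(data_dir):
    """Hypothetical helper: load every known split file found in data_dir.

    Returns a dict mapping the split name (file name without ".json")
    to the parsed JSON content. Files that are missing are skipped.
    """
    splits = {}
    for filename in SPLIT_FILES:
        path = Path(data_dir) / filename
        if path.exists():
            splits[path.stem] = load_split(path)
    return splits
```

Calling `load_all_splits("path/to/unzipped/files")` returns only the splits that are present, so the same call works when just a subset of the files has been downloaded.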

Data format

{    
    'id' : str in format dddddd_dddddd as the concatenation of original_id and edit_id ,
    'original_id' : str in format dddddd ,
    'edit_id' : str in format dddddd ,
    'is_original' : bool ,
    'edit_goal' : str ( one of "Negative" , "Positive" , "unknown" ) or None if is_original ,
    'edit_type' : str ( one of "noise" , "service" , "ambiance" , "food" ) or None if is_original ,
    'edit_worker' : str or None if is_original ,
    'description' : str ,
    'review_majority' : str ( one of "1" , "2" , "3" , "4" , "5" , "no majority" ),
    'review_label_distribution' : dict ( str to int ),
    'review_workers' : dict ( str to str ),
    'food_aspect_majority' : str ( one of "Negative" , "Positive" , "unknown" , "no majority" ),
    'ambiance_aspect_majority' : str ( one of "Negative" , "Positive" , "unknown" , "no majority" ),
    'service_aspect_majority' : str ( one of "Negative" , "Positive" , "unknown" , "no majority" ),
    'noise_aspect_majority' : str ( one of "Negative" , "Positive" , "unknown" , "no majority" ),
    'food_aspect_label_distribution' : dict ( str to int ),
    'ambiance_aspect_label_distribution' : dict ( str to int ),
    'service_aspect_label_distribution' : dict ( str to int ),
    'noise_aspect_label_distribution' : dict ( str to int ),
    'food_aspect_validation_workers' : dict ( str to str ),
    'ambiance_aspect_validation_workers' : dict ( str to str ),
    'service_aspect_validation_workers' : dict ( str to str ),
    'noise_aspect_validation_workers' : dict ( str to str ),
    'opentable_metadata' : {
        "restaurant_id" : int ,
        "restaurant_name" : str ,
        "cuisine" : str ,
        "price_tier" : str ,
        "dining_style" : str ,
        "dress_code" : str ,
        "parking" : str ,
        "region" : str ,
        "rating_ambiance" : int ,
        "rating_food" : int ,
        "rating_noise" : int ,
        "rating_service" : int ,
        "rating_overall" : int
    }
}
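
The ID fields in the schema above can be sanity-checked with a few lines of Python. This is a minimal sketch: `has_consistent_id` is a hypothetical helper name, and the check assumes that "dddddd" means exactly six decimal digits.

```python
import re

# 'id' is described as dddddd_dddddd: six digits, an underscore, six digits.
ID_PATTERN = re.compile(r"\d{6}_\d{6}")

def has_consistent_id(example):
    """Return True if 'id' is well-formed and equals original_id + "_" + edit_id."""
    expected = f'{example["original_id"]}_{example["edit_id"]}'
    return ID_PATTERN.fullmatch(example["id"]) is not None and example["id"] == expected
```

Running this over a loaded split is a quick way to confirm that the file is v1.1, where IDs are intended to be unique and properly formed.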

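Details:

'id': The unique identifier of this example (a combination of the two IDs listed below).
'original_id': The unique identifier of the original sentence for an edited example.
'edit_id': The unique identifier of the edited sentence.
'is_original': Indicates whether this sentence is an original or an edit.
'edit_goal': The target label for the edited aspect if this is an edited example, otherwise None.
'edit_type': The aspect to modify or to label with sentiment if this is an edited example, otherwise None.
'edit_worker': Anonymized MTurk ID of the worker who wrote 'description'. These come from the same family of IDs as those used in '*_aspect_validation_workers'.
'description': The example text.
'review_majority': The review-level rating label chosen by at least three of the five workers, if there is one, otherwise "no majority".
'review_label_distribution': Review-level rating distribution from the MTurk validation task.
'review_workers': Individual review-level ratings from the annotators. The keys are anonymized MTurk IDs, which are used consistently throughout the dataset.
'*_aspect_majority': The aspect-level label chosen by at least three of the five workers, if there is one, otherwise "no majority".
'*_aspect_label_distribution': Aspect-level rating distribution from the MTurk validation task.
'*_aspect_validation_workers': Individual aspect-level ratings from the annotators. The keys are anonymized MTurk IDs, which are used consistently throughout the dataset.
'opentable_metadata': Metadata for the review.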
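
The relationship between a label distribution and its majority field can be sketched as follows. This is an illustration of the "at least three of the five workers" rule as described above, not the authors' code; `majority_label` and its `min_votes` parameter are hypothetical names, and the function only handles dict inputs.

```python
def majority_label(label_distribution, min_votes=3):
    """Return the label chosen by at least min_votes workers, else "no majority".

    label_distribution maps a label (str) to its vote count (int), as in
    'review_label_distribution' or '*_aspect_label_distribution'.
    With five workers, at most one label can reach three votes.
    """
    for label, count in label_distribution.items():
        if count >= min_votes:
            return label
    return "no majority"
```

For a distribution such as `{"1": 4, "2": 1}` this returns `"1"`, and for a 2/2/1 split it returns `"no majority"`.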

Here is an example:

{
    "id" : "000000_000000" ,
    "original_id" : "000000" ,
    "edit_id" : "000000" ,
    "is_original" : true ,
    "edit_goal" : null ,
    "edit_type" : null ,
    "edit_worker" : null ,
    "description" : "Overbooked and didnot honor reservation time,put on wait list with walk INS" ,
    "review_majority" : "1" ,
    "review_label_distribution" : {
        "1" : 4 ,
        "2" : 1
    },
    "review_workers" : {
        "w244" : "1" ,
        "w120" : "2" ,
        "w197" : "1" ,
        "w7" : "1" ,
        "w132" : "1"
    },
    "food_aspect_majority" : "" ,
    "ambiance_aspect_majority" : "" ,
    "service_aspect_majority" : "Negative" ,
    "noise_aspect_majority" : "unknown" ,
    "food_aspect_label_distribution" : "" ,
    "ambiance_aspect_label_distribution" : "" ,
    "service_aspect_label_distribution" : {
        "Negative" : 5
    },
    "noise_aspect_label_distribution" : {
        "unknown" : 4 ,
        "Negative" : 1
    },
    "food_aspect_validation_workers" : "" ,
    "ambiance_aspect_validation_workers" : "" ,
    "service_aspect_validation_workers" : {
        "w148" : "Negative" ,
        "w120" : "Negative" ,
        "w83" : "Negative" ,
        "w35" : "Negative" ,
        "w70" : "Negative"
    },
    "noise_aspect_validation_workers" : {
        "w27" : "unknown" ,
        "w23" : "unknown" ,
        "w81" : "Negative" ,
        "w103" : "unknown" ,
        "w9" : "unknown"
    },
    "opentable_metadata" : {
        "restaurant_id" : 6513 ,
        "restaurant_name" : "Molino's Ristorante" ,
        "cuisine" : "italian" ,
        "price_tier" : "low" ,
        "dining_style" : "Casual Elegant" ,
        "dress_code" : "Smart Casual" ,
        "parking" : "Private Lot" ,
        "region" : "south" ,
        "rating_ambiance" : 1 ,
        "rating_food" : 3 ,
        "rating_noise" : 2 ,
        "rating_service" : 2 ,
        "rating_overall" : 2
    }
}
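
Because every edited example shares its 'original_id' with the original text it was derived from, originals and their edits can be matched up with a simple grouping step. The sketch below assumes a list of records in the format shown above; `pair_originals_with_edits` is a hypothetical helper, not part of the CEBaB code.

```python
from collections import defaultdict

def pair_originals_with_edits(examples):
    """Return a list of (original, edit) tuples that share the same 'original_id'."""
    originals = {}
    edits = defaultdict(list)
    # First pass: separate originals from edits, keyed by original_id.
    for ex in examples:
        if ex["is_original"]:
            originals[ex["original_id"]] = ex
        else:
            edits[ex["original_id"]].append(ex)
    # Second pass: emit one pair per edit whose original is present.
    pairs = []
    for original_id, original in originals.items():
        for edit in edits.get(original_id, []):
            pairs.append((original, edit))
    return pairs
```

Originals without edits produce no pairs, and edits whose original is absent from the given list are skipped.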

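Code

We host our analysis code in the code folder.

Leaderboard

This section lists some of the best scores obtained on CEBaB as a five-class sentiment classification task. To add your scores, please consider opening a pull request.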

Model architecture	Metric	Approx	S-Learner	INLP
BERT	L2	0.81 (±0.01)	0.74 (±0.02)	0.80 (±0.02)
BERT	COS	0.61 (±0.01)	0.63 (±0.01)	0.59 (±0.03)
BERT	NormDiff	0.44 (±0.01)	0.54 (±0.02)	0.73 (±0.02)
RoBERTa	L2	0.83 (±0.01)	0.78 (±0.01)	0.84 (±0.01)
RoBERTa	COS	0.60 (±0.01)	0.64 (±0.01)	0.58 (±0.01)
RoBERTa	NormDiff	0.45 (±0.00)	0.59 (±0.01)	0.81 (±0.01)
GPT-2	L2	0.72 (±0.02)	0.60 (±0.02)	0.72 (±0.01)
GPT-2	COS	0.59 (±0.01)	0.59 (±0.01)	1.00 (±0.00)
GPT-2	NormDiff	0.41 (±0.01)	0.40 (±0.01)	0.58 (±0.03)
LSTM	L2	0.86 (±0.01)	0.73 (±0.01)	0.79 (±0.01)
LSTM	COS	0.64 (±0.01)	0.64 (±0.01)	0.74 (±0.02)
LSTM	NormDiff	0.50 (±0.01)	0.53 (±0.01)	0.60 (±0.01)
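
The metric names in the table are not defined in this section. Assuming they denote the Euclidean distance (L2), the cosine distance (COS), and the absolute difference of vector norms (NormDiff) between an estimated effect vector and a reference effect vector, they could be computed as in the sketch below. The function names and the zero-vector convention are illustrative choices, not taken from the CEBaB code.

```python
import math

def _norm(v):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity.

    Assumption: if either vector is all zeros, the similarity is
    undefined and this sketch returns a distance of 1.0.
    """
    norm_a, norm_b = _norm(a), _norm(b)
    if norm_a == 0 or norm_b == 0:
        return 1.0
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)

def norm_diff(a, b):
    """Absolute difference between the Euclidean norms of two vectors."""
    return abs(_norm(a) - _norm(b))
```

Under this reading, all three are distances, so lower values would mean that the estimated effect is closer to the reference.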