CEBaB
1.0.0
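✅ An English-language benchmark for evaluating causal explanation methods.
✅ A human-validated aspect-based sentiment analysis (ABSA) benchmark.
Eldar David Abraham, Karel D'Oosterlinck, Amir Feder, Yair Gat, Atticus Geiger, Christopher Potts, Roi Reichart, Zhengxuan Wu. 2022. CEBaB: Estimating the causal effects of real-world concepts on NLP model behavior. Stanford University, Technion - Israel Institute of Technology, and Ghent University.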
@unpublished{abraham-etal-2022-cebab,
title={{CEBaB}: Estimating the Causal Effects of Real-World Concepts on {NLP} Model Behavior},
author={Abraham, Eldar David and D'Oosterlinck, Karel and Feder, Amir and Gat, Yair Ori and Geiger, Atticus and Potts, Christopher and Reichart, Roi and Wu, Zhengxuan},
note={arXiv:2205.14140},
url={https://arxiv.org/abs/2205.14140},
year={2022}}
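The dataset files can be downloaded from cebab-v1.1.zip. Our v1.1 differs from v1.0 only in that v1.1 has proper unique IDs for our examples and fixes a bug that led to some non-unique IDs in the previous version. No other key fields have changed.
Note that we recommend using the HuggingFace Datasets library to work with our dataset. See below for one-line data loading.
The dataset consists of train_exclusive/train_inclusive/dev/test splits:
- train_exclusive.json
- train_inclusive.json
- train_observational.json
- dev.json
- test.json

The datasheet for our dataset:
CEBaB is primarily maintained using the HuggingFace Datasets library: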
"""
Make sure you install the Datasets library using:
pip install datasets
"""
from datasets import load_dataset
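CEBaB = load_dataset("CEBaB/CEBaB")

The following function can be used to load any subset of the raw *.json files:

import json

def load_split(splitname):
    # splitname is the path to one of the raw *.json files, e.g. "dev.json"
    with open(splitname) as f:
        data = json.load(f)
    return data

Each example has the following format:
{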
'id' : str in format dddddd_dddddd as the concatenation of original_id and edit_id ,
'original_id' : str in format dddddd ,
'edit_id' : str in format dddddd ,
'is_original' : bool ,
'edit_goal' : str ( one of "Negative" , "Positive" , "unknown" ) or None if is_original ,
'edit_type' : str ( one of "noise" , "service" , "ambiance" , "food" ) or None if is_original ,
'edit_worker' : str or None if is_original ,
'description' : str ,
'review_majority' : str ( one of "1" , "2" , "3" , "4" , "5" , "no majority" ),
'review_label_distribution' : dict ( str to int ),
'review_workers' : dict ( str to str ),
'food_aspect_majority' : str ( one of "Negative" , "Positive" , "unknown" , "no majority" ),
'ambiance_aspect_majority' : str ( one of "Negative" , "Positive" , "unknown" , "no majority" ),
'service_aspect_majority' : str ( one of "Negative" , "Positive" , "unknown" , "no majority" ),
'noise_aspect_majority' : str ( one of "Negative" , "Positive" , "unknown" , "no majority" ),
'food_aspect_label_distribution' : dict ( str to int ),
'ambiance_aspect_label_distribution' : dict ( str to int ),
'service_aspect_label_distribution' : dict ( str to int ),
'noise_aspect_label_distribution' : dict ( str to int ),
'food_aspect_validation_workers' : dict ( str to str ),
'ambiance_aspect_validation_workers' : dict ( str to str ),
'service_aspect_validation_workers' : dict ( str to str ),
'noise_aspect_validation_workers' : dict ( str to str ),
'opentable_metadata' : {
"restaurant_id" : int ,
"restaurant_name" : str ,
"cuisine" : str ,
"price_tier" : str ,
"dining_style" : str ,
"dress_code" : str ,
"parking" : str ,
"region" : str ,
"rating_ambiance" : int ,
"rating_food" : int ,
"rating_noise" : int ,
"rating_service" : int ,
"rating_overall" : int
}
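}
Details:
- 'id': The unique identifier for this example (a combination of the two IDs listed below).
- 'original_id': The unique identifier of the original sentence for an edited example.
- 'edit_id': The unique identifier of the edited sentence.
- 'is_original': Indicates whether the sentence is an original (true) or an edit (false).
- 'edit_goal': The goal label for the edited aspect if this is an edited example, otherwise None.
- 'edit_type': The aspect to modify or label with sentiment if this is an edited example, otherwise None.
- 'edit_worker': The anonymized MTurk ID of the worker who wrote 'description'. These come from the same family of IDs used in '*_aspect_validation_workers'.
- 'description': The example text.
- 'review_majority': The review-level rating label chosen by at least three of the five workers if there is one, otherwise "no majority".
- 'review_label_distribution': The review-level rating distribution from the MTurk validation task.
- 'review_workers': The individual review-level ratings from the annotators. The keys are anonymized MTurk IDs, which are used consistently throughout the dataset.
- '*_aspect_majority': The aspect-level label chosen by at least three of the five workers if there is one, otherwise "no majority".
- '*_aspect_label_distribution': The aspect-level rating distribution from the MTurk validation task.
- '*_aspect_validation_workers': The individual aspect-level ratings from the annotators. The keys are anonymized MTurk IDs, which are used consistently throughout the dataset.
- 'opentable_metadata': The metadata for the review.

Here is an example: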
{
"id" : "000000_000000" ,
"original_id" : "000000" ,
"edit_id" : "000000" ,
"is_original" : true ,
"edit_goal" : null ,
"edit_type" : null ,
"edit_worker" : null ,
"description" : "Overbooked and didnot honor reservation time,put on wait list with walk INS" ,
"review_majority" : "1" ,
"review_label_distribution" : {
"1" : 4 ,
"2" : 1
},
"review_workers" : {
"w244" : "1" ,
"w120" : "2" ,
"w197" : "1" ,
"w7" : "1" ,
"w132" : "1"
},
"food_aspect_majority" : "" ,
"ambiance_aspect_majority" : "" ,
"service_aspect_majority" : "Negative" ,
"noise_aspect_majority" : "unknown" ,
"food_aspect_label_distribution" : "" ,
"ambiance_aspect_label_distribution" : "" ,
"service_aspect_label_distribution" : {
"Negative" : 5
},
"noise_aspect_label_distribution" : {
"unknown" : 4 ,
"Negative" : 1
},
"food_aspect_validation_workers" : "" ,
"ambiance_aspect_validation_workers" : "" ,
"service_aspect_validation_workers" : {
"w148" : "Negative" ,
"w120" : "Negative" ,
"w83" : "Negative" ,
"w35" : "Negative" ,
"w70" : "Negative"
},
"noise_aspect_validation_workers" : {
"w27" : "unknown" ,
"w23" : "unknown" ,
"w81" : "Negative" ,
"w103" : "unknown" ,
"w9" : "unknown"
},
"opentable_metadata" : {
"restaurant_id" : 6513 ,
"restaurant_name" : "Molino's Ristorante" ,
"cuisine" : "italian" ,
"price_tier" : "low" ,
"dining_style" : "Casual Elegant" ,
"dress_code" : "Smart Casual" ,
"parking" : "Private Lot" ,
"region" : "south" ,
"rating_ambiance" : 1 ,
"rating_food" : 3 ,
"rating_noise" : 2 ,
"rating_service" : 2 ,
"rating_overall" : 2
}
}
We host our analysis code in the code folder.
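The record format above lends itself to a few quick sanity checks. The following is a minimal sketch, not part of the CEBaB codebase: the helper names `has_consistent_id`, `majority_label`, and `group_by_original` are hypothetical, the `edit` record is invented for illustration, and returning `None` for an empty distribution is an assumption based on the empty strings seen in the example record.

```python
import re
from collections import Counter, defaultdict

# 'id' is documented as dddddd_dddddd: original_id and edit_id joined by "_".
ID_PATTERN = re.compile(r"^\d{6}_\d{6}$")


def has_consistent_id(example):
    """Check that 'id' matches the documented format and its two parts."""
    expected = f"{example['original_id']}_{example['edit_id']}"
    return bool(ID_PATTERN.match(example["id"])) and example["id"] == expected


def majority_label(distribution, min_votes=3):
    """Recompute a majority label from a label distribution.

    Returns the label with at least `min_votes` votes, "no majority" if no
    label reaches that count, and None if the distribution is empty
    (assumption: unannotated fields appear as "" in the raw files).
    """
    if not distribution:
        return None
    label, votes = Counter(distribution).most_common(1)[0]
    return label if votes >= min_votes else "no majority"


def group_by_original(examples):
    """Group examples by 'original_id' so edits sit next to their original."""
    groups = defaultdict(lambda: {"original": None, "edits": []})
    for example in examples:
        group = groups[example["original_id"]]
        if example["is_original"]:
            group["original"] = example
        else:
            group["edits"].append(example)
    return dict(groups)


# Abbreviated version of the example record shown above.
original = {
    "id": "000000_000000",
    "original_id": "000000",
    "edit_id": "000000",
    "is_original": True,
    "review_label_distribution": {"1": 4, "2": 1},
}
# Hypothetical edit of the same review, invented for illustration only.
edit = {
    "id": "000000_000001",
    "original_id": "000000",
    "edit_id": "000001",
    "is_original": False,
    "review_label_distribution": {"3": 2, "4": 2, "5": 1},
}

groups = group_by_original([original, edit])
print(has_consistent_id(original))
print(majority_label(original["review_label_distribution"]))
print(len(groups["000000"]["edits"]))
```

Grouping by 'original_id' is one convenient way to line up each original review with its edited counterparts; with five annotators and a threshold of three, at most one label can qualify as the majority.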
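This section contains a leaderboard of some of the best scores obtained on CEBaB as a five-way sentiment classification task. To add a score, please consider opening a pull request.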
| Model architecture | Metric | Approx | S-Learner | INLP |
|---|---|---|---|---|
| BERT | L2 | 0.81(±0.01) | 0.74(±0.02) | 0.80(±0.02) |
| BERT | cos | 0.61(±0.01) | 0.63(±0.01) | 0.59(±0.03) |
| BERT | NORMDIFF | 0.44(±0.01) | 0.54(±0.02) | 0.73(±0.02) |
| RoBERTa | L2 | 0.83(±0.01) | 0.78(±0.01) | 0.84(±0.01) |
| RoBERTa | cos | 0.60(±0.01) | 0.64(±0.01) | 0.58(±0.01) |
| RoBERTa | NORMDIFF | 0.45(±0.00) | 0.59(±0.01) | 0.81(±0.01) |
| GPT-2 | L2 | 0.72(±0.02) | 0.60(±0.02) | 0.72(±0.01) |
| GPT-2 | cos | 0.59(±0.01) | 0.59(±0.01) | 1.00(±0.00) |
| GPT-2 | NORMDIFF | 0.41(±0.01) | 0.40(±0.01) | 0.58(±0.03) |
| LSTM | L2 | 0.86(±0.01) | 0.73(±0.01) | 0.79(±0.01) |
| LSTM | cos | 0.64(±0.01) | 0.64(±0.01) | 0.74(±0.02) |
| LSTM | NORMDIFF | 0.50(±0.01) | 0.53(±0.01) | 0.60(±0.01) |
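When comparing a new result against the table, it can help to turn each cell into numbers. The sketch below is only an illustration: `parse_score` is a hypothetical helper, and it assumes every score cell follows the `value(±spread)` pattern used above. It makes no assumption about what the value in parentheses measures.

```python
import re

# Matches cells such as "0.81(±0.01)", allowing optional surrounding spaces.
SCORE_CELL = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\(±\s*(\d+(?:\.\d+)?)\)\s*$")


def parse_score(cell):
    """Split a leaderboard cell into (value, spread) as floats."""
    match = SCORE_CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized score cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


# Cells copied from the LSTM / L2 row of the table.
lstm_l2 = [parse_score(cell) for cell in ("0.86(±0.01)", "0.73(±0.01)", "0.79(±0.01)")]
print(lstm_l2)
```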
CEBaB is released under a Creative Commons Attribution 4.0 International License.