CEBaB
1.0.0
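✅ An English-language benchmark for evaluating causal explanation methods.
✅ A human-validated aspect-based sentiment analysis (ABSA) benchmark.
Eldar David Abraham, Karel D'Oosterlinck, Amir Feder, Yair Gat, Atticus Geiger, Christopher Potts, Roi Reichart, Zhengxuan Wu. 2022. CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior. Stanford University, Technion - Israel Institute of Technology, and Ghent University.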
@unpublished{abraham-etal-2022-cebab,
title={{CEBaB}: Estimating the Causal Effects of Real-World Concepts on {NLP} Model Behavior},
author={Abraham, Eldar David and D'Oosterlinck, Karel and Feder, Amir and Gat, Yair Ori and Geiger, Atticus and Potts, Christopher and Reichart, Roi and Wu, Zhengxuan},
note={arXiv:2205.14140},
url={https://arxiv.org/abs/2205.14140},
year={2022}}
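The dataset files can be downloaded from cebab-v1.1.zip. v1.1 differs from v1.0 only in that v1.1 has proper unique IDs for our examples and fixes a bug that led to some non-unique IDs in the previous version. The other key fields are unchanged.
Note that we recommend using our dataset through the HuggingFace Datasets library. See below for one-line data loading.
The dataset consists of train_exclusive/train_inclusive/dev/test splits:
train_exclusive.json
train_inclusive.json
train_observational.json
dev.json
test.json
The datasheet for our dataset:
CEBaB is mainly maintained using the HuggingFace Datasets library: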
"""
Make sure you install the Datasets library using:
pip install datasets
"""
from datasets import load_dataset
CEBaB = load_dataset("CEBaB/CEBaB")

The following function can be used to load any subset of the raw *.json files:

import json

def load_split(splitname):
    # splitname is the path to one of the raw *.json files
    with open(splitname) as f:
        data = json.load(f)
    return data

{
    'id': str in format dddddd_dddddd as the concatenation of original_id and edit_id,
    'original_id': str in format dddddd,
    'edit_id': str in format dddddd,
    'is_original': bool,
    'edit_goal': str (one of "Negative", "Positive", "unknown") or None if is_original,
    'edit_type': str (one of "noise", "service", "ambiance", "food") or None if is_original,
    'edit_worker': str or None if is_original,
    'description': str,
    'review_majority': str (one of "1", "2", "3", "4", "5", "no majority"),
    'review_label_distribution': dict (str to int),
    'review_workers': dict (str to str),
    'food_aspect_majority': str (one of "Negative", "Positive", "unknown", "no majority"),
    'ambiance_aspect_majority': str (one of "Negative", "Positive", "unknown", "no majority"),
    'service_aspect_majority': str (one of "Negative", "Positive", "unknown", "no majority"),
    'noise_aspect_majority': str (one of "Negative", "Positive", "unknown", "no majority"),
    'food_aspect_label_distribution': dict (str to int),
    'ambiance_aspect_label_distribution': dict (str to int),
    'service_aspect_label_distribution': dict (str to int),
    'noise_aspect_label_distribution': dict (str to int),
    'food_aspect_validation_workers': dict (str to str),
    'ambiance_aspect_validation_workers': dict (str to str),
    'service_aspect_validation_workers': dict (str to str),
    'noise_aspect_validation_workers': dict (str to str),
    'opentable_metadata': {
        "restaurant_id": int,
        "restaurant_name": str,
        "cuisine": str,
        "price_tier": str,
        "dining_style": str,
        "dress_code": str,
        "parking": str,
        "region": str,
        "rating_ambiance": int,
        "rating_food": int,
        "rating_noise": int,
        "rating_service": int,
        "rating_overall": int
    }
}

Details:
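- 'id': unique identifier for this example (the combination of the two IDs listed below).
- 'original_id': unique identifier of the original sentence for an edited example.
- 'edit_id': unique identifier of the edited sentence.
- 'is_original': indicates whether the sentence is an original or an edit.
- 'edit_goal': the goal label for the edited aspect if this is an edited example, otherwise None.
- 'edit_type': the aspect to modify or to label with sentiment if this is an edited example, otherwise None.
- 'edit_worker': anonymized MTurk ID of the worker who wrote 'description'. These come from the same family of IDs as those used in '*_aspect_validation_workers'.
- 'description': the example text.
- 'review_majority': the review-level rating label chosen by at least three of the five workers, if there is one, otherwise no majority.
- 'review_label_distribution': the review-level rating distribution from the MTurk validation task.
- 'review_workers': the individual review-level ratings from the annotators. The keys are anonymized MTurk IDs, which are used consistently throughout the dataset.
- '*_aspect_majority': the aspect-level label chosen by at least three of the five workers, if there is one, otherwise no majority.
- '*_aspect_label_distribution': the aspect-level rating distribution from the MTurk validation task.
- '*_aspect_validation_workers': the individual aspect-level ratings from the annotators. The keys are anonymized MTurk IDs, which are used consistently throughout the dataset.
- 'opentable_metadata': metadata for the review.

Here is an example: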
{
    "id": "000000_000000",
    "original_id": "000000",
    "edit_id": "000000",
    "is_original": true,
    "edit_goal": null,
    "edit_type": null,
    "edit_worker": null,
    "description": "Overbooked and didnot honor reservation time,put on wait list with walk INS",
    "review_majority": "1",
    "review_label_distribution": {
        "1": 4,
        "2": 1
    },
    "review_workers": {
        "w244": "1",
        "w120": "2",
        "w197": "1",
        "w7": "1",
        "w132": "1"
    },
    "food_aspect_majority": "",
    "ambiance_aspect_majority": "",
    "service_aspect_majority": "Negative",
    "noise_aspect_majority": "unknown",
    "food_aspect_label_distribution": "",
    "ambiance_aspect_label_distribution": "",
    "service_aspect_label_distribution": {
        "Negative": 5
    },
    "noise_aspect_label_distribution": {
        "unknown": 4,
        "Negative": 1
    },
    "food_aspect_validation_workers": "",
    "ambiance_aspect_validation_workers": "",
    "service_aspect_validation_workers": {
        "w148": "Negative",
        "w120": "Negative",
        "w83": "Negative",
        "w35": "Negative",
        "w70": "Negative"
    },
    "noise_aspect_validation_workers": {
        "w27": "unknown",
        "w23": "unknown",
        "w81": "Negative",
        "w103": "unknown",
        "w9": "unknown"
    },
    "opentable_metadata": {
        "restaurant_id": 6513,
        "restaurant_name": "Molino's Ristorante",
        "cuisine": "italian",
        "price_tier": "low",
        "dining_style": "Casual Elegant",
        "dress_code": "Smart Casual",
        "parking": "Private Lot",
        "region": "south",
        "rating_ambiance": 1,
        "rating_food": 3,
        "rating_noise": 2,
        "rating_service": 2,
        "rating_overall": 2
    }
}

We host our analysis code in the code folder.
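The field conventions described in the data format can be sketched in a few lines of Python. This is a minimal illustration, not part of the CEBaB codebase: the helper names `has_consistent_id`, `majority_label`, and `group_edits_by_original` are hypothetical, and the handling of empty-string fields is an assumption based only on the single example record shown in this README.

```python
import re
from collections import defaultdict

# 'id' is documented as dddddd_dddddd (original_id + "_" + edit_id).
ID_PATTERN = re.compile(r"^\d{6}_\d{6}$")


def has_consistent_id(example):
    """Check that 'id' is 'original_id' and 'edit_id' joined by an underscore."""
    expected = f"{example['original_id']}_{example['edit_id']}"
    return bool(ID_PATTERN.match(example["id"])) and example["id"] == expected


def majority_label(distribution, min_votes=3):
    """Return the label chosen by at least min_votes workers, else 'no majority'.

    Assumption: an empty value (the example record uses "" for aspects
    without labels) is passed through as "".
    """
    if not distribution:
        return ""
    label, votes = max(distribution.items(), key=lambda item: item[1])
    return label if votes >= min_votes else "no majority"


def group_edits_by_original(examples):
    """Group examples by 'original_id', separating the original from its edits."""
    groups = defaultdict(lambda: {"original": None, "edits": []})
    for example in examples:
        group = groups[example["original_id"]]
        if example["is_original"]:
            group["original"] = example
        else:
            group["edits"].append(example)
    return dict(groups)
```

With five workers per item and a threshold of three, at most one label can reach the threshold, so `majority_label` never has to break a tie between two majority labels. Grouping by 'original_id' is one way to line up each original review with its edited counterparts.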
This section serves as a leaderboard of some of the best scores obtained on CEBaB as a five-class sentiment classification task. To add your score, please consider opening a pull request.
| Model architecture | Metric | Approx | S-Learner | INLP |
|---|---|---|---|---|
| BERT | L2 | 0.81(±0.01) | 0.74(±0.02) | 0.80(±0.02) |
| BERT | cos | 0.61(±0.01) | 0.63(±0.01) | 0.59(±0.03) |
| BERT | NORMDIFF | 0.44(±0.01) | 0.54(±0.02) | 0.73(±0.02) |
| RoBERTa | L2 | 0.83(±0.01) | 0.78(±0.01) | 0.84(±0.01) |
| RoBERTa | cos | 0.60(±0.01) | 0.64(±0.01) | 0.58(±0.01) |
| RoBERTa | NORMDIFF | 0.45(±0.00) | 0.59(±0.01) | 0.81(±0.01) |
| GPT-2 | L2 | 0.72(±0.02) | 0.60(±0.02) | 0.72(±0.01) |
| GPT-2 | cos | 0.59(±0.01) | 0.59(±0.01) | 1.00(±0.00) |
| GPT-2 | NORMDIFF | 0.41(±0.01) | 0.40(±0.01) | 0.58(±0.03) |
| LSTM | L2 | 0.86(±0.01) | 0.73(±0.01) | 0.79(±0.01) |
| LSTM | cos | 0.64(±0.01) | 0.64(±0.01) | 0.74(±0.02) |
| LSTM | NORMDIFF | 0.50(±0.01) | 0.53(±0.01) | 0.60(±0.01) |
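
This README does not define the three metrics in the table. Going only by their names, one plausible reading is that each measures a distance between an estimated effect vector and a reference effect vector: Euclidean distance (L2), cosine distance (cos), and the absolute difference of vector norms (NORMDIFF). The sketch below illustrates that reading; the function names, the zero-vector handling, and the interpretation itself are assumptions and should be checked against the paper and the analysis code.

```python
import math


def _norm(vector):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in vector))


def l2_distance(estimated, reference):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)))


def cosine_distance(estimated, reference):
    """One minus the cosine similarity of the two vectors.

    Assumption of this sketch: if either vector is all zeros, the
    similarity is undefined and the distance is reported as 1.0.
    """
    norm_e, norm_r = _norm(estimated), _norm(reference)
    if norm_e == 0 or norm_r == 0:
        return 1.0
    dot = sum(e * r for e, r in zip(estimated, reference))
    return 1.0 - dot / (norm_e * norm_r)


def norm_diff(estimated, reference):
    """Absolute difference between the Euclidean norms of the two vectors."""
    return abs(_norm(estimated) - _norm(reference))
```

Under this reading the three numbers capture different kinds of error: the cosine distance ignores magnitude and looks only at direction, the norm difference ignores direction and looks only at magnitude, and the L2 distance is sensitive to both.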
CEBaB is released under a Creative Commons Attribution 4.0 International License.