RepBelief
1.0.0
This repository provides the code for the paper "Language Models Represent Beliefs of Self and Others". The paper shows that LLMs internally represent their own beliefs and those of other agents, and that manipulating these representations can significantly affect their Theory of Mind reasoning abilities.
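One common way to test a claim of this kind is to fit a linear probe on internal activations and then shift an activation along the probe direction to see whether the decoded label changes. The sketch below illustrates that idea on synthetic vectors only. It is not the repository's `probe.py` or its intervention code; every name in it (`make_toy_representations`, `train_linear_probe`, `intervene`, the toy `direction`) is hypothetical, and it uses only the standard library so it runs without the dependencies installed below.

```python
import math
import random


def make_toy_representations(n, direction, noise=0.5, seed=0):
    """Synthetic stand-ins for activations: Gaussian noise shifted by
    +direction for label 1 and -direction for label 0."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        xs.append([sign * d + rng.gauss(0.0, noise) for d in direction])
        ys.append(y)
    return xs, ys


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(xs, ys, lr=0.1, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(logit(w, b, x)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if logit(w, b, x) > 0 else 0


def accuracy(w, b, xs, ys):
    return sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(xs)


def intervene(x, w, alpha):
    """Shift x by alpha along the unit-normalised probe direction."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [xi + alpha * wi / norm for xi, wi in zip(x, w)]


if __name__ == "__main__":
    direction = [1.0, -1.0, 0.5, 0.0]  # arbitrary axis for the toy data (assumption)
    train_x, train_y = make_toy_representations(400, direction, seed=0)
    test_x, test_y = make_toy_representations(200, direction, seed=1)
    w, b = train_linear_probe(train_x, train_y)
    print("held-out accuracy:", accuracy(w, b, test_x, test_y))
    x0 = [-d for d in direction]  # centre of the label-0 cluster
    print("before:", predict(w, b, x0), "after:", predict(w, b, intervene(x0, w, 4.0)))
```

In this toy setting the probe weight vector doubles as the intervention direction. Which activations the repository probes and how it intervenes on them is defined by its own scripts.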
conda create -n lm python=3.8 anaconda
conda activate lm
# Please install PyTorch (<2.4) according to your CUDA version.
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
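After installation, it may be worth confirming that the environment satisfies the PyTorch (<2.4) constraint noted above. The following is a minimal sketch, not part of the repository; the helper name `torch_version_supported` is hypothetical, and the check compares only the major and minor version numbers.

```python
import re


def torch_version_supported(version, upper=(2, 4)):
    """Return True if the (major, minor) part of `version` is below `upper`.

    Local build suffixes such as '+cu121' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) < upper


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        ok = torch_version_supported(torch.__version__)
        print("torch", torch.__version__, "supported:", ok)
        print("CUDA available:", torch.cuda.is_available())
```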
Then download the language models (e.g. Mistral-7B-Instruct-v0.2, deepseek-llm-7b-chat) to models/. You can also specify the file paths in lm_paths.json.
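The lookup order described here (an explicit entry in lm_paths.json, otherwise a directory under models/) could be sketched as follows. The schema of lm_paths.json is not documented in this README, so the flat `{"model name": "path"}` mapping assumed below is purely hypothetical, as is the helper `resolve_model_path`; check the file shipped with the repository for the actual format.

```python
import json
from pathlib import Path


def resolve_model_path(model_name, config_file="lm_paths.json", default_dir="models"):
    """Return the path configured for `model_name`, else `default_dir/model_name`.

    Assumes (hypothetically) that the config file is a flat JSON object
    mapping model names to filesystem paths.
    """
    config = Path(config_file)
    if config.is_file():
        with config.open(encoding="utf-8") as f:
            paths = json.load(f)
        if model_name in paths:
            return Path(paths[model_name])
    # Fall back to the default download location.
    return Path(default_dir) / model_name


if __name__ == "__main__":
    print(resolve_model_path("Mistral-7B-Instruct-v0.2"))
```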
sh scripts/save_reps.sh 0_forward belief
sh scripts/save_reps.sh 0_forward action
sh scripts/save_reps.sh 0_backward belief

Binary:
python probe.py --belief=protagonist --dynamic=0_forward --variable belief
python probe.py --belief=oracle --dynamic=0_forward --variable belief
python probe.py --belief=protagonist --dynamic=0_forward --variable action
python probe.py --belief=oracle --dynamic=0_forward --variable action
python probe.py --belief=protagonist --dynamic=0_backward --variable belief
python probe.py --belief=oracle --dynamic=0_backward --variable belief

Multinomial:
python probe_multinomial.py --dynamic=0_forward --variable belief
python probe_multinomial.py --dynamic=0_forward --variable action
python probe_multinomial.py --dynamic=0_backward --variable belief

sh scripts/0_forward_belief.sh
sh scripts/0_forward_action.sh
sh scripts/0_backward_belief.sh

Intervention for the Forward Belief task:
sh scripts/0_forward_belief_interv_oracle.sh
sh scripts/0_forward_belief_interv_protagonist.sh
sh scripts/0_forward_belief_interv_o0p1.sh

Cross-task intervention:
sh scripts/cross_0_forward_belief_to_forward_action_interv_o0p1.sh
sh scripts/cross_0_forward_belief_to_backward_belief_interv_o0p1.sh

@inproceedings{zhu2024language,
  title = {Language Models Represent Beliefs of Self and Others},
  author = {Zhu, Wentao and Zhang, Zhining and Wang, Yizhou},
  booktitle = {Forty-first International Conference on Machine Learning},
  year = {2024}
}