Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
This repository hosts the code, data, and model weights of RLHF-V, a novel framework that aligns the behavior of Multimodal Large Language Models (MLLMs) through fine-grained correctional human feedback.
We collect fine-grained correctional feedback data, which can better credit the desired behavior, by asking human annotators to correct the hallucinated segments in model responses. Thanks to the high data efficiency, it takes only 1 hour on 8 A100 GPUs for us to reduce the hallucination rate of the base model by 34.8%. Specifically, we conduct experiments on Muffin, an MLLM with strong image understanding and reasoning abilities that is trained on UniMM-Chat.
Visit our project page and paper to explore more! And don't miss our interactive demo!
We present the RLHF-V-Dataset, a human preference dataset constructed from fine-grained segment-level human corrections. In practice, we obtain a total of 1.4k annotated samples that include a diverse set of detailed description instructions and question-answering instructions.
We release RLHF-V model weights on Hugging Face.
We also provide our SFT weights, the model checkpoint obtained after finetuning Muffin on the VQAv2 dataset.
cd RLHF-V
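# Clone into a directory named "Muffin" so that the following `cd Muffin` works on case-sensitive filesystems
git clone https://github.com/thunlp/muffin Muffin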
cd Muffin
# Creating conda environment
conda create -n muffin python=3.10
conda activate muffin
# Installing dependencies
pip install -e .
# Install specific version of transformers to make sure you can reproduce the experimental results in our papers
git clone --recursive git@github.com:huggingface/transformers.git
cd transformers
git checkout a92e0ad2e20ef4ce28410b5e05c5d63a5a304e65
pip install .
cd ..
Install additional packages if you need to do training.
git clone --recursive https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
# Note: Uncomment the following line if you have CUDA version <= 11.4
# git checkout ad11394
MAX_JOBS=8 python setup.py install
cd ..
To run Object HalBench evaluation, you also need the following packages:
jsonlines
nltk==3.8.1
spacy==3.7.0
# Download and install "en_core_web_trf" for spacy
# The wheel version we use can be downloaded from
# https://github.com/explosion/spacy-models/releases/tag/en_core_web_trf-3.7.2
# run pip install en_core_web_trf-3.7.2-py3-none-any.whl
Run the following script to generate, evaluate, and summarize results for LLaVA Bench:
# cd RLHF-V
bash ./script/eval/eval_muffin_llavabench.sh ./RLHF-V_weight ./results/RLHF-V {YOUR_OPENAI_API_KEY}
The evaluation of Object HalBench relies on the caption and segmentation annotations from the COCO2014 dataset. Please first download the COCO2014 dataset from the COCO dataset's official website.
mkdir coco2014
cd coco2014
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
unzip annotations_trainval2014.zip
Please replace {YOUR_COCO2014_ANNOTATION_DIR} with the path to the COCO2014 annotation directory (e.g., ./coco2014/annotations), and replace {YOUR_OPENAI_API_KEY} with a valid OpenAI API key.
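Before launching the evaluation, it can help to confirm that the annotation directory actually contains the caption and instance (segmentation) files. The following is a minimal sketch, not part of this repository: the function name `check_coco_annotations` is hypothetical, and the file names are assumed to be the ones shipped in annotations_trainval2014.zip. Which of these files the evaluation script reads is also an assumption.

```shell
# Hypothetical helper (not part of this repository): verify that a COCO2014
# annotation directory holds the caption and instance (segmentation) files.
# Assumption: file names follow those in annotations_trainval2014.zip.
check_coco_annotations() {
    ann_dir="$1"
    missing=0
    for split in train2014 val2014; do
        for kind in captions instances; do
            file="${ann_dir}/${kind}_${split}.json"
            if [ ! -f "$file" ]; then
                # Report each missing file on stderr and remember the failure
                echo "missing: $file" >&2
                missing=1
            fi
        done
    done
    # 0 if all four files exist, 1 otherwise
    return "$missing"
}
```

Calling `check_coco_annotations ./coco2014/annotations` before the command below lists any missing file and returns a non-zero status if the directory is incomplete.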
# cd RLHF-V
bash ./script/eval_muffin_objhal.sh ./RLHF-V_weight ./results/RLHF-V {YOUR_COCO2014_ANNOTATION_DIR} {YOUR_OPENAI_API_KEY}
Please download the MMHal evaluation data here, and save the file in eval/data.
# cd RLHF-V
bash ./script/eval_muffin_mmhal.sh ./RLHF-V_weight ./results/RLHF-V {YOUR_OPENAI_API_KEY}
Please follow the instructions in the Install section to prepare the training environment, and make sure to upgrade to the latest code base of Muffin:
cd Muffin
git pull
pip install -e .
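Since the Install section pins transformers to a specific commit for reproducibility, it may be worth confirming that the local checkout is still at that commit before training. The following is a minimal sketch, not part of this repository: the helper name `verify_checkout` is hypothetical, and it only inspects the source checkout, not the package that pip has installed.

```shell
# Hypothetical helper (not part of this repository): succeed only if the
# git checkout in directory $1 is at the full commit hash given as $2.
verify_checkout() {
    repo_dir="$1"
    expected="$2"
    # Ask git for the full hash of the currently checked-out commit
    actual="$(git -C "$repo_dir" rev-parse HEAD 2>/dev/null)" || {
        echo "not a git checkout: $repo_dir" >&2
        return 2
    }
    if [ "$actual" != "$expected" ]; then
        echo "expected $expected but found $actual" >&2
        return 1
    fi
}
```

Assuming transformers was cloned inside the Muffin directory as in the Install section, the check could be run from there as `verify_checkout ./transformers a92e0ad2e20ef4ce28410b5e05c5d63a5a304e65`.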
Please download our SFT model checkpoint and save it to Muffin/RLHF-V_SFT_weight.
After installing the Muffin environment, you can train your model as follows. This script automatically downloads our open-source training data from Hugging Face, generates logps with our SFT model, and runs DDPO training:
cd Muffin
ref_model=./RLHF-V_SFT_weight
bash ./script/train/run_RLHFV.sh \
    ./RLHFV_checkpoints/dpo_exp \
    master \
    RLHFV \
    1.1 \
    $ref_model \
    ./RLHF-V-Dataset \
    RLHFV_SFT \
    2160 \
    360 \
    0.1 \
    False \
    True
Usage and License Notices: The data, code, and checkpoint are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and ChatGPT. The dataset is CC BY-NC 4.0 (allowing only non-commercial use), and models trained using the dataset should not be used outside of research purposes.
If you find our model/code/data/paper helpful, please consider citing our papers and starring us!
@article{yu2023rlhf,
title={RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
journal={arXiv preprint arXiv:2312.00849},
year={2023}
}
@article{yu2024rlaifv,
title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness},
author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
journal={arXiv preprint arXiv:2405.17220},
year={2024}
}