text2reward
1.0.0
Code for the paper Text2Reward: Reward Shaping with Language Models for Reinforcement Learning. See our project page for more demos and the latest related resources.
To set up the environment, run the following commands in your shell:
# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0

For mujoco, please follow the instructions here to install it. After that, run the following command to confirm that the installation succeeded:

$ python3
>>> import mujoco_py

Output like the following may appear at this point:

RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
Segmentation fault (core dumped)
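For a quick sanity check that the other packages installed above import cleanly, a one-liner such as the following can be used. This is only a sketch: the import names mani_skill2, metaworld, and stable_baselines3 are assumptions based on the packages installed above, not a step required by our scripts.

# optional sanity check; import names are assumptions, not part of the provided scripts
python3 -c "import mani_skill2, metaworld, stable_baselines3; print('imports OK')"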
To reproduce our experimental results, you can run the following scripts.

ManiSkill2:
bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh

It is normal to encounter the following warnings:
[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.

MetaWorld:
bash run_oracle.sh
bash run_zero_shot.sh
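Since wandb and tensorboard are installed during setup, you may want to authenticate with Weights & Biases before launching the scripts above and inspect training curves with TensorBoard afterwards. The commands below are only a sketch: whether a given script actually logs to wandb, and where TensorBoard logs end up, depends on the run_*.sh scripts, and ./logs is a placeholder path.

# optional: log in to Weights & Biases (if the scripts report to wandb)
wandb login
# optional: view local TensorBoard logs; replace ./logs with the actual log directory
tensorboard --logdir ./logs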
To generate reward code, first add the following environment variable to your .bashrc (or .zshrc, etc.):

export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward

Then navigate to the directory text2reward/code_generation/single_flow and run the following scripts:
# generate reward code for ManiSkill2
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh

By default, the run_oracle.sh scripts above use the expert-written rewards provided by the environments, while the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards from our experiments. If you want to run new experiments with your own reward, simply follow the bash scripts above and change the --reward_path argument to the path of your own reward file.
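As an illustration of that last step, the snippet below sketches what such a modified invocation might look like. train.py, the environment name, and the other flags are placeholders rather than the real entry point, so check the corresponding run_*.sh script for the actual command; only the --reward_path argument itself comes from this README.

# copy an existing run script, then point --reward_path at your own reward file
# (train.py and --env are placeholders; see e.g. run_zero_shot.sh for the actual invocation)
python train.py --env LiftCube-v0 --reward_path /path/to/your_reward.py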
If you find our work helpful, please cite us:
@inproceedings{xietext2reward,
  title     = {Text2Reward: Reward Shaping with Language Models for Reinforcement Learning},
  author    = {Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
  booktitle = {The Twelfth International Conference on Learning Representations}
}