text2reward
1.0.0
Code for the paper Text2Reward: Reward Shaping with Language Models for Reinforcement Learning. Please see our project page for more demos and the latest related resources.
To set up the environment, run the following in your shell:
# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0
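Optionally, the two simulators can be sanity-checked once the steps above (and the mujoco install below) are done. This is only a sketch based on the public ManiSkill2 and Metaworld quickstart examples; the PickCube-v0 env id is an illustration, and on headless machines without Vulkan you may see the warnings mentioned later in this document:
# ManiSkill2: build a state-mode environment and print its observation space (env id is illustrative)
python -c "import gym, mani_skill2.envs; print(gym.make('PickCube-v0', obs_mode='state').observation_space)"
# MetaWorld: list a few of the registered ML1 task names (needs the mujoco install described below)
python -c "import metaworld; print(list(metaworld.ML1.ENV_NAMES)[:5])"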
For mujoco, please follow the instructions here to install it. After that, try the following commands to confirm a successful installation:
$ python3
>>> import mujoco_py
If Vulkan is not available on your machine, output like the following may appear; the renderer cannot be used, but CPU resources are still available:
RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
Segmentation fault (core dumped)
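If the import succeeds, a slightly fuller smoke test can be run from the shell. This is a sketch adapted from the standard mujoco-py example rather than part of this repository; it assumes the humanoid.xml model that ships with the MuJoCo distribution:
# optional smoke test for mujoco-py, adapted from its standard example
# (in older mujoco-py versions discover_mujoco() returns a (path, key) tuple instead of a single path)
python3 -c "
import os, mujoco_py
mj_path = mujoco_py.utils.discover_mujoco()
sim = mujoco_py.MjSim(mujoco_py.load_model_from_path(os.path.join(mj_path, 'model', 'humanoid.xml')))
sim.step()
print(sim.data.qpos)
"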
To reproduce our experimental results, you can run the scripts below:
For ManiSkill2:
bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh
It is normal to encounter the following warnings:
[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.
For MetaWorld:
bash run_oracle.sh
bash run_zero_shot.sh
To generate reward code yourself, first add the following environment variable to your .bashrc (or .zshrc, etc.):
export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward
Then navigate to the directory text2reward/code_generation/single_flow and run the scripts below:
# generate reward code for Maniskill
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh
By default, the run_oracle.sh scripts above use the expert-written rewards provided by the environments, while the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards from our experiments. If you want to run new experiments with a reward you provide yourself, simply follow the bash scripts above and change the --reward_path argument to the path of your own reward, as in the sketch below.
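For example, after copying one of the run scripts, the training command could be adapted roughly as follows. This is a hypothetical sketch: the script name train.py, the env id, and the argument layout are placeholders, and only --reward_path is taken from this repository's scripts, so refer to the actual run_*.sh files for the real invocation:
# hypothetical adaptation of one line in your copy of run_zero_shot.sh;
# keep the script's other arguments unchanged
python train.py \
    --env_id LiftCube-v0 \
    --reward_path /path/to/your_own_reward.py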
If you find our work helpful, please cite us:
@inproceedings{xietext2reward,
  title={Text2Reward: Reward Shaping with Language Models for Reinforcement Learning},
  author={Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
  booktitle={The Twelfth International Conference on Learning Representations}
}