text2reward
1.0.0
Code for the paper Text2Reward: Reward Shaping with Language Models for Reinforcement Learning. Please see our project page for more demos and the latest related resources.
To set up the environment, run the following in your shell:
# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0
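Optionally, the two simulators can be sanity-checked once the steps above (and the mujoco install below) are done. This is only a sketch based on the public ManiSkill2 and Metaworld quickstart examples; the PickCube-v0 env id is an illustration, and on headless machines without Vulkan you may see the warnings mentioned later in this document:
# ManiSkill2: build a state-mode environment and print its observation space (env id is illustrative)
python -c "import gym, mani_skill2.envs; print(gym.make('PickCube-v0', obs_mode='state').observation_space)"
# MetaWorld: list a few of the registered ML1 task names (needs the mujoco install described below)
python -c "import metaworld; print(list(metaworld.ML1.ENV_NAMES)[:5])"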
For mujoco, please follow the instructions here to install it. After that, try the following commands to confirm a successful installation:
$ python3
>>> import mujoco_py
If Vulkan is not available on your machine, output like the following may appear; the renderer cannot be used, but CPU resources are still available:
RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
Segmentation fault (core dumped)
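If the import succeeds, a slightly fuller smoke test can be run from the shell. This is a sketch adapted from the standard mujoco-py example rather than part of this repository; it assumes the humanoid.xml model that ships with the MuJoCo distribution:
# optional smoke test for mujoco-py, adapted from its standard example
# (in older mujoco-py versions discover_mujoco() returns a (path, key) tuple instead of a single path)
python3 -c "
import os, mujoco_py
mj_path = mujoco_py.utils.discover_mujoco()
sim = mujoco_py.MjSim(mujoco_py.load_model_from_path(os.path.join(mj_path, 'model', 'humanoid.xml')))
sim.step()
print(sim.data.qpos)
"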
To reproduce our experimental results, you can run the scripts below:
For ManiSkill2:
bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh
It is normal to encounter the following warnings:
[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.
For MetaWorld:
bash run_oracle.sh
bash run_zero_shot.sh
To generate reward code yourself, first add the following environment variable to your .bashrc (or .zshrc, etc.):
export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward
Then navigate to the directory text2reward/code_generation/single_flow and run the scripts below:
# generate reward code for Maniskill
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh
By default, the run_oracle.sh scripts above use the expert-written rewards provided by the environments, while the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards from our experiments. If you want to run new experiments with a reward you provide yourself, simply follow the bash scripts above and change the --reward_path argument to the path of your own reward, as in the sketch below.
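For example, after copying one of the run scripts, the training command could be adapted roughly as follows. This is a hypothetical sketch: the script name train.py, the env id, and the argument layout are placeholders, and only --reward_path is taken from this repository's scripts, so refer to the actual run_*.sh files for the real invocation:
# hypothetical adaptation of one line in your copy of run_zero_shot.sh;
# keep the script's other arguments unchanged
python train.py \
    --env_id LiftCube-v0 \
    --reward_path /path/to/your_own_reward.py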
If you find our work helpful, please cite us:
@inproceedings{xietext2reward,
  title={Text2Reward: Reward Shaping with Language Models for Reinforcement Learning},
  author={Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
  booktitle={The Twelfth International Conference on Learning Representations}
}