text2reward
1.0.0
Code for the paper Text2Reward: Reward Shaping with Language Models for Reinforcement Learning. See our project page for more demos and the latest related resources.
To set up the environment, run the following commands in your shell:
# set up conda
conda create -n text2reward python=3.7
conda activate text2reward
# set up ManiSkill2 environment
cd ManiSkill2
pip install -e .
pip install stable-baselines3==1.8.0 wandb tensorboard
cd ..
cd run_maniskill
bash download_data.sh
# set up MetaWorld environment
cd ..
cd Metaworld
pip install -e .
# set up code generation
pip install langchain chromadb==0.4.0

For mujoco, please follow the instructions here to install it. After that, run the following command to confirm that the installation succeeded:

$ python3
>>> import mujoco_py

Output like the following may appear at this point:

RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
Some required Vulkan extension is not present. You may not use the renderer to render, however, CPU resources will be still available.
Segmentation fault (core dumped)
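For a quick sanity check that the other packages installed above import cleanly, a one-liner such as the following can be used. This is only a sketch: the import names mani_skill2, metaworld, and stable_baselines3 are assumptions based on the packages installed above, not a step required by our scripts.

# optional sanity check; import names are assumptions, not part of the provided scripts
python3 -c "import mani_skill2, metaworld, stable_baselines3; print('imports OK')"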
To reproduce our experimental results, you can run the following scripts.

ManiSkill2:
bash run_oracle.sh
bash run_zero_shot.sh
bash run_few_shot.sh

It is normal to encounter the following warnings:
[svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[svulkan2] [warning] Continue without GLFW.

MetaWorld:
bash run_oracle.sh
bash run_zero_shot.sh
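Since wandb and tensorboard are installed during setup, you may want to authenticate with Weights & Biases before launching the scripts above and inspect training curves with TensorBoard afterwards. The commands below are only a sketch: whether a given script actually logs to wandb, and where TensorBoard logs end up, depends on the run_*.sh scripts, and ./logs is a placeholder path.

# optional: log in to Weights & Biases (if the scripts report to wandb)
wandb login
# optional: view local TensorBoard logs; replace ./logs with the actual log directory
tensorboard --logdir ./logs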
To generate reward code, first add the following environment variable to your .bashrc (or .zshrc, etc.):

export PYTHONPATH=$PYTHONPATH:~/path/to/text2reward

Then navigate to the directory text2reward/code_generation/single_flow and run the following scripts:
# generate reward code for ManiSkill2
bash run_maniskill_zeroshot.sh
bash run_maniskill_fewshot.sh
# generate reward code for MetaWorld
bash run_metaworld_zeroshot.sh

By default, the run_oracle.sh scripts above use the expert-written rewards provided by the environments, while the run_zero_shot.sh and run_few_shot.sh scripts use the generated rewards from our experiments. If you want to run new experiments with your own reward, simply follow the bash scripts above and change the --reward_path argument to the path of your own reward file.
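As an illustration of that last step, the snippet below sketches what such a modified invocation might look like. train.py, the environment name, and the other flags are placeholders rather than the real entry point, so check the corresponding run_*.sh script for the actual command; only the --reward_path argument itself comes from this README.

# copy an existing run script, then point --reward_path at your own reward file
# (train.py and --env are placeholders; see e.g. run_zero_shot.sh for the actual invocation)
python train.py --env LiftCube-v0 --reward_path /path/to/your_reward.py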
If you find our work helpful, please cite us:
@inproceedings{xietext2reward,
  title     = {Text2Reward: Reward Shaping with Language Models for Reinforcement Learning},
  author    = {Xie, Tianbao and Zhao, Siheng and Wu, Chen Henry and Liu, Yitao and Luo, Qian and Zhong, Victor and Yang, Yanchao and Yu, Tao},
  booktitle = {The Twelfth International Conference on Learning Representations}
}