neoplanner
1.0.0
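This repository contains an implementation of a sequential planning agent named "Neoplanner". The planner is suited to text-based environments with large state and action spaces. It combines state-space search with queries to a foundational LLM to obtain the best action plan. Reward signals are used quantitatively to drive the search. The balance between exploration and exploitation is maintained by maximizing upper confidence bounds on the values of states. Where random exploration is needed, the LLM is queried to generate an action plan. Learnings from each trial are stored in text format as entity relationships, and these are used in future queries to the LLM for continual improvement. Experiments in the ScienceWorld environment show a 124% improvement over the current best method in average reward gained across multiple tasks.

[Figure: Neoplanner architecture]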
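The exploration/exploitation balance described above can be illustrated with a generic UCB1-style selection rule. This is a minimal sketch and not Neoplanner's actual implementation: the function names, the `(mean_value, visit_count)` layout, and the exploration constant `c` are assumptions made for illustration only.

```python
import math

def ucb_score(value, visits, total_trials, c=1.0):
    """Upper confidence bound on a state's value estimate (UCB1-style form)."""
    if visits == 0:
        return math.inf  # unvisited states are always tried first
    # The bonus shrinks as a state is visited more often.
    return value + c * math.sqrt(math.log(total_trials) / visits)

def select_state(stats, total_trials, c=1.0):
    """Return the state id with the largest upper confidence bound.

    stats maps a state id to a (mean_value, visit_count) pair.
    """
    return max(stats, key=lambda s: ucb_score(*stats[s], total_trials, c))

# Hypothetical statistics: the rarely visited state wins despite a lower mean.
stats = {"kitchen": (0.6, 10), "hallway": (0.5, 2)}
print(select_state(stats, total_trials=12))  # hallway
```

Setting `c=0` in this sketch removes the exploration bonus, so the state with the highest mean value is always chosen.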

First, clone the repo, navigate to the neoplanner directory, and install the requirements:
git clone https://github.com/swarna-kpaul/neoplanner
cd neoplanner
python3 -m pip install -r requirements.txt

Next, edit the config/keys.py file and set your OpenAI API key there. You can obtain an API key by registering on the OpenAI portal; first-time users may receive $5 of free credit.
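One way to avoid committing a secret to the repository is to have config/keys.py read the key from an environment variable. This is only a sketch under assumptions: the helper `load_openai_key` is hypothetical, and the variable name shown in the final comment is a guess, so check the name that config/keys.py actually uses.

```python
import os

def load_openai_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# Hypothetical contents of config/keys.py (the variable name is an assumption):
# OPENAIAPIKEY = load_openai_key()
```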
After that, the package can be imported:

from solver import neoplanner

Initialize the solver object:

# task is the identifier of tasks as specified in
# stmloadfile is the name of the file (with full path) that contains saved state. The state will be loaded initially. default value is None
# stmstoragefile is the name of the file (with full path) where intermediate states can be saved. default value is None
# beliefstorefile is the name of the file (with full path) where intermediate learnings can be saved. default value is None
# beliefloadfile is the name of the file (with full path) that contains intermediate learnings. The learnings will be loaded initially. default value is None
# sigma is the exploration probability constant. Increasing its value increases random exploration by the LLM.
solverobj = neoplanner(task="2-1", stmloadfile=None, stmstoragefile=None, beliefstorefile=None, beliefloadfile=None, sigma=0.3)

Run the solver:

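env = solverobj.train()
# Get the action plan from the state-space graph
additionalinstructions, actionplan, _, _, _ = env.getinstructions()

Training continues until the goal is reached. You can interrupt the training process partway through; in that case, make sure you provide stmstoragefile and beliefstorefile so that the intermediate states and beliefs are saved.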
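To make interrupted runs resumable, the four file arguments can be derived from a single checkpoint directory. The sketch below is an illustration, not part of the repository: the helper `checkpoint_kwargs` and the file naming scheme are hypothetical, and it assumes the same file may be used both for saving and for later loading.

```python
from pathlib import Path

def checkpoint_kwargs(task, directory, sigma=0.3):
    """Build keyword arguments for neoplanner() from one checkpoint folder."""
    base = Path(directory)
    base.mkdir(parents=True, exist_ok=True)
    stm = base / f"stm_{task}.pkl"        # hypothetical file name
    belief = base / f"belief_{task}.pkl"  # hypothetical file name
    return {
        "task": task,
        # Load earlier progress only if a previous run left files behind.
        "stmloadfile": str(stm) if stm.exists() else None,
        "beliefloadfile": str(belief) if belief.exists() else None,
        # Always save, so that an interrupted run can be resumed.
        "stmstoragefile": str(stm),
        "beliefstorefile": str(belief),
        "sigma": sigma,
    }

# Usage, assuming the import shown earlier:
# solverobj = neoplanner(**checkpoint_kwargs("2-1", "checkpoints"))
# env = solverobj.train()
```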
You can load the stmstoragefile and query the env object to get the action plan from the state-space graph:

import pickle
from solver import scienv

env = scienv("2-1")
stmstoragefile = "<file name with full path>"
with open(stmstoragefile, 'rb') as f:
    # The saved state is a pickled tuple of seven fields
    rootnodeid, invalidnodeid, DEFAULTVALUE, statespace, totaltrials, actiontrace, environment = pickle.load(f)
# Restore the saved state-space graph into the environment model
env.model.rootnodeid = rootnodeid
env.model.invalidnodeid = invalidnodeid
env.model.DEFAULTVALUE = DEFAULTVALUE
env.model.statespace = statespace
env.model.totaltrials = totaltrials
env.environment = environment
env.reset()
additionalinstructions, actionplan, _, _, _ = env.getinstructions()

The repository includes a folder containing the trained state-space and learnings files for 7 tasks. You can use the code above to view the solution action plan for each task by setting stmstoragefile to the appropriate file name.
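Because the saved state is a plain pickled tuple, a small helper that names its fields can make the loading step less error-prone. The sketch below assumes the seven-field layout unpacked in the snippet above; `STM_FIELDS` and `load_stm` are hypothetical names, not part of the repository.

```python
import pickle

# Field order assumed from the unpacking shown in the loading snippet.
STM_FIELDS = ("rootnodeid", "invalidnodeid", "DEFAULTVALUE", "statespace",
              "totaltrials", "actiontrace", "environment")

def load_stm(path):
    """Load a saved state file and return its fields as a dict."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
    if len(saved) != len(STM_FIELDS):
        raise ValueError(
            f"expected {len(STM_FIELDS)} fields, got {len(saved)}")
    return dict(zip(STM_FIELDS, saved))

# Usage:
# state = load_stm(stmstoragefile)
# env.model.statespace = state["statespace"]
```

Note that unpickling can execute arbitrary code, so only load state files from sources you trust.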
The learnings can be viewed by running the following code:

import pickle

beliefloadfile = "<belief load file name>"
with open(beliefloadfile, 'rb') as f:
    beliefaxioms, totalexplore = pickle.load(f)
print(beliefaxioms)

@misc{paul2023sequential,
title={Sequential Planning in Large Partially Observable Environments guided by LLMs},
author={Swarna Kamal Paul},
year={2023},
eprint={2312.07368},
archivePrefix={arXiv},
primaryClass={cs.AI}
}