webarena
v0.2.0
Website • Paper • Leaderboard

Important

This repository hosts the canonical implementation of WebArena for reproducing the results reported in the paper. AgentLab, built on BrowserGym, introduces several key features that significantly enhance the web navigation infrastructure: (1) support for parallel experiments with browsers, (2) integration of popular web navigation benchmarks (e.g., VisualWebArena) in a unified framework, (3) unified leaderboard reporting, and (4) improved handling of environment corner cases. We strongly recommend using that framework for your experiments.
# Python 3.10+
conda create -n webarena python=3.10 ; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .
# optional, dev only
pip install -e ".[dev]"
mypy --install-types --non-interactive browser_env agents evaluation_harness
pip install pre-commit
pre-commit install

Check out this script for a quick walkthrough of how to set up the browser environment and interact with it using our hosted demo sites. The script is for educational purposes only; to run reproducible experiments, please check out the next section. In short, using WebArena is very similar to using OpenAI Gym. The following code snippet shows how to interact with the environment.
import random

from browser_env import ScriptBrowserEnv, create_id_based_action

# init the environment
env = ScriptBrowserEnv(
    headless=False,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)
# prepare the environment for a configuration defined in a json file
config_file = "config_files/0.json"
obs, info = env.reset(options={"config_file": config_file})
# get the text observation (e.g., html, accessibility tree) through obs["text"]

# create a random action
id = random.randint(0, 1000)
action = create_id_based_action(f"click [{id}]")

# take the action
obs, _, terminated, _, info = env.step(action)

Important
To ensure correct evaluation, please set up your own WebArena websites by following the environment setup and URL configuration steps below. The demo sites are only for browsing purposes, to help you better understand the content. After evaluating the 812 examples, reset the environment to its initial state following the instructions.
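Building on the snippet above, a full episode loop might look like the sketch below. The step budget, the random element ids, and the final env.close() call are illustrative assumptions; a real agent would choose its actions from the observation available in obs["text"].

import random

from browser_env import ScriptBrowserEnv, create_id_based_action

# a sketch of a full episode: reset, act until the episode terminates, clean up
env = ScriptBrowserEnv(
    headless=True,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)
obs, info = env.reset(options={"config_file": "config_files/0.json"})
for _ in range(5):  # illustrative step budget
    element_id = random.randint(0, 1000)  # a real agent derives ids from obs["text"]
    action = create_id_based_action(f"click [{element_id}]")
    obs, _, terminated, _, info = env.step(action)
    if terminated:
        break
env.close()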
Set up the standalone environments. Please check out this page for details.
Configure the URLs for each website.
export SHOPPING="<your_shopping_site_domain>:7770"
export SHOPPING_ADMIN="<your_e_commerce_cms_domain>:7780/admin"
export REDDIT="<your_reddit_domain>:9999"
export GITLAB="<your_gitlab_domain>:8023"
export MAP="<your_map_domain>:3000"
export WIKIPEDIA="<your_wikipedia_domain>:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export HOMEPAGE="<your_homepage_domain>:4399" # this is a placeholder

You are encouraged to update the environment variables in the GitHub workflow to ensure the correctness of the unit tests.
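As an optional sanity check (not part of the official setup), the sketch below verifies that each configured site responds before continuing; the http:// prefix handling is an assumption about how the domains are specified.

import os
import urllib.request

# optional sketch: confirm every configured site is reachable
for name in ["SHOPPING", "SHOPPING_ADMIN", "REDDIT", "GITLAB", "MAP", "WIKIPEDIA", "HOMEPAGE"]:
    url = os.environ[name]
    if not url.startswith("http"):
        url = "http://" + url  # assumes the sites are served over plain http
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(f"{name}: HTTP {resp.status}")
    except Exception as exc:
        print(f"{name}: unreachable ({exc})")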
Generate the config files for each test example:

python scripts/generate_test_data.py

You will see *.json files generated under the config_files folder. Each file contains the configuration for one test example.
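To confirm the generation succeeded, a quick inspection sketch such as the one below can be used; it makes no assumption about the field names and simply prints whatever keys the first config contains.

import glob
import json

# count the generated configs and peek at the structure of one of them
config_paths = sorted(glob.glob("config_files/*.json"))
print(f"{len(config_paths)} config files found")
with open(config_paths[0]) as f:
    config = json.load(f)
print("top-level keys:", list(config.keys()))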
Obtain and save the auto-login cookies for all websites:

mkdir -p ./.auth
python browser_env/auto_login.py
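A quick way to confirm the login step worked is to list what was written to ./.auth; the assumption that the cookies are stored as *.json storage-state files is mine, so adjust the pattern if your run produces different file names.

from pathlib import Path

# optional sketch: list the saved auto-login cookie files
auth_files = sorted(Path(".auth").glob("*.json"))
print(f"{len(auth_files)} cookie files under ./.auth")
for path in auth_files:
    print(" -", path.name)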
export OPENAI_API_KEY=your_key

A valid OpenAI API key starts with sk-.
Launch the evaluation:
# p_cot_id_actree_2s.json is the reasoning agent prompt we used in the paper
python run.py \
  --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \
  --test_start_idx 0 \
  --test_end_idx 1 \
  --model gpt-3.5-turbo \
  --result_dir <your_result_dir>

This script will run the first example with the GPT-3.5 reasoning agent. The trajectory will be saved in <your_result_dir>/0.html.
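To cover all 812 examples, the same command can be driven over consecutive index ranges. The sketch below is one way to do that; the chunk size and per-chunk result directories are illustrative assumptions, not part of the official instructions.

import subprocess

# run the evaluation in consecutive chunks of examples
chunk = 100  # illustrative chunk size
for start in range(0, 812, chunk):
    end = min(start + chunk, 812)
    subprocess.run(
        [
            "python", "run.py",
            "--instruction_path", "agent/prompts/jsons/p_cot_id_actree_2s.json",
            "--test_start_idx", str(start),
            "--test_end_idx", str(end),
            "--model", "gpt-3.5-turbo",
            "--result_dir", f"results/chunk_{start}",  # hypothetical result directory
        ],
        check=True,
    )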
prompt = {
  "intro": <The overall guideline, which includes the task description, available actions, hints and others>,
  "examples": [
    (
      example_1_observation,
      example_1_response
    ),
    (
      example_2_observation,
      example_2_response
    ),
    ...
  ],
  "template": <How to organize different information such as observation, previous action, instruction, url>,
  "meta_data": {
    "observation": <Which observation space the agent uses>,
    "action_type": <Which action space the agent uses>,
    "keywords": <The keywords used in the template; the program will later enumerate all keywords in the template to check that all of them are correctly replaced with content>,
    "prompt_constructor": <Which prompt constructor is in use; the prompt constructor builds the input fed to the LLM and extracts the action from the generation, more details below>,
    "action_splitter": <The delimiter inside which the action can be extracted, used by the prompt constructor>
  }
}

The prompt constructor exposes two functions: construct, which builds the input fed to an LLM, and _extract_action, which, given the generation from an LLM, extracts the phrase corresponding to the action.
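As an example of how the keywords field is used, the sketch below loads a prompt JSON and checks that every declared keyword appears as a placeholder in the template; the assumption that placeholders follow Python's {keyword} format is mine.

import json

# load a prompt file and verify each meta_data keyword has a matching placeholder
with open("agent/prompts/jsons/p_cot_id_actree_2s.json") as f:
    prompt = json.load(f)

template = prompt["template"]
for keyword in prompt["meta_data"]["keywords"]:
    placeholder = "{" + keyword + "}"
    assert placeholder in template, f"{placeholder} missing from template"
print("all keywords present:", prompt["meta_data"]["keywords"])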
If you use our environment or data, please cite our paper:

@article{zhou2023webarena,
title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
journal={arXiv preprint arXiv:2307.13854},
year={2023}
}