webarena
v0.2.0
Website • Paper • Leaderboard

Important

This repository hosts the canonical implementation of WebArena for reproducing the results reported in the paper. AgentLab, built on BrowserGym, introduces several key features that significantly enhance the web navigation infrastructure: (1) support for parallel experiments with browsers, (2) integration of popular web navigation benchmarks (e.g., VisualWebArena) in a unified framework, (3) unified leaderboard reporting, and (4) improved handling of edge cases in the environment. We strongly recommend using that framework for your experiments.
# Python 3.10+
conda create -n webarena python=3.10 ; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .
# optional, dev only
pip install -e ".[dev]"
mypy --install-types --non-interactive browser_env agents evaluation_harness
pip install pre-commit
pre-commit install

Check out this script for a quick walkthrough of how to set up the browser environment and interact with it using our hosted demo sites. The script is for educational purposes only; to run reproducible experiments, please check out the next section. In a nutshell, using WebArena is very similar to using OpenAI Gym. The following code snippet shows how to interact with the environment.
import random

from browser_env import ScriptBrowserEnv, create_id_based_action

# init the environment
env = ScriptBrowserEnv(
    headless=False,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)
# prepare the environment for a configuration defined in a json file
config_file = "config_files/0.json"
obs, info = env.reset(options={"config_file": config_file})
# get the text observation (e.g., html, accessibility tree) through obs["text"]

# create a random action
id = random.randint(0, 1000)
action = create_id_based_action(f"click [{id}]")

# take the action
obs, _, terminated, _, info = env.step(action)

Important
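Building on the snippet above, a full episode follows the same Gym-style loop of reset and step until termination. The sketch below is illustrative only: the random-click policy, the max_steps cap, and the use of a standard Gym-style close() call are our own assumptions, not a recommended agent.

# Illustrative sketch: a full episode with a toy random-click policy (not part of the repository)
import random

from browser_env import ScriptBrowserEnv, create_id_based_action

env = ScriptBrowserEnv(
    headless=True,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)
obs, info = env.reset(options={"config_file": "config_files/0.json"})

max_steps = 10  # illustrative cap, not a repository default
for _ in range(max_steps):
    element_id = random.randint(0, 1000)  # toy policy: click an arbitrary element id
    action = create_id_based_action(f"click [{element_id}]")
    obs, _, terminated, _, info = env.step(action)
    if terminated:
        break

env.close()  # assuming the standard Gym-style close() method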
To ensure correct evaluation, please set up your own WebArena websites following step 1 and step 2 below. The demo sites are for browsing purposes only, to help you better understand the content. After evaluating the 812 examples, reset the environment to its initial state following the instructions.
1. Set up the standalone environments. Please check out this page for details.
2. Configure the URLs for each website:
export SHOPPING="<your_shopping_site_domain>:7770"
export SHOPPING_ADMIN="<your_e_commerce_cms_domain>:7780/admin"
export REDDIT="<your_reddit_domain>:9999"
export GITLAB="<your_gitlab_domain>:8023"
export MAP="<your_map_domain>:3000"
export WIKIPEDIA="<your_wikipedia_domain>:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export HOMEPAGE="<your_homepage_domain>:4399"  # this is a placeholder

You are also encouraged to update the environment variables in the GitHub workflow to ensure the correctness of the unit tests.
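As a quick sanity check (an illustrative sketch, not part of the repository), the following shell loop prints each required variable and warns if any is still unset:

# illustrative sanity check for the exported variables (not part of the repository)
for var in SHOPPING SHOPPING_ADMIN REDDIT GITLAB MAP WIKIPEDIA HOMEPAGE; do
  if [ -z "${!var}" ]; then
    echo "WARNING: $var is not set"
  else
    echo "$var=${!var}"
  fi
done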
3. Generate the config files for each test example:

python scripts/generate_test_data.py

You will see *.json files generated under the config_files folder. Each file contains the configuration for one test example.
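To double-check the generation step, a short inspection like the sketch below counts the generated files and pretty-prints one of them; it makes no assumption about the schema inside each config and is not part of the repository:

# illustrative check of the generated configs (not part of the repository)
import json
from pathlib import Path

config_files = sorted(Path("config_files").glob("*.json"))
print(f"{len(config_files)} config files found")

# peek at one test example's configuration without assuming its schema
with open(config_files[0]) as f:
    print(json.dumps(json.load(f), indent=2))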
4. Obtain the auto-login cookies for all websites:

mkdir -p ./.auth
python browser_env/auto_login.py
5. Set a valid OpenAI API key (it starts with sk-) as an environment variable:

export OPENAI_API_KEY=your_key
6. Launch the evaluation:
# p_cot_id_actree_2s.json is the reasoning agent prompt we used in the paper
python run.py \
  --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \
  --test_start_idx 0 \
  --test_end_idx 1 \
  --model gpt-3.5-turbo \
  --result_dir <your_result_dir>

This script runs the first example with the GPT-3.5 reasoning agent. The trajectory will be saved in <your_result_dir>/0.html.
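To cover the full benchmark rather than a single example, the same flags can be swept over the whole index range. The sketch below is an illustrative assumption about how one might chunk the 812 examples (the chunk size and sequential execution are our choices, not a prescribed workflow; replace <your_result_dir> as above):

# illustrative sketch: run all 812 examples in chunks of 100 (chunk size is an assumption)
for start in $(seq 0 100 800); do
  end=$(( start + 100 < 812 ? start + 100 : 812 ))
  python run.py \
    --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \
    --test_start_idx "$start" \
    --test_end_idx "$end" \
    --model gpt-3.5-turbo \
    --result_dir <your_result_dir>
done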
prompt = {
"intro" : < The overall guideline which includes the task description , available action , hint and others > ,
"examples" : [
(
example_1_observation ,
example_1_response
),
(
example_2_observation ,
example_2_response
),
...
],
"template" : < How to organize different information such as observation , previous action , instruction , url > ,
"meta_data" : {
"observation" : < Which observation space the agent uses > ,
"action_type" : < Which action space the agent uses > ,
"keywords" : < The keywords used in the template , the program will later enumerate all keywords in the template to see if all of them are correctly replaced with the content > ,
"prompt_constructor" : < Which prompt construtor is in used , the prompt constructor will construct the input feed to an LLM and extract the action from the generation , more details below > ,
"action_splitter" : < Inside which splitter can we extract the action , used by the prompt constructor >
}
}

The prompt constructor exposes two key methods (a minimal sketch follows below):
construct: constructs the input fed to the LLM.
_extract_action: given the generation from the LLM, extracts the phrase that corresponds to the action.
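For intuition, here is a simplified, self-contained sketch of what these two methods might look like. The class name, the template keyword names, and the few-shot formatting are illustrative assumptions; the repository's actual prompt constructor differs in detail.

# Simplified sketch of a prompt constructor (illustrative, not the repository's implementation)
import re


class ToyPromptConstructor:
    def __init__(self, instruction: dict):
        # `instruction` is the loaded prompt json shown above
        self.intro = instruction["intro"]
        self.examples = instruction["examples"]
        self.template = instruction["template"]
        self.action_splitter = instruction["meta_data"]["action_splitter"]

    def construct(self, observation: str, url: str, intent: str, previous_action: str) -> str:
        # Assemble the input fed to the LLM from the intro, few-shot examples, and template.
        # The keyword names below are assumptions; the real keywords are listed in meta_data["keywords"].
        current = self.template.format(
            observation=observation, url=url, objective=intent, previous_action=previous_action
        )
        shots = "\n\n".join(f"{obs}\n{resp}" for obs, resp in self.examples)
        return f"{self.intro}\n\n{shots}\n\n{current}"

    def _extract_action(self, response: str) -> str:
        # Pull the action phrase out of the LLM generation, i.e. the text wrapped
        # between two occurrences of the action_splitter.
        splitter = re.escape(self.action_splitter)
        match = re.search(f"{splitter}(.*?){splitter}", response, re.DOTALL)
        if match is None:
            raise ValueError(f"Cannot find an action in: {response!r}")
        return match.group(1).strip()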
If you use our environment or data, please cite our paper:

@article{zhou2023webarena,
title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
journal={arXiv preprint arXiv:2307.13854},
year={2023}
}