Zhipu AI launches AutoGLM intelligent agent: input commands to simulate human operation of mobile phones

Author: Eve Cole  Update Time: 2025-02-27 13:32:01

The editor of Downcodes has learned that the Zhipu technical team recently released AutoGLM, an agent built on its GLM research. The agent simulates how a person operates a mobile phone and carries out everyday tasks such as liking posts on WeChat, shopping on Taobao, and booking hotels on Ctrip, bringing AI applications closer to daily life. Its technical contribution is to address several problems in task planning and action execution for large-model agents, and it is reported to achieve significant performance gains, surpassing competing products on multiple evaluation benchmarks. AutoGLM marks significant progress in AI "Phone Use" and opens new possibilities for future intelligent interaction.

Zhipu's technical team recently launched AutoGLM, a new product based on research from the GLM technology team. It is an agent that can simulate human operation of a mobile phone to perform various tasks. Its launch marks progress for artificial intelligence in the field of "Phone Use" and brings AI applications closer to people's daily lives.

AutoGLM can perform a variety of tasks, such as liking and commenting on WeChat Moments, reordering items from past orders on Taobao, booking hotels on Ctrip, buying train tickets on 12306, and ordering takeout on Meituan. Its application scenarios are not limited to these: in theory, AutoGLM can complete anything a human can do on an electronic device with a visual interface. Its operating logic resembles a human's, so no complex workflow has to be built in advance.

Currently, users can try AutoGLM-Web by installing the "Zhipu Qingyan" plug-in, a browser assistant that simulates a user visiting and clicking through web pages and automatically completes advanced retrieval, summarization, and content generation on websites. In addition, AutoGLM has opened applications for internal testing on Android, and Zhipu is cooperating closely with mobile phone manufacturers such as Honor.

AutoGLM's technology is based on Zhipu's self-developed "basic agent decoupling intermediate interface" and "self-evolving online curriculum reinforcement learning framework", which address problems in large-model agent task planning and action execution such as capability antagonism, scarcity of training tasks and data, sparse feedback signals, and policy distribution drift. AutoGLM can improve its own performance continuously and steadily, much as people keep acquiring new skills as they grow.

On the technical side, AutoGLM targets two weaknesses: insufficient accuracy in "action execution" and insufficient flexibility in "task planning". Its "basic agent decoupling intermediate interface" separates these two stages through a natural-language intermediate interface, which greatly improves the agent's capabilities. AutoGLM also adopts the "self-evolving online curriculum reinforcement learning framework" to learn and improve the capabilities of large-model agents in real online Web and Phone environments.
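
The decoupling idea can be illustrated with a toy sketch. This is not Zhipu's implementation: the names plan_step, ground_step, UIElement, and Action are hypothetical, the planner is a hard-coded lookup standing in for a language model, and the grounder is a simple label match standing in for a visual grounding model. The point is only that the two stages exchange nothing but a natural-language step description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """A visible on-screen element (hypothetical, for illustration only)."""
    label: str
    center: Tuple[int, int]


@dataclass
class Action:
    """A concrete device action produced by the grounding stage."""
    kind: str
    target: Tuple[int, int]


def plan_step(task: str, history: List[str]) -> str:
    """Planning stage: return the next step as plain natural language.

    A real planner would be a language model; this stub uses a fixed script.
    """
    script = {"like the first post": ["tap the Like button"]}
    steps = script.get(task, [])
    return steps[len(history)] if len(history) < len(steps) else "done"


def ground_step(step: str, screen: List[UIElement]) -> Action:
    """Execution stage: map a natural-language step to a concrete action.

    A real grounder would locate the element visually; this stub matches labels.
    """
    verb, _, rest = step.partition(" the ")
    for element in screen:
        if element.label.lower() in rest.lower():
            return Action(kind=verb, target=element.center)
    raise LookupError(f"no element on screen matches step: {step!r}")


def run(task: str, screen: List[UIElement]) -> List[Action]:
    """Alternate planning and grounding until the planner reports completion."""
    history: List[str] = []
    actions: List[Action] = []
    while (step := plan_step(task, history)) != "done":
        actions.append(ground_step(step, screen))
        history.append(step)
    return actions


if __name__ == "__main__":
    screen = [UIElement("Like", (540, 1200)), UIElement("Comment", (700, 1200))]
    print(run("like the first post", screen))
```

Because the planner never sees coordinates and the grounder never sees the overall task, either stage could in principle be retrained or replaced without touching the other, which is the benefit the article attributes to the intermediate interface.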

AutoGLM achieved significant performance improvements in both Phone Use and Web Browser Use. On the AndroidLab evaluation benchmark it surpassed GPT-4o and Claude-3.5-Sonnet, and on the WebArena-Lite evaluation benchmark it achieved an improvement of approximately 200% over GPT-4o, narrowing the gap in GUI-control success rates between humans and large-model agents.

Project address: https://xiao9905.github.io/AutoGLM

All in all, the launch of AutoGLM represents an important breakthrough in artificial intelligence technology. It not only improves AI's ability to operate mobile phones but also opens up more possibilities for intelligent living in the future. The editor of Downcodes hopes that AutoGLM will see wider application and further optimization.