AI "Werewolf" showdown! GPT-4.5 combines social reasoning with top-tier deception and runs rings around Claude and DeepSeek

Author: Eve Cole | Updated: 2025-05-26 11:50:02

AI has already shown outstanding ability in board games such as chess. Now it is also showing impressive skill in social games built on strategy and deception, such as "Werewolf". A recent AI "Werewolf"-style benchmark called the "Elimination Game" has drawn wide attention, and its results are striking: GPT-4.5 came out on top, finishing ahead of competitors such as Claude 3.7 Sonnet and DeepSeek R1. The result raises the question of how far AI's social intelligence has actually come.

The rules of the "Elimination Game" are demanding. Up to eight players (AI models and human players) take part, and each round one player is voted out until only two survivors remain. To complicate matters further, the eliminated players form a "jury" that decides the final winner. This mechanism fills the game with betrayal, deception, and strategy, making it something like an AI version of "Game of Thrones".

During the game, players can debate fiercely in a public chat room, trying to gain an advantage by making their case, winning others over, and confusing their opponents. Beyond public discussion, players can also chat privately to form secret alliances or set traps. Even with just three rounds of private chat, the exchanges are dense with information and strategy. Players must strike a balance between trust and deception, and a single misstep can get them eliminated.

In the final showdown, the two remaining players each give a closing speech, trying to convince the "jury" of eliminated players to back them. The jury then votes to choose the sole winner. This stage tests not only the players' command of language but also their powers of persuasion and strategy.
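To make the elimination-and-jury structure described above concrete, here is a minimal Python sketch of a single game's voting flow. It is an illustration under stated assumptions, not the benchmark's actual code: the function names `plurality` and `play_elimination_game` are hypothetical, every vote is drawn at random in place of the public debate and private chats that real players use, and ties are broken at random because the article does not say how the benchmark resolves them.

```python
import random
from collections import Counter


def plurality(votes, rng):
    """Return the candidate with the most votes.

    Assumption: ties are broken uniformly at random (the article does not
    specify the benchmark's tie-breaking rule).
    """
    counts = Counter(votes)
    top = max(counts.values())
    leaders = sorted(c for c, n in counts.items() if n == top)
    return rng.choice(leaders)


def play_elimination_game(players, rng):
    """Simulate one hypothetical game: vote players out until two remain,
    then let the eliminated players (the jury) pick the winner.

    Votes are random placeholders for the models' actual reasoning.
    """
    alive = list(players)
    jury = []
    # Elimination rounds: each survivor votes for someone other than themselves.
    while len(alive) > 2:
        votes = [rng.choice([p for p in alive if p != voter]) for voter in alive]
        out = plurality(votes, rng)
        alive.remove(out)
        jury.append(out)  # eliminated players join the jury
    # Final showdown: each juror votes for one of the two finalists.
    jury_votes = [rng.choice(alive) for _ in jury]
    winner = plurality(jury_votes, rng)
    return winner, alive, jury


if __name__ == "__main__":
    rng = random.Random(0)
    players = [f"P{i}" for i in range(1, 9)]  # eight hypothetical players
    winner, finalists, jury = play_elimination_game(players, rng)
    print("finalists:", finalists, "jury:", jury, "winner:", winner)
```

With eight players, six are eliminated, so the jury has six members. An even-sized jury can split 3 to 3, which is why the sketch needs an explicit tie-breaking rule. In the real game, the interesting part is exactly what this sketch replaces with randomness: how each model decides whom to vote for and how it persuades the jury.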

How did the major models fare in this fierce AI "Werewolf" battle? The results are impressive:

With strong social reasoning and strategy, GPT-4.5 earned its place as the "king" of the game. It betrayed others very rarely, preferring to gain an advantage through alliances and cooperation. In the final stage it proved remarkably persuasive, won over the jury, and finished first with a 62.6% win rate.

Claude 3.7 Sonnet showed flexible, adaptable strategies. Although slightly behind GPT-4.5 in social reasoning and deception, it still performed well, moving easily between cooperation and betrayal, and finished second with a 59.3% win rate.

DeepSeek R1 took a more aggressive approach. It performed well in some stages but fell somewhat short in social strategy and verbal expression, finishing third with a 53.8% win rate.

The "Elimination Game" benchmark highlights AI's considerable potential in social intelligence and gives us reason to watch its future development closely. As AI technology continues to advance, it may before long surpass humans in more fields and become an indispensable part of our lives. This AI "Werewolf" showdown may be only the beginning of AI pushing its boundaries, and what comes next could go well beyond what we now imagine.