TechCrunch recently obtained internal Google correspondence showing that Google is using contractors to compare and test its Gemini AI against Anthropic's Claude, raising compliance concerns. The documents show that contractors must evaluate the quality of both AIs' answers against multiple criteria, and that they praised Claude's safety behavior. The practice may violate Anthropic's terms of service: Google is a major investor in Anthropic, and the terms prohibit the unauthorized use of Claude to build competing products or train competing AI models.
Google is having contractors evaluate its Gemini AI against Anthropic's Claude, according to internal communications obtained by TechCrunch, a move that raises questions about compliance.
The documents show that contractors working to improve Gemini are given up to 30 minutes to compare and rate the quality of Gemini's and Claude's answers against multiple criteria, such as truthfulness and thoroughness. Contractors recently noticed explicit references to Claude on Google's internal review platform, including output containing the words "I am Claude, created by Anthropic."
Internal discussions show that contractors noticed Claude's stricter safety behavior. One contractor remarked that "Claude's safety settings are the most stringent among all AI models." In some cases where Gemini's answers were flagged as serious safety violations for involving "nudity and restraint," Claude simply refused to respond to the prompts in question.

Notably, Google is a major investor in Anthropic, and its approach may violate Anthropic's terms of service. The terms explicitly prohibit unapproved access to Claude to "build competing products" or "train competing AI models." Asked whether it had obtained Anthropic's authorization, Google DeepMind spokesperson Shira McNamara declined to answer directly.
McNamara said that while DeepMind does "compare model outputs" for evaluation purposes, she denied that Anthropic's models were used to train Gemini. "This is in line with industry standard practice," she said, "but any claim that we used Anthropic models to train Gemini is inaccurate."
Previously, Google had required contractors working on its AI products to evaluate Gemini's answers outside their own areas of expertise, raising concerns among contractors that the AI could produce inaccurate information in sensitive fields such as health care.
As of press time, an Anthropic spokesperson had not commented on the matter.
Google's comparative testing of Gemini against Claude has attracted widespread attention, and its compliance implications and impact on the AI industry remain to be seen. Google's response has not fully dispelled the concerns, Anthropic has yet to respond publicly, and the story is still developing.