xAI's new model Grok 3 praised for its logical reasoning by an OpenAI co-founder

Author: Eve Cole  Update Time: 2025-05-28 04:50:01

Elon Musk's artificial intelligence company xAI released its latest language model, Grok 3, this Monday, marking significant progress for the company in the field of artificial intelligence. Musk said at the press conference that the new model was trained with ten times the computing power of its predecessor, made possible by xAI's Memphis-based data center, which is equipped with about 200,000 GPUs.

The Grok 3 series comes in several variants, including a streamlined version that trades some accuracy for speed. In addition, the newly launched “reasoning” models are designed specifically for mathematical and scientific problems, and users can adjust these features through the “Think” and “Big Brain” settings in the Grok interface. xAI said this version is not yet final: the model is still being trained, and the team plans further improvements and optimizations in the coming weeks.

According to the AI benchmarking platform lmarena.ai, Grok 3 scored more than 1,400 on the Chatbot Arena leaderboard, taking the top spot. It leads in all categories, including programming, surpassing models from OpenAI, Anthropic and Google. However, actual performance may differ from benchmark results. For example, although Claude 3.5 Sonnet scores lower than some models on coding benchmarks, many users still consider it the better choice for programming tasks.

OpenAI co-founder Andrej Karpathy received early access to Grok 3 and highly praised the model's logical reasoning ability. Karpathy said the “Think” feature successfully handled complex tasks such as estimating GPT-2's training FLOPs or generating a hexagonal grid for a board game, tasks that previously only OpenAI's top-tier models could handle. The feature also improves accuracy on basic tasks that often trip up language models, such as counting letters and comparing decimals.
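For readers who want a sense of what these test tasks involve, here is a minimal Python sketch of the ground-truth computations. It is an illustration only, not Karpathy's actual prompts or method: the function names, the sample word and the sample decimals are all hypothetical, and the FLOPs estimate uses the common 6 × parameters × tokens rule of thumb with a placeholder token count, since the article does not give one.

```python
from decimal import Decimal


def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


def larger_decimal(a: str, b: str) -> str:
    """Return whichever decimal string is numerically larger (a on ties)."""
    # Decimal compares by numeric value, so "9.9" beats "9.11".
    return a if Decimal(a) >= Decimal(b) else b


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training compute using the 6 * N * D rule of thumb (an approximation)."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    # Sample inputs are illustrative assumptions, not taken from the article.
    print(count_letter("strawberry", "r"))
    print(larger_decimal("9.11", "9.9"))
    # 1.5e9 parameters is the largest GPT-2 size; 1e10 tokens is a placeholder.
    print(f"{training_flops(1.5e9, 1e10):.2e}")
```

The point of the comparison is that each answer is trivial to verify in code, which is why such tasks are popular as quick sanity checks for a model's reasoning mode.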

On the new search capabilities, Karpathy noted that the quality of DeepSearch is comparable to Perplexity's research tools, returning relevant answers on topics such as upcoming Apple products and movements in Palantir's stock. However, he also found some clear problems: the model sometimes generates fake URLs, makes unsupported claims, and cites posts from X only when specifically prompted.

In addition, Grok 3 appears to lack awareness of its own existence, omitting xAI when listing the major AI labs. Because of these limitations, DeepSearch has not yet reached the quality of OpenAI's “deep research”, and the model also underperformed on humor and ethical questions. Nevertheless, the launch of Grok 3 demonstrates xAI's strength and capacity for innovation in the field of artificial intelligence.