Lenovo Wentian WA7785a G3 server sets a record! Single-machine deployment of the 671B DeepSeek model reaches throughput of up to 6708 tokens/s! - AI Article

Author: Eve Cole  Update Time: 2025-05-20 17:50:02

Lenovo announced today that its first AMD-based AI large-model training server, the Lenovo Wentian WA7785a G3, achieved a peak throughput of 6708 tokens/s when running the full 671B-parameter ("full-blooded") DeepSeek model on a single machine, once again breaking the single-server performance record for ultra-large-scale models.

According to Lenovo, this breakthrough relies on the Lenovo Wanquan heterogeneous intelligent computing platform. Lenovo has optimized the entire large-model pipeline, from pre-training and post-training to inference, through a series of techniques including memory-access optimization, GPU memory optimization, an innovative PCIe 5.0 full-interconnect architecture, and selection of the best-performing operators in the SGLang framework. In actual tests, the Lenovo Wentian WA7785a G3 server running the DeepSeek 671B model reached a peak throughput of 6708 tokens/s.

When simulating a Q&A conversation scenario (context sequence length 128/1K), the server supports up to 158 concurrent requests, with a TPOT (Time Per Output Token) of 93 ms and a TTFT (Time To First Token) of 2.01 s. When simulating a code generation scenario (context sequence length 512/4K), it supports 140 concurrent requests, with a TPOT of 100 ms and a TTFT of 5.53 s. Lenovo said this means a single Lenovo Wentian WA7785a G3 server can support normal use by an enterprise of about 1,500 people. It is another major leap in large-model inference performance, following the earlier single-machine deployment of the full DeepSeek model on the Lenovo Wentian WA7780 G3 server.
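
To show how the reported concurrency, TPOT and TTFT figures relate to each other, here is a rough back-of-the-envelope sketch. It assumes that "128/1K" and "512/4K" mean input/output lengths with 1K = 1024 and 4K = 4096 output tokens, and that all concurrent streams decode in lockstep at the reported TPOT; the function names are hypothetical and not part of any Lenovo or SGLang tool.

```python
def decode_throughput(concurrency: int, tpot_s: float) -> float:
    """Approximate aggregate decode rate (tokens/s): each of `concurrency`
    streams emits one output token every `tpot_s` seconds."""
    return concurrency / tpot_s


def request_latency(ttft_s: float, tpot_s: float, output_tokens: int) -> float:
    """Approximate end-to-end latency (s) for one request: the first token
    arrives after TTFT, then each remaining token takes TPOT."""
    return ttft_s + tpot_s * (output_tokens - 1)


# Concurrency, TPOT and TTFT are from the article; the output lengths
# (1024 and 4096) are an assumption about what "1K" and "4K" mean.
scenarios = {
    "Q&A conversation (128/1K)": dict(concurrency=158, tpot_s=0.093, ttft_s=2.01, output_tokens=1024),
    "code generation (512/4K)": dict(concurrency=140, tpot_s=0.100, ttft_s=5.53, output_tokens=4096),
}

if __name__ == "__main__":
    for name, s in scenarios.items():
        rate = decode_throughput(s["concurrency"], s["tpot_s"])
        latency = request_latency(s["ttft_s"], s["tpot_s"], s["output_tokens"])
        print(f"{name}: ~{rate:.0f} tokens/s aggregate decode, ~{latency:.1f} s per request")
```

Under these assumptions the conversation scenario works out to roughly 1,700 decoded tokens/s across all users and about 97 s per full 1K-token answer. This estimate counts output tokens only and does not reproduce the 6708 tokens/s peak, which the article does not tie to either scenario.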

Lenovo emphasized that this breakthrough is the result of joint design, collaborative optimization and joint implementation by Lenovo's China Infrastructure Business Group, the Lenovo Research ICI Laboratory and AMD. Nor is this the final result: Lenovo and AMD are continuing to explore new deep-tuning methods in pursuit of further performance gains.
