DeepSeek has opened an official account on Zhihu and published a technical article, "Overview of DeepSeek-V3/R1 Inference System". For the first time, the article discloses the optimization details of its model inference system along with its cost and profit margin figures, bringing the closely watched "DeepSeek Open Source Week" to a successful close. The move demonstrates DeepSeek's deep technical expertise and gives the industry a valuable point of reference.

The article details the two core optimization goals of the DeepSeek-V3/R1 inference system: "higher throughput and lower latency". To achieve these goals, DeepSeek adopts large-scale cross-node expert parallelism (EP), even though this increases system complexity. The article focuses on how EP is used to increase batch size, hide transmission time, and balance load, thereby significantly improving the overall performance of the system.
Of particular note, DeepSeek took the rare step of disclosing its cost and profit margin data. The article states: "Assuming the GPU rental cost is US$2 per hour, the total cost is $87,072 per day. If all tokens are calculated according to the pricing of DeepSeek R1, the theoretical total revenue per day is $562,027, and the cost profit margin is 545%." These figures point to DeepSeek's strong cost control and give the industry a concrete benchmark.
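
The quoted figures can be checked with simple arithmetic. The Python sketch below is illustrative only: it assumes "cost profit margin" means profit divided by cost, and that the entire daily cost is GPU rental at the flat US$2 per hour rate given in the quote. The variable and function names are our own, and the implied GPU count is a derived estimate, not a figure stated in this summary.

```python
# Figures quoted from the article.
GPU_HOURLY_RATE_USD = 2.0      # assumed rental price per GPU-hour
DAILY_COST_USD = 87_072        # total cost per day
DAILY_REVENUE_USD = 562_027    # theoretical revenue per day at R1 pricing


def cost_profit_margin(revenue: float, cost: float) -> float:
    """Profit as a fraction of cost (assumed definition of the quoted margin)."""
    return (revenue - cost) / cost


margin = cost_profit_margin(DAILY_REVENUE_USD, DAILY_COST_USD)

# Derived under the assumption that all cost is GPU rental at the flat rate.
gpu_hours_per_day = DAILY_COST_USD / GPU_HOURLY_RATE_USD
avg_gpus_in_use = gpu_hours_per_day / 24

print(f"Cost profit margin: {margin:.0%}")
print(f"Implied GPU-hours per day: {gpu_hours_per_day:,.0f}")
print(f"Implied average GPUs in use: {avg_gpus_in_use:,.0f}")
```

Under these assumptions the margin rounds to the quoted 545%, and the daily cost corresponds to 43,536 GPU-hours, or an average of about 1,814 GPUs in use around the clock. Note that the revenue figure is described as theoretical, since it prices every token at the DeepSeek R1 rate.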