On the third day of its "Open Source Week", Chinese artificial intelligence company DeepSeek released DeepGEMM, an open-source library for FP8 general matrix multiplication (GEMM). Built for both dense and Mixture-of-Experts (MoE) matrix operations, the library is intended to support training and inference for the DeepSeek V3 and R1 models. The announcement, made on X, quickly drew widespread attention and discussion in the technology community.

According to a post from DeepSeek's official account, DeepGEMM reaches up to 1350+ FP8 TFLOPS on NVIDIA Hopper GPUs. Although its core logic is only about 300 lines of code, the library outperforms expert-tuned kernels across most matrix sizes. DeepGEMM has no heavy dependencies, compiles its kernels just-in-time (JIT), and supports a dense layout and two MoE layouts. It is designed to be "as clean as a tutorial", making it easy for developers to learn and use.
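
To put the throughput figure in context, the sketch below applies the standard operation count for a GEMM (2 × M × N × K, one multiply and one add per inner-product term) to estimate how long a single multiplication would take at a given TFLOPS rate. The function names and the matrix shape are hypothetical and chosen for illustration only; none of this is part of the DeepGEMM API, and real timings depend on the matrix shape and hardware.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for C[m, n] = A[m, k] @ B[k, n].

    Each of the m * n outputs needs k multiplies and k adds.
    """
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS for a GEMM that took `seconds` to run."""
    return gemm_flops(m, n, k) / seconds / 1e12


def ideal_seconds(m: int, n: int, k: int, tflops: float) -> float:
    """Time a GEMM would take if it sustained `tflops` throughout."""
    return gemm_flops(m, n, k) / (tflops * 1e12)


if __name__ == "__main__":
    # Hypothetical shape, not taken from DeepSeek's benchmarks.
    m, n, k = 4096, 4096, 7168
    t = ideal_seconds(m, n, k, 1350.0)
    print(f"{gemm_flops(m, n, k):,} FLOPs")
    print(f"{t * 1e6:.1f} microseconds at a sustained 1350 TFLOPS")
```

The same `achieved_tflops` helper can be pointed at a measured kernel time to compare against the quoted peak.
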
X user @TechBitDaily commented: "The launch of DeepGEMM is a highlight of DeepSeek's Open Source Week, with impressive FP8 performance and a simple design." Another user, @AIObserverCN, noted that the library has significant advantages for efficiently training MoE models and may spur further innovation on the Hopper architecture within the AI community.
As part of Open Source Week, the launch of DeepGEMM continues DeepSeek's commitment to transparency in AI technology and to community collaboration. The company had released FlashMLA and DeepEP on the first two days of Open Source Week, focusing on efficient Multi-head Latent Attention (MLA) decoding and expert-parallel communication, respectively. The debut of DeepGEMM further demonstrates DeepSeek's technical strength in AI infrastructure. Industry insiders believe the library will not only improve the performance of DeepSeek's own models but also give developers worldwide an efficient, easy-to-use matrix computing tool with broad application prospects. DeepGEMM is now available on GitHub for developers who want to explore its potential in AI training and inference.
Project address: https://github.com/deepseek-ai/DeepGEMM