ByteDance's Doubao large model team recently announced a major technical breakthrough: it has overcome a key bottleneck of the Mixture-of-Experts (MoE) architecture and open-sourced an optimization technology called COMET. The technology significantly improves the training efficiency of large models while sharply reducing training costs, opening new opportunities for the large-model field.
At its core, COMET is an efficient optimization that raises large-model training efficiency by a factor of 1.7 while cutting training costs by 40%. The result has already been applied in training on ByteDance's ten-thousand-GPU clusters, where it has cumulatively saved millions of GPU hours of training compute, demonstrating its effectiveness in real production scenarios.
Compared with other MoE optimization schemes such as DeepSeek's open-sourced DualPipe, COMET offers stronger compatibility and convenience. It attaches to existing MoE training frameworks like a plug-in, supporting the industry's mainstream large models without invasive modifications to the framework. This seamless integration makes COMET more flexible and efficient to deploy.
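As a rough illustration of what such plug-in integration can look like, the sketch below swaps a framework's MoE layers for an optimized drop-in that keeps the same interface, leaving the training loop untouched. This is a hypothetical PyTorch example, not COMET's actual API: VanillaMoELayer, OptimizedMoELayer, and patch_moe_layers are all placeholder names invented here.

```python
# A minimal, hypothetical sketch of plug-in style MoE integration in PyTorch.
# None of these class or function names come from COMET's real API; they only
# illustrate replacing MoE layers without modifying the training framework.
import torch
import torch.nn as nn

class VanillaMoELayer(nn.Module):
    """Stand-in for an MoE layer that already exists in a training framework."""
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Top-1 routing for brevity: each token goes to its best-scoring expert.
        top1 = self.router(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

class OptimizedMoELayer(nn.Module):
    """Placeholder for a COMET-style drop-in that keeps the same interface."""
    def __init__(self, inner: VanillaMoELayer):
        super().__init__()
        self.inner = inner  # a real implementation would swap in fused kernels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.inner(x)  # same math; the optimization lives in the kernels

def patch_moe_layers(model: nn.Module) -> None:
    """Recursively swap every MoE layer for the optimized drop-in."""
    for name, child in model.named_children():
        if isinstance(child, VanillaMoELayer):
            setattr(model, name, OptimizedMoELayer(child))
        else:
            patch_moe_layers(child)

model = nn.Sequential(VanillaMoELayer(dim=64, num_experts=4))
patch_moe_layers(model)  # the surrounding framework and training loop stay unchanged
y = model(torch.randn(8, 64))
```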
The published results show that introducing COMET accelerates a single MoE layer by 1.96 times and improves end-to-end efficiency by an average of 1.71 times. COMET also performs stably across different parallel strategies, input scales, and hardware environments, demonstrating broad applicability. Notably, COMET can be used in conjunction with DeepSeek's DualPipe scheme, which is expected to compress model training costs even further.
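The paper attributes these gains to fine-grained overlapping of computation and communication inside the MoE layer. As a coarse, standalone illustration of that general idea (not COMET's actual fused kernels, which overlap at a much finer granularity), the sketch below hides the all-to-all token dispatch behind expert computation using PyTorch's asynchronous collectives. The function and buffer names are invented for the example, and it assumes a torch.distributed process group with the NCCL backend is already initialized.

```python
# Illustration of computation-communication overlap in an MoE dispatch step.
# Assumes dist.init_process_group("nccl") has already been called.
import torch
import torch.distributed as dist

def dispatch_with_overlap(local_tokens, send_buf, recv_buf, expert):
    """Overlap the MoE all-to-all token dispatch with independent compute.

    Collectives launched with async_op=True run on the NCCL backend's own
    CUDA stream, so expert GEMMs issued afterwards on the default stream
    execute concurrently with the communication.
    """
    # Start shipping tokens to their experts on other ranks (non-blocking).
    work = dist.all_to_all_single(recv_buf, send_buf, async_op=True)

    # While those tokens are in flight, compute on tokens that are already
    # local, hiding communication latency behind useful work.
    local_out = expert(local_tokens)

    # Block until the dispatched tokens arrive, then run the expert on them.
    work.wait()
    remote_out = expert(recv_buf)
    return local_out, remote_out
```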
The open-sourcing of this technology brings a fresh breakthrough to the large-model field and is expected to accelerate the research, development, and application of large models. By reducing training costs and improving efficiency, COMET gives more enterprises and research institutions the means to push artificial intelligence technology further forward.
Paper address: https://arxiv.org/pdf/2502.19811
Open source address: https://github.com/bytedance/flux