A notable development in artificial intelligence arrived recently. Moonshot AI announced that it is open-sourcing its improved version of the Muon optimizer, which it says is roughly twice as computationally efficient as the widely used AdamW. The announcement coincides with DeepSeek's planned open-source release of multiple code libraries, and it has drawn wide attention and discussion across the industry.
The Muon optimizer was originally proposed in 2024 by OpenAI researcher Keller Jordan and collaborators, and it performed well when training small-scale models. However, as model size grew, the original Muon ran into a bottleneck in delivering further gains. To address this, the Moonshot AI team made two main technical improvements: adding weight decay, and keeping the root mean square (RMS) of updates consistent across parameters. Together these allow Muon to be used in large-scale training without additional hyperparameter tuning.
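The two changes described above can be illustrated with a minimal NumPy sketch of a single Muon-style update for one 2-D weight matrix. It is an illustration under stated assumptions, not Moonshot AI's released implementation: the function names, the default hyperparameters, the Newton-Schulz coefficients (taken, as best recalled, from the public Muon reference code), and the 0.2 * sqrt(max(rows, cols)) scaling factor are assumptions that do not appear in this article, and Nesterov momentum and distributed sharding are left out.

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G by pushing its singular values toward 1."""
    # Quintic iteration coefficients: assumed from the public Muon reference code.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)   # Frobenius normalization keeps the spectral norm <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # iterate on the wide orientation (smaller Gram matrix)
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum_buf, lr=0.02, mu=0.95,
              weight_decay=0.1, rms_scale=0.2):
    """One hypothetical Muon-style step with weight decay and RMS-matched scaling."""
    momentum_buf = mu * momentum_buf + grad          # plain momentum (no Nesterov in this sketch)
    O = newton_schulz(momentum_buf)                  # orthogonalized update direction
    # An orthogonal A x B matrix has entrywise RMS 1/sqrt(max(A, B)); rescale so the
    # update RMS is about rms_scale regardless of the matrix shape.
    scale = rms_scale * np.sqrt(max(W.shape))
    W_new = W - lr * (scale * O + weight_decay * W)  # decoupled weight decay, as in AdamW
    return W_new, momentum_buf

# Usage: one step on a random 64 x 128 weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)) * 0.02
g = rng.standard_normal((64, 128))
W_next, buf = muon_step(W, g, np.zeros_like(W))
```

Because the scale factor depends only on the matrix shape, matrices of very different sizes receive updates of similar magnitude, which is the sense in which the update RMS is kept consistent. The weight decay term is applied directly to the weights rather than mixed into the gradient.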
The improved Muon optimizer was used to train Moonlight, a Mixture-of-Experts (MoE) model with 3B activated and 16B total parameters. After training on 5.7 trillion tokens, Moonlight advances the current Pareto frontier of performance versus training compute. In other words, for a given training budget, Moonlight delivers better performance than comparable models.
Moonshot AI also open-sourced its Muon implementation and released pretrained and intermediate checkpoints, giving researchers useful material for follow-up work. According to the team's experiments, Muon needs only about 52% of the training FLOPs that AdamW requires to reach comparable performance, which is where the roughly twofold efficiency figure comes from (1/0.52 ≈ 1.9) and which supports its efficiency for large-scale language model training.
Moonshot AI's Muon optimizer not only outperforms traditional optimizers in these experiments, but its open-source release also gives new momentum to the wider AI field. As more researchers and developers adopt and test it, the optimizer is expected to drive further advances in artificial intelligence technology.
Paper address: https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf