360 Intelligent Brain Team Reproduces DeepSeek's Reinforcement Learning Results and Releases the Open-Source Model Light-R1-14B-DS

Author: Eve Cole  Update Time: 2025-05-19 15:00:04

The 360 Intelligent Brain team has reproduced the reinforcement learning results of DeepSeek and released the open-source reasoning model Light-R1-14B-DS. The release is a notable step for reinforcement learning on mid-sized models, especially in mathematical reasoning. Light-R1-14B-DS surpasses both DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-Distill-Qwen-32B in performance, and the team describes it as the industry's first model to show reinforcement learning gains at the 14B-parameter scale.

In benchmark tests, Light-R1-14B-DS improves clearly on DeepSeek-R1-Distill-Qwen-14B, particularly on math competition tasks: its score rises by 4.3 points on AIME24 and by 10 points on AIME25. On GPQA, a graduate-level science question-answering benchmark, Light-R1-14B-DS scores 61.7, which is higher than most 32B-class models and suggests that its reasoning ability extends beyond mathematics.

To achieve this result, the 360 Intelligent Brain team combined two training methods. The first is curriculum SFT (curriculum-based supervised fine-tuning), which trains the model in phases, moving gradually from simple mathematical problems to complex ones in order to strengthen its logical reasoning. The second is reinforcement learning (RL), which the team says has been applied successfully to a 14B-class reasoning model for the first time; it significantly improves reasoning accuracy while leaving the model's other capabilities largely intact. Together, the two methods account for the performance gains of Light-R1-14B-DS.
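The phased, easy-to-hard idea behind curriculum SFT can be illustrated with a minimal sketch. It is not the team's actual pipeline: the `Problem` type, the numeric `difficulty` score, the equal-sized stage split, and the `train_stage` callback are all hypothetical names and assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Problem:
    prompt: str
    difficulty: float  # hypothetical score: a higher value means a harder problem


def build_curriculum(problems: Sequence[Problem], num_stages: int) -> List[List[Problem]]:
    """Sort problems from easy to hard and split them into roughly equal stages."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(problems, key=lambda p: p.difficulty)
    stage_size, remainder = divmod(len(ordered), num_stages)
    stages: List[List[Problem]] = []
    start = 0
    for i in range(num_stages):
        # The first `remainder` stages each take one extra problem.
        end = start + stage_size + (1 if i < remainder else 0)
        stages.append(ordered[start:end])
        start = end
    return stages


def run_curriculum(
    stages: Sequence[List[Problem]],
    train_stage: Callable[[List[Problem]], None],
) -> None:
    """Call the (assumed) fine-tuning step once per stage, easiest stage first."""
    for stage in stages:
        if stage:  # skip empty stages when there are fewer problems than stages
            train_stage(stage)
```

In a real setup, `train_stage` would wrap a supervised fine-tuning pass over that stage's data, and the difficulty score would have to come from some external signal, such as the pass rate of a reference model. Both are assumptions here, not details given in the article.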

The release includes not only the model itself but also the SFT data, code, and technical report, all open-sourced as resources for the industry. The work marks clear progress in reinforcement learning for small and medium-sized models and lays a foundation for making AI reasoning capabilities more widely available. By open-sourcing these resources, the 360 Intelligent Brain team hopes to encourage more researchers and developers to work in this area and to advance artificial intelligence technology together.

Project address: https://github.com/Qihoo360/Light-R1

Model address: https://huggingface.co/qihoo360/Light-R1-14B-DS

Data address: https://huggingface.co/datasets/qihoo360/Light-R1-SFTData