Kunlun Wanwei has officially released Skywork R1V ("R1V"), which it describes as the industry's first open-source industrial-grade multimodal reasoning model. With 38 billion parameters, the model approaches the performance of the well-known DeepSeek-R1 and performs strongly across multiple benchmarks, surpassing a number of current state-of-the-art (SOTA) results. By open-sourcing R1V, Kunlun Wanwei aims to promote technology sharing and progress and inject new vitality into the global AI open-source community.

Known for its strong multimodal reasoning capabilities, R1V seamlessly combines textual and visual information. In visual question answering tasks it benchmarks directly against closed-source models such as Claude 3.5 Sonnet and GPT-4o, while retaining top-tier text reasoning ability. On the MMMU benchmark, R1V set a new record for models of its size with a score of 69, and it also reached 67.5 on MathVista, demonstrating strong performance in complex mathematical reasoning and logical analysis.
R1V's results rest on several innovations from the Kunlun Wanwei research team. One is cross-modal transfer learning, which transfers the text reasoning capabilities of a large language model to the visual modality and greatly reduces the need for multimodal reasoning data. R1V also adopts a hybrid training strategy that combines iterative supervised fine-tuning with reinforcement learning and dynamically adjusts the length of its chains of thought, improving reasoning efficiency. In addition, R1V introduces an adaptive-length chain-of-thought distillation framework that avoids "overthinking" during inference, significantly improving both the efficiency and the quality of reasoning.
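The technical report is the authoritative description of how these pieces fit together. As a rough illustration of the cross-modal transfer idea only, the sketch below shows a lightweight MLP adapter that maps features from a vision encoder into a text model's embedding space, so that a strong text reasoner can be reused for visual inputs. All class names, dimensions, and the toy usage here are assumptions for illustration, not Skywork R1V's actual implementation.

```python
# Illustrative sketch of cross-modal transfer via a lightweight adapter.
# Module names and dimensions are hypothetical, not Skywork R1V's code.
import torch
import torch.nn as nn

class VisionToTextAdapter(nn.Module):
    """Small MLP that projects vision-encoder features into the LLM's
    token-embedding space, so a (frozen) text reasoner can consume them."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096, hidden: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, llm_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim)
        return self.proj(vision_feats)  # -> (batch, num_patches, llm_dim)

# Toy usage: stand-in features from a vision encoder are mapped into the
# embedding space of a text LLM; in this setup only the adapter would train.
adapter = VisionToTextAdapter()
fake_patches = torch.randn(2, 256, 1024)   # placeholder for ViT outputs
visual_tokens = adapter(fake_patches)      # shape: (2, 256, 4096)
print(visual_tokens.shape)
```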
With the launch of R1V, Kunlun Wanwei has not only become, by its own account, the first company to open-source an industrial multimodal reasoning model, but has also taken an important step toward the goal of AGI (Artificial General Intelligence). The model's weights, inference code, and technical report are all published, and anyone can obtain them through GitHub and Hugging Face.
Model weight download
Hugging Face:
https://huggingface.co/Skywork/Skywork-R1V-38B
GitHub:
https://github.com/SkyworkAI/Skywork-R1V
Detailed technical report
https://github.com/SkyworkAI/Skywork-R1V/blob/main/Skywork_R1V.pdf
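For readers who want to fetch the published weights programmatically, the minimal sketch below uses the huggingface_hub library. The repo ID comes from the Hugging Face link above; the local directory is just an example, and the repository's own README remains the authoritative guide for running inference.

```python
# Minimal sketch: download the published Skywork R1V weights with huggingface_hub.
# The local_dir path is an arbitrary example; see the repo README for the
# intended inference workflow once the files are on disk.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Skywork/Skywork-R1V-38B",   # repo ID from the Hugging Face link above
    local_dir="./skywork-r1v-38b",       # example destination directory
)
print(f"Model files downloaded to: {local_path}")
```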
Key points:
Skywork R1V, billed as the world's first industrial open-source multimodal reasoning model, has been officially released with 38 billion parameters.
R1V performs well on multiple benchmarks, scoring 69 on MMMU and 67.5 on MathVista.
Kunlun Wanwei’s open-source initiative aims to promote technology sharing, inject vitality into the global AI open-source community, and advance progress toward AGI.