Today, the Doubao large model team officially released its text-to-image technical report, disclosing the technical details of the Seedream 2.0 image generation model for the first time. The report covers the full pipeline, from data construction and the pre-training framework to post-training RLHF, and marks a significant advance in text-to-image generation. The release of Seedream 2.0 has drawn wide attention across the industry.
Since launching on the Doubao app and the Jimeng platform in early December 2024, Seedream 2.0 has served hundreds of millions of consumer users and has been well received by professional designers. Compared with mainstream models such as Ideogram 2.0 and Midjourney V6.1, Seedream 2.0 improves on several fronts. It addresses the problem of poor text rendering and strengthens the model's understanding of Chinese culture, improving bilingual (Chinese and English) understanding, aesthetics, and instruction following.
On the Bench-240 evaluation benchmark, Seedream 2.0 stands out for the structural plausibility of images generated from English prompts and for the accuracy of its text understanding. For generating and rendering Chinese text, its usability rate reached 78% and its perfect response rate reached 63%, well above the other models compared, which demonstrates its strength in multilingual processing.
On the technical side, the Doubao large model team introduced several innovations. For data preprocessing, the team built a framework centered on "knowledge integration" and used a four-dimensional data architecture to balance data quality against knowledge diversity. The intelligent annotation engine went through three levels of cognitive evolution, significantly improving the model's understanding and recognition capabilities, while an engineering overhaul greatly improved data processing efficiency.
In the pre-training stage, the team focused on bilingual comprehension and text rendering. Using a native bilingual alignment scheme, the team fine-tuned the LLM and built a dedicated dataset, bridging the gap between language and vision. A dual-modal encoding fusion system lets the model account for both text semantics and font glyphs, while the DiT architecture, upgraded in three respects, introduces QK-Norm and Scaling RoPE, which improve training stability and enable multi-resolution image generation.
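To illustrate why QK-Norm helps training stability, here is a minimal NumPy sketch of single-head attention in which queries and keys are normalized before the dot product. It is not the Seedream 2.0 implementation: the choice of RMS normalization, the learnable gains `q_gain` and `k_gain`, and the function names are all assumptions made for illustration, and the report may use a different variant.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """Scale each vector along the last axis to unit root-mean-square, then apply a gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def qk_norm_attention(q, k, v, q_gain, k_gain):
    """Single-head attention with QK-Norm (hypothetical helper, RMS variant assumed).

    q: (n_queries, d), k: (n_keys, d), v: (n_keys, d_v)
    Returns the attended values and the attention weights.
    """
    d = q.shape[-1]
    # Normalizing q and k removes their raw magnitude from the logits,
    # so the logits cannot blow up as activations grow during training.
    q = rms_norm(q, q_gain)
    k = rms_norm(k, k_gain)
    logits = q @ k.T / np.sqrt(d)
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 8))
    k = rng.normal(size=(5, 8))
    v = rng.normal(size=(5, 3))
    gain = np.ones(8)
    out, weights = qk_norm_attention(q, k, v, gain, gain)
    print(out.shape, weights.sum(axis=-1))
```

Because the normalization discards the raw scale of the queries and keys, multiplying the inputs by a large constant leaves the attention weights essentially unchanged, which is the property that keeps the attention logits bounded.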
For post-training RLHF, the team developed an optimization system built on three components: a multi-dimensional preference data system, three distinct reward models, and iterative learning that drives the model's evolution. Together these effectively improved model performance. The scores from the different reward models rose steadily across iterations, further supporting Seedream 2.0's leading position in image generation.
The release of this technical report reflects the Doubao large model team's commitment to advancing image generation technology and offers the industry valuable technical experience. Going forward, the team will continue to explore new techniques, push the limits of model performance, and study reinforcement learning optimization mechanisms in depth to support the continued development of image generation technology.
If you are interested in the technical details of Seedream 2.0, you can visit the technical showcase page: [https://team.doubao.com/tech/seedream](https://team.doubao.com/tech/seedream) or download the full technical report: [https://arxiv.org/pdf/2503.07703](https://arxiv.org/pdf/2503.07703).