The editor of Downcodes learned that Zhipu AI has open sourced its latest Vincentian graph model CogView3 and its upgraded version CogView-3Plus-3B, making waves in the field of Vincentian graphs. As the first model to use relay diffusion, CogView3 has made breakthroughs in image quality and efficiency with its unique cascade diffusion method. Its generation quality exceeds SDXL, but its inference speed is faster, even in the streamlined version. Comparable performance. This undoubtedly provides new possibilities for high-quality and efficient image generation.
Recently, Zhipu AI has open sourced its latest masterpiece - CogView3 and its upgraded version CogView-3Plus-3B to the public, injecting new vitality into the field of Vincentian graphics.
The debut of CogView3 is undoubtedly an important milestone. As the first model to implement relay diffusion in the field of text-to-image generation, it adopts a unique cascade diffusion method. This innovative approach first generates low-resolution images, and then completes the final output through relay-based super-resolution technology. This not only greatly improves the quality of generated images, but also significantly reduces the cost of training and inference.

The most eye-catching thing is the performance of CogView3. According to human evaluation results, CogView3 surpasses the current state-of-the-art open source text-to-image model SDXL in terms of generation quality, with a winning rate of 77.0%. Even more impressive is that it achieves this feat in only about half the inference time of SDXL. If you use the streamlined version of CogView3, you can still maintain a comparable performance level while taking up only one-tenth of the inference time of SDXL. This breakthrough undoubtedly opens up new possibilities for efficient, high-quality image generation.
At the same time, Zhipu AI also launched CogView-3Plus-3B, an image model based on the DiT (Diffusion Transformers) framework. Although its specific test results have not yet been announced, the industry is full of expectations for its potential. CogView-3Plus-3B is further optimized on the basis of CogView3 and introduces advanced technologies such as Zero-SNR diffusion noise scheduling and joint text-image attention mechanism. These improvements not only reduce training and inference costs, but also maintain strong image generation capabilities.
It is worth mentioning that CogView-3Plus-3B supports a wide range of image resolutions, ranging from 512x512 to 2048x2048, which greatly increases the flexibility of its application scenarios. Whether it's daily use or professional creation, you'll find the right resolution option.
To help users make better use of these models, Zhipu AI also provides practical suggestions and tools. They recommend users to optimize prompt words through large language models (LLM), which can significantly improve the quality of generated images. At the same time, Zhipu AI also provides sample scripts, which greatly reduces the user's threshold for use.
Project address: https://github.com/THUDM/CogView3
The open source of CogView3 and CogView-3Plus-3B marks another big step forward for Wenshengtu technology. The editor of Downcodes looks forward to its bringing more surprises in future applications! I hope developers can actively try and contribute to its development.