Tencent’s Hunyuan text-to-image model (Hunyuan DiT) has received a major upgrade. A new low-memory version runs in 6 GB of GPU memory, bringing the model within reach of personal computer users. The new version works with the Diffusers library, including plug-ins such as LoRA and ControlNet, and adds support for the Kohya graphical interface, which lowers the barrier for developers who want to train personalized LoRA models. The model itself has been upgraded to version 1.2, which noticeably improves the texture and composition of generated images.
At the same time, Tencent has open-sourced "Hunyuan Captioner", an image captioning model for annotating text-to-image data. It supports both Chinese and English and has been optimized for text-to-image scenarios, so it understands Chinese semantics more accurately and produces structured, complete, and accurate image descriptions. Hunyuan Captioner can also identify well-known figures and landmarks, and it lets developers supply their own background knowledge, which makes the model more practical and flexible.

Open-sourcing Hunyuan Captioner gives text-to-image researchers and data annotators around the world a tool for producing more comprehensive and accurate image descriptions, and better captions in the training data in turn improve model quality. The resulting datasets can be used to train models based on Hunyuan DiT as well as other vision models, supporting wider AI work in image processing.
The three major updates to Hunyuan DiT are the low-memory version, support for the Kohya training interface, and the upgrade to version 1.2. Together they lower the barrier to use and improve image quality. Images generated by Hunyuan DiT have good texture, but the model's high GPU memory requirement previously put off many developers. The new low-memory version needs only 6 GB of GPU memory to run. In cooperation with Hugging Face, the low-memory version and its related plug-ins have been adapted to the Diffusers library, making the model simpler to use.
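For readers who want to try the Diffusers route, the following is a minimal sketch rather than Tencent's official recipe. It assumes the `HunyuanDiTPipeline` class from the Diffusers library and the repository name `Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers` for the v1.2 weights; the helper names `load_pipeline` and `generate` are made up for illustration. It uses Diffusers' generic CPU offloading to reduce GPU memory use, which may differ from how the official 6 GB version is built, so the actual memory footprint is not guaranteed.

```python
# Sketch only: requires the torch, diffusers, and accelerate packages and a CUDA GPU.
# MODEL_ID is an assumed Hugging Face repository name for the v1.2 Diffusers weights.
MODEL_ID = "Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers"


def load_pipeline(model_id=MODEL_ID, offload=True):
    """Load the pipeline in half precision (hypothetical helper)."""
    # Imported here so the module can be inspected without the heavy dependencies.
    import torch
    from diffusers import HunyuanDiTPipeline

    pipe = HunyuanDiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    if offload:
        # Generic Diffusers memory saver: keeps each sub-model on the CPU
        # until it is needed. Not confirmed to match the official 6 GB setup.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe


def generate(prompt, out_path="output.png"):
    """Generate one image for the prompt and save it (hypothetical helper)."""
    pipe = load_pipeline()
    image = pipe(prompt).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("an astronaut riding a horse")` would download the weights on first use and write the result to `output.png`. Prompts can be written in Chinese or English.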
Kohya is an open-source, lightweight fine-tuning tool with a graphical interface that is widely used to train diffusion-based image generation models. With Kohya, users can run both full-parameter fine-tuning and LoRA training of the model without writing code, which greatly simplifies the developer workflow.
Hunyuan Captioner builds a structured image description system, improves the completeness of its descriptions by drawing on multiple sources, and injects extensive background knowledge so that the output is more accurate and complete. These improvements have helped make Hunyuan DiT one of the most popular Chinese open-source DiT models: its GitHub repository has more than 2.6k stars.
Official website
https://dit.hunyuan.tencent.com/
Code
https://github.com/Tencent/HunyuanDiT
Model
https://huggingface.co/Tencent-Hunyuan/HunyuanDiT
Paper
https://tencent.github.io/HunyuanDiT/asset/Hunyuan_DiT_Tech_Report_05140553.pdf