Downcodes editor reports: The InstantX team, working with research teams from Nanjing University of Science and Technology, Beihang University and Peking University, has developed a new style transfer model called CSGO. The model aims to advance image generation, in particular how content and style are combined. CSGO supports three style transfer modes: content image plus style image, style image plus text prompt, and text-based editing of an image, which together cover a wide range of applications. Let's take a closer look at this AI model.

CSGO supports three style transfer modes:
1. Content image + style reference image: the model renders the content image in the reference style. For example, given an original image to restyle (such as a bear or a house) and a style reference image, CSGO converts the original image into the style of the reference.

2. Style reference image + text prompt: the model synthesizes the content described by the prompt in the reference style. For example, given a style reference image and a prompt such as "a cat", "a dog", "a man" or "a panda", it generates that subject in the reference style.

3. Text-editing-driven stylized synthesis: a specified object in an image is edited through a text prompt.

The core of CSGO is its data construction process. The research team designed a pipeline that generates stylized images and cleans them automatically, and used it to build IMAGStyle, a large-scale style transfer data set of 210,000 image triples (content image, style image, stylized image). The researchers present it as a resource for academic research on image generation.
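The article does not say how the pipeline decides which generated images to keep. As a minimal sketch, assuming each candidate stylized image has already been reduced to a content-feature vector by some encoder (not shown), and assuming cosine similarity against the content image with a fixed threshold as the cleaning rule, the "generate several candidates, keep the best one that still preserves the content" step could look like this. `Triplet`, `clean_candidates` and the threshold are hypothetical names chosen for illustration, not part of the CSGO release.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Mapping, Optional, Sequence

Vector = Sequence[float]


@dataclass(frozen=True)
class Triplet:
    """One (content, style, stylized) training example plus its cleaning score."""
    content_id: str
    style_id: str
    stylized_id: str
    score: float


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def clean_candidates(
    content_id: str,
    style_id: str,
    content_feat: Vector,
    candidates: Mapping[str, Vector],
    threshold: float,
) -> Optional[Triplet]:
    """Hypothetical cleaning step: keep the candidate whose content features
    are closest to the content image's, or drop the pair entirely if even
    the best candidate falls below the threshold."""
    best: Optional[Triplet] = None
    for cand_id, feat in candidates.items():
        score = cosine(content_feat, feat)
        if best is None or score > best.score:
            best = Triplet(content_id, style_id, cand_id, score)
    if best is None or best.score < threshold:
        return None  # no candidate preserved the content well enough
    return best


if __name__ == "__main__":
    kept = clean_candidates(
        "bear", "ink_wash", [1.0, 0.0],
        {"gen_a": [0.6, 0.8], "gen_b": [0.9, 0.1]},
        threshold=0.8,
    )
    print(kept)  # gen_b is kept: its features stay closest to the content image
```

Rejecting a pair outright, rather than keeping a poor candidate, is what lets an automatic filter trade quantity for quality when assembling a large triple data set.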
CSGO explicitly separates content and style features during image generation. According to the researchers, a key advantage is that the model is trained end to end, so no fine-tuning is required at inference time.
Another highlight is that CSGO does not train the UNet, so it retains the generation ability of the original text-to-image model. Together, these design choices let CSGO perform image-driven style transfer, text-driven stylized synthesis, and text-editing-driven stylized synthesis.
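The article does not describe how style features reach the frozen UNet. One widely used way to add an image condition to a frozen text-to-image model is decoupled cross-attention, as popularized by IP-Adapter: the pretrained query/key/value projections stay frozen, a separate trainable key/value pair attends to the extra embeddings, and the two attention outputs are summed. The NumPy sketch below illustrates that general idea under this assumption only; it is not CSGO's published code, and `decoupled_cross_attention`, `style_scale` and the weight names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: q (n, d), k (m, d), v (m, d) -> (n, d)."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def decoupled_cross_attention(hidden, text_emb, style_emb, frozen, adapter, style_scale=1.0):
    """Sum of a frozen text cross-attention and a separate style cross-attention.

    hidden:    (n_query, d_model) UNet features
    text_emb:  (n_text, d_ctx) text-encoder embeddings
    style_emb: (n_style, d_ctx) style embeddings
    frozen:    pretrained W_q, W_k, W_v (never updated in this sketch)
    adapter:   W_k_style, W_v_style (the only trainable weights in this sketch)
    The output projection of a real attention layer is omitted for brevity.
    """
    q = hidden @ frozen["W_q"]  # the query is shared by both branches
    text_out = attention(q, text_emb @ frozen["W_k"], text_emb @ frozen["W_v"])
    style_out = attention(q, style_emb @ adapter["W_k_style"], style_emb @ adapter["W_v_style"])
    return text_out + style_scale * style_out
```

Setting `style_scale` to 0 reproduces the frozen model's text cross-attention exactly, which is one way to see why leaving the UNet weights untouched preserves the original text-to-image behaviour.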
In experiments, the researchers compared CSGO with recent state-of-the-art methods using both quantitative metrics and visual comparisons, and report that CSGO offers stronger style control.
Highlights:
The researchers built IMAGStyle, a data set of 210,000 image triples, using a new data construction pipeline.
CSGO clearly separates content and style and supports multiple generation modes, including image-driven and text-driven style transfer.
Experimental results show that CSGO outperforms existing methods in style control.
CSGO's style transfer performance and its data construction approach offer new directions for future image generation research. The Downcodes editor hopes to see CSGO applied in more fields, bringing us even more exciting visuals!