Multi-layer image generation is changing how users interact with generative models. It lets users isolate, select, and edit individual image layers, which gives them far more creative control. Recently, a Microsoft research team introduced the "Anonymous Region Transformer" (ART), which directly generates multi-layer transparent images with a variable number of layers from a global text prompt and an anonymous region layout.

ART's design is inspired by schema theory. With an anonymous region layout, the generative model decides on its own which visual tokens align with which text tokens. This contrasts with the conventional semantic layout, which requires an explicit correspondence between each region and a piece of text. Because the anonymous layout drops that requirement, it is more flexible to author and leaves the matching to the model.
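The difference between the two layout styles can be sketched as plain data structures. This is only an illustration: the class names, field names, and example prompts below are hypothetical and are not taken from the ART codebase.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates, from (x0, y0) to (x1, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


@dataclass(frozen=True)
class SemanticRegion:
    """Semantic layout entry: every box carries its own text description."""
    box: Box
    prompt: str


@dataclass(frozen=True)
class AnonymousLayout:
    """Anonymous layout: one global prompt plus unlabeled boxes.

    Nothing here says which box holds which object; in this sketch that
    matching is assumed to be left to the generative model.
    """
    global_prompt: str
    boxes: Tuple[Box, ...]


# Semantic layout: the author must pair each region with a description.
semantic: List[SemanticRegion] = [
    SemanticRegion(Box(0, 0, 1024, 1024), "a sunset sky over the sea"),
    SemanticRegion(Box(256, 384, 768, 896), "a small sailboat"),
    SemanticRegion(Box(128, 64, 896, 256), "the title text 'Summer Regatta'"),
]

# Anonymous layout: the same boxes, but only one prompt for the whole image.
anonymous = AnonymousLayout(
    global_prompt="a poster of a sailboat at sunset with the title 'Summer Regatta'",
    boxes=tuple(region.box for region in semantic),
)
```

In the anonymous variant the user only draws boxes and writes one prompt, which is the flexibility the article describes.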
ART also introduces a layer-wise region crop mechanism that selects only the visual tokens belonging to each anonymous region, which significantly reduces the cost of attention computation. The method is more than 12 times faster than the full attention approach, reduces conflicts between layers, and can generate images with more than 50 distinct layers, which makes it suitable for complex image generation tasks.
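To see why cropping each layer to its region cuts attention cost, here is a back-of-the-envelope sketch. It assumes that self-attention cost grows with the square of the token count and that each layer is split into 16x16-pixel patches; the patch size, canvas size, and layer sizes are made-up values, and the resulting ratio is illustrative only, not the speedup reported for ART.

```python
def tokens_in_region(width: int, height: int, patch: int = 16) -> int:
    """Number of patch tokens needed to cover a width x height pixel region."""
    cols = (width + patch - 1) // patch   # ceiling division
    rows = (height + patch - 1) // patch
    return cols * rows


def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs in one full self-attention pass over num_tokens tokens."""
    return num_tokens * num_tokens


# Hypothetical canvas and per-layer region sizes, in pixels.
canvas = (1024, 1024)
layer_regions = [(1024, 1024), (512, 512), (256, 256), (256, 128)]

# Full attention: every layer keeps tokens for the whole canvas.
full_tokens = len(layer_regions) * tokens_in_region(*canvas)       # 4 * 4096 = 16384

# Region crop: every layer keeps only the tokens inside its own box.
cropped_tokens = sum(tokens_in_region(w, h) for w, h in layer_regions)  # 5504

ratio = attention_pairs(full_tokens) / attention_pairs(cropped_tokens)
print(f"full: {full_tokens} tokens, cropped: {cropped_tokens} tokens")
print(f"attention pairs reduced by a factor of {ratio:.2f}")
```

Under these assumptions, the saving grows as more layers occupy small regions, since small boxes contribute few tokens while a full-canvas layer always contributes the maximum.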
In addition, ART proposes a high-quality multi-layer transparent image autoencoder that jointly encodes and decodes the transparency of a variable number of image layers. This design enables precise control and scalable layer generation, and supports interactive content creation: users can control each layer of an image more flexibly and edit it at a finer level.
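For readers unfamiliar with transparent layers, the sketch below shows how a stack of RGBA layers is typically flattened into one image using the standard "over" compositing operator on straight (non-premultiplied) alpha. It works on a single pixel for brevity. This illustrates what a multi-layer transparent output can be used for and is not a description of ART's autoencoder; the function names are hypothetical.

```python
from typing import Iterable, Tuple

Pixel = Tuple[float, float, float, float]  # (r, g, b, a), each in [0.0, 1.0]


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite one straight-alpha pixel over another (Porter-Duff 'over')."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent, so the color is undefined.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: Iterable[Pixel]) -> Pixel:
    """Flatten pixels listed from the bottom layer to the top layer."""
    result: Pixel = (0.0, 0.0, 0.0, 0.0)  # start from a transparent canvas
    for layer in layers:
        result = over(layer, result)
    return result


# An opaque blue background under a half-transparent red layer.
background = (0.0, 0.0, 1.0, 1.0)
overlay = (1.0, 0.0, 0.0, 0.5)
flattened = flatten([background, overlay])
```

Because each layer keeps its own alpha channel, a user can move, hide, or replace one layer and flatten the stack again without regenerating the others.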
Project: https://art-msra.github.io/
Key points:
ART directly generates multi-layer transparent images from a global text prompt and an anonymous region layout.
Its layer-wise region crop mechanism significantly improves generation efficiency, making it more than 12 times faster than the full attention approach.
Its new high-quality autoencoder supports precise control over multi-layer transparent images, facilitating interactive content creation.