Image generation models have made rapid progress in recent years, but the speed at which high-quality images can be generated has remained a problem. Luma AI's newly open-sourced Inductive Moment Matching (IMM) technology offers a breakthrough solution. By optimizing the efficiency of the inference stage, IMM dramatically accelerates image generation, acting as a kind of "turbocharger" for the field.
The AI community currently faces a bottleneck in generative pre-training: although data volumes keep growing, algorithmic innovation has lagged behind. Luma AI argues that the core problem is not a shortage of data but the failure of existing algorithms to fully exploit it, like owning a gold mine but working it with primitive tools. To break through this "algorithm ceiling", Luma AI turned its attention to inference-time compute scaling and proposed IMM.
What makes IMM distinctive is that it redesigns the pre-training algorithm from the perspective of inference efficiency. A traditional diffusion model must denoise gradually, so generating an image is like feeling one's way through a maze. IMM instead introduces the notion of a "target time step", letting the model jump more flexibly during inference and greatly reducing the number of steps required for generation. This design not only improves speed but also increases the expressive power of each iteration.
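To make the idea concrete, here is a minimal sketch of few-step "jump" sampling. It assumes a hypothetical network `imm_model(x_t, t, s)` that maps a sample at noise level `t` directly to a less noisy target time step `s`; the interface and the linear time schedule are illustrative assumptions, not Luma AI's actual implementation.

```python
import torch

def imm_sample(imm_model, shape, steps=8, device="cpu"):
    """Few-step sampling sketch: each network call jumps from time t to target time s."""
    # Start from pure Gaussian noise at t = 1.0.
    x = torch.randn(shape, device=device)
    # A simple linear schedule of target time steps from 1.0 down to 0.0 (assumed).
    times = torch.linspace(1.0, 0.0, steps + 1, device=device)
    for i in range(steps):
        t, s = times[i], times[i + 1]
        # One call "jumps" the sample from time t to the target time s,
        # instead of taking many small denoising steps as in diffusion.
        x = imm_model(x, t.expand(shape[0]), s.expand(shape[0]))
    return x
```

The key contrast with a standard diffusion sampler is that the target time step `s` is an explicit input, so a handful of large jumps can replace hundreds of small denoising steps.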
In addition, IMM employs maximum mean discrepancy (MMD), a kernel-based measure of the distance between two distributions, to provide precise guidance for the inference process and ensure the model efficiently generates high-quality images. This combination lets IMM surpass traditional methods in both speed and quality.
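For reference, the sketch below computes a simple (biased) MMD estimate between two sample batches with an RBF kernel. It only illustrates the kind of distribution-matching objective mentioned above; the kernel choice and bandwidth are assumptions, and this is not Luma AI's training code.

```python
import torch

def rbf_kernel(a, b, bandwidth=1.0):
    # Pairwise RBF kernel values between rows of a and b.
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd(x, y, bandwidth=1.0):
    # MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')]
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    return k_xx - 2 * k_xy + k_yy

# Example: MMD between two batches of 128 feature vectors of dimension 64.
x = torch.randn(128, 64)
y = torch.randn(128, 64) + 0.5
print(mmd(x, y).item())
```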
Experimental results show that IMM reaches an FID score of 1.99 on the ImageNet 256x256 dataset while using roughly 30 times fewer sampling steps, surpassing diffusion models and Flow Matching. On CIFAR-10, IMM achieves an FID score of 1.98 in just 2 steps, setting a new state of the art for that dataset. This "lightning" speed makes IMM stand out in the field of image generation.
Beyond its speed advantage, IMM also performs well in terms of training stability. Whereas Consistency Models and similar approaches require carefully tuned hyperparameter designs, IMM trains stably across a wide range of hyperparameters and model architectures, further lowering the barrier to use.
Luma AI emphasizes that IMM's success rests not only on the application of moment matching but also on its inference-first design philosophy. This perspective allowed the team to break through the limits of the existing pre-training paradigm and open a new direction for multimodal foundation models. Luma AI believes IMM is only the beginning and will unlock even more creative potential in the future.
GitHub repository: https://github.com/lumalabs/imm