Meta has partnered with King Abdullah University of Science and Technology (KAUST) in Saudi Arabia to launch MarDini, a new family of video diffusion models. MarDini can efficiently perform a variety of video generation tasks, including video interpolation, image-to-video generation, and video expansion, which greatly simplifies high-quality video creation. It pairs a planning model with a generative model, combining masked autoregression (MAR) with a diffusion process to produce high-quality video in fewer steps. The result shows significant advantages in performance and efficiency, gives video creators a powerful tool, and sets new industry benchmarks.

Building on last year's work, Meta has pushed further into AI video generation. It previously launched the text-to-video and editing models Emu Video and Emu Edit, and this year it also introduced Movie Gen, an advanced video editor. This shows that Meta is committed to giving video creators more powerful tools.
MarDini's strength is that it can generate video from any number of masked frames, so a single model supports several generation tasks: video interpolation, image-to-video generation, and video expansion.
MarDini's main application is image-to-video generation. The official examples demonstrate it by using a single reference frame, placed in the middle of the sequence, as the conditioning input and generating 16 additional frames. The resulting 17 frames are rendered at 8 FPS, which gives a smooth video of about 2 seconds.
MarDini can also extend a video by conditioning on an existing clip of any length. The official examples generate 2-second extensions from 5-frame reference videos, adding 12 new frames to each sequence.
MarDini performs video interpolation by generating the intermediate frames while using the first and last frames as conditioning signals. When these two boundary frames are identical, MarDini can create seamlessly looping videos.
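The three tasks above differ only in which frames are given and which are masked. The sketch below illustrates that idea with a plain Python list of booleans. The function name `conditioning_mask` is hypothetical and not part of any MarDini code. The 17-frame length for interpolation and the placement of the reference clip at the start of an extension are assumptions made for illustration.

```python
FPS = 8  # frame rate quoted for the official examples


def conditioning_mask(total_frames, known_indices):
    """Return a list in which True marks a frame the model must generate.

    Hypothetical helper for illustration only.
    """
    known = set(known_indices)
    if not known or not all(0 <= i < total_frames for i in known):
        raise ValueError("known_indices must be non-empty and within range")
    return [i not in known for i in range(total_frames)]


# Image-to-video: one reference frame in the middle, 16 generated frames.
image_to_video = conditioning_mask(17, [8])

# Video extension: 5 reference frames (assumed to come first), 12 new frames.
extension = conditioning_mask(17, range(5))

# Interpolation: first and last frames given (17-frame length is assumed).
interpolation = conditioning_mask(17, [0, 16])

for name, mask in [
    ("image-to-video", image_to_video),
    ("extension", extension),
    ("interpolation", interpolation),
]:
    generated = sum(mask)          # number of masked (generated) frames
    seconds = len(mask) / FPS      # clip length at the quoted frame rate
    print(f"{name}: {generated} generated frames, {seconds} s total")
```

At 8 FPS a 17-frame clip lasts 2.125 seconds, which matches the "about 2 seconds" figure quoted for the image-to-video examples.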
MarDini's design consists of two parts: a planning model and a generative model. First, the planning model uses masked autoregression (MAR) to interpret low-resolution input frames and produce guidance signals for the frames that need to be created. A lightweight generative model then produces the high-resolution, detailed frames through a diffusion process, so that the final video is smooth and visually appealing.
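The data flow between the two parts can be sketched with toy stand-ins. Everything in the block below is an assumption made for illustration: frames are short lists of numbers, the "planner" simply blends the nearest known low-resolution frames instead of running a learned MAR model, and the "generator" moves random noise toward the upsampled guidance in a few deterministic steps instead of running a trained diffusion model. None of the function names come from MarDini.

```python
import random


def downsample(frame, factor=2):
    """Average groups of `factor` values (length assumed divisible by factor)."""
    return [sum(frame[i:i + factor]) / factor for i in range(0, len(frame), factor)]


def plan(low_res_frames, mask):
    """Toy planner: one low-resolution guidance signal per masked frame."""
    known = [i for i, masked in enumerate(mask) if not masked]
    if not known:
        raise ValueError("at least one frame must be given")
    guidance = {}
    for t, masked in enumerate(mask):
        if not masked:
            continue
        left = max((k for k in known if k < t), default=None)
        right = min((k for k in known if k > t), default=None)
        if left is None:
            guidance[t] = list(low_res_frames[right])
        elif right is None:
            guidance[t] = list(low_res_frames[left])
        else:
            w = (t - left) / (right - left)  # position between the neighbours
            guidance[t] = [(1 - w) * a + w * b
                           for a, b in zip(low_res_frames[left], low_res_frames[right])]
    return guidance


def generate(signal, factor=2, steps=4, rng=None):
    """Toy generator: refine noise toward the upsampled guidance in a few steps."""
    rng = rng or random.Random(0)
    target = [v for v in signal for _ in range(factor)]  # nearest-neighbour upsample
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        alpha = 1.0 / (steps - step)  # the final step lands on the target
        x = [xi + alpha * (ti - xi) for xi, ti in zip(x, target)]
    return x


def fill_video(frames, mask, factor=2, steps=4):
    """Run planner then generator; given frames pass through unchanged."""
    low_res = [None if m else downsample(f, factor) for f, m in zip(frames, mask)]
    guidance = plan(low_res, mask)
    rng = random.Random(0)
    return [generate(guidance[t], factor, steps, rng) if m else list(f)
            for t, (f, m) in enumerate(zip(frames, mask))]


first, last = [0.0] * 4, [1.0] * 4
mask = [False, True, True, True, False]
video = fill_video([first, None, None, None, last], mask)
print([round(frame[0], 3) for frame in video])
```

The point of the split is that the expensive reasoning over the whole sequence happens at low resolution, while the per-frame high-resolution work is left to a smaller model that only needs a few refinement steps.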
Unlike many video models that depend on complex pre-trained image models, MarDini is reported to be trained from scratch on unlabeled video data. This is made possible by a progressive training strategy that flexibly varies how frames are masked during training, which helps the model cope with different frame configurations.
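The article does not describe the actual training schedule, so the following is only a hypothetical sketch of what varying the frame masking during training could look like. It assumes a simple linear curriculum in which the maximum number of masked frames grows with training progress. The function `sample_mask` and its `progress` parameter are invented for this example.

```python
import random


def sample_mask(total_frames, progress, rng):
    """Sample a training mask; True marks a frame the model must predict.

    progress: fraction of training completed, in [0, 1] (hypothetical knob).
    Early in training only a few frames are masked; later, up to all but one.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    max_masked = total_frames - 1                # always keep one given frame
    upper = max(1, round(progress * max_masked))
    n_masked = rng.randint(1, upper)             # inclusive on both ends
    masked = set(rng.sample(range(total_frames), n_masked))
    return [i in masked for i in range(total_frames)]


rng = random.Random(0)
for progress in (0.0, 0.5, 1.0):
    mask = sample_mask(17, progress, rng)
    print(progress, sum(mask), "masked frames")
```

Because the masked positions are drawn at random, a model trained this way would see interpolation-like, extension-like, and single-reference configurations within one training run.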
A distinctive feature of MarDini is its combination of flexibility and performance. It is efficient as well as powerful, which makes it suitable for larger-scale tasks. The model handles video interpolation, image-to-video generation, and video expansion, whether the goal is to smooth an existing clip or to create a complete sequence from scratch.
In terms of performance, MarDini sets new benchmarks by generating high-quality video in fewer steps, which makes it more cost- and time-efficient than more complex alternatives. "Our research shows that our modeling strategy demonstrates competitiveness in a variety of interpolation and animation benchmarks, while reducing computational demand at comparable parameter scales," the official research paper noted.
Project page: https://mardini-vidgen.github.io/
Key points:
MarDini is a new video generation model launched by Meta and KAUST that can handle a variety of video creation tasks.
The model achieves efficient video interpolation and image-to-video generation by combining a planning model with a generative model.
MarDini generates high-quality videos with fewer steps, significantly improving the flexibility and efficiency of creation.
In short, the emergence of MarDini marks a significant advance in video generation technology, with its efficient performance and flexible application scenarios bringing new possibilities to the field of video creation. In the future, MarDini may play a greater role in film production, animation production, and other areas that require video generation.