Meta Reality Labs' research team recently announced Pippo, a generative model that can produce dense turnaround videos at up to 1K resolution from a single casually taken photo. The release not only showcases recent advances in computer vision, but also opens new possibilities for image generation technology.
The core innovation of the Pippo model lies in its multi-view diffusion transformer. Unlike conventional generative models, Pippo does not rely on additional inputs such as fitted parametric body models or camera parameters. From a single photo, the system automatically generates multi-view video, presenting the subject in a more vivid, three-dimensional way.
For developers, Pippo is released as code only, without pre-trained weights. The research team provides the full model, configuration files, inference code, and sample training code for the Ava-256 dataset. Developers can get started by cloning the repository and setting up the codebase, then train the model and build applications on top of it.
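A minimal sketch of that setup workflow might look like the following. The repository URL comes from the project link below; the environment name, dependency file, and training entry point are assumptions for illustration and may differ from the actual repo layout.

```shell
# Clone the Pippo repository (URL from the project link)
git clone https://github.com/facebookresearch/pippo.git
cd pippo

# Create an isolated Python environment (hypothetical setup steps)
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt  # assumed dependency file name

# Launch sample training on Ava-256 (assumed script path; check the repo README)
python scripts/train.py --config configs/ava256.yaml
```

Since no pre-trained weights ship with this release, the training step is required before running inference.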
Planned next steps for the Pippo project include further cleanup and optimization of the code and the release of inference scripts for pre-trained models. These improvements should make the model easier to use and help it reach wider adoption in practical applications.
Project link: https://github.com/facebookresearch/pippo
Key points:
The Pippo model generates high-resolution multi-view videos from a single casual photo, with no additional inputs required.
The release is code only and does not include pre-trained weights; developers can train the model and apply it themselves.
The team plans further features and improvements to enhance the user experience.