Shanghai Jieyue Xingchen Intelligent Technology Co., Ltd. (StepFun) recently announced that it has open-sourced its latest image-to-video model, Step-Video-TI2V. The model is trained on top of Step-Video-T2V, has 30B parameters, and generates 102-frame, 5-second videos at 540P resolution. Its two core features are controllable motion amplitude and controllable camera movement, and it performs especially well on animation-style content. Compared with existing open-source video models, Step-Video-TI2V offers a larger parameter scale. Its controllable motion amplitude also lets users trade off dynamism against stability in the generated video, which gives creators more flexibility.

During the development of Step-Video-TI2V, the team made two key optimizations. First, it introduced image conditioning to improve consistency between the generated video and the input image. Instead of the conventional cross-attention approach, the model takes a more direct route: it concatenates the image's latent representation with the first-frame latent of the DiT along the channel dimension, which keeps the generated video highly consistent with the input image. Second, a video motion score is fed into the AdaLN (adaptive layer normalization) module, so users can specify different motion levels when generating a video and precisely control its motion amplitude, balancing dynamism, stability, and consistency. In addition, the team precisely annotated subject movement and camera movement, further improving the model's subject dynamics and camera-movement effects.
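The two conditioning mechanisms described above can be illustrated with a toy sketch in plain Python. This is not StepFun's implementation. The latent layout (one flat channel vector per frame, spatial dimensions omitted), the zero-padding of the extra channels on frames after the first, the sinusoidal embedding of the motion score, and the projection matrices `w_shift`/`w_scale` are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def concat_first_frame(video_latents, image_latent):
    """Hypothetical sketch: append image_latent to frame 0 along the channel axis.

    video_latents: list of frames, each a flat list of channel values
    (spatial dimensions omitted for brevity). Later frames receive zeros in
    the extra channels (an assumption) so every frame has the same width.
    """
    zeros = [0.0] * len(image_latent)
    out = []
    for t, frame in enumerate(video_latents):
        extra = image_latent if t == 0 else zeros
        out.append(list(frame) + list(extra))
    return out

def layer_norm(x, eps=1e-6):
    """Layer norm without learned affine parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sinusoidal_embedding(value, dim):
    """Embed a scalar (here, the motion score) into a dim-length vector; dim must be even."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]

def adaln(x, motion_score, w_shift, w_scale):
    """AdaLN sketch: normalize x, then apply a shift and scale predicted from the motion score.

    w_shift, w_scale: hypothetical projection matrices with len(x) rows,
    each row the length of the embedding.
    """
    emb = sinusoidal_embedding(motion_score, len(w_shift[0]))
    shift = [sum(w * e for w, e in zip(row, emb)) for row in w_shift]
    scale = [sum(w * e for w, e in zip(row, emb)) for row in w_scale]
    return [n * (1.0 + s) + b for n, s, b in zip(layer_norm(x), scale, shift)]

if __name__ == "__main__":
    video = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, 2 channels each
    image = [1.0, 2.0]                             # image latent, 2 channels
    print(concat_first_frame(video, image))

    x = [1.0, 2.0, 3.0, 4.0]
    zero = [[0.0] * 4 for _ in range(4)]
    ones = [[1.0] * 4 for _ in range(4)]
    # Different motion scores yield different modulations of the same features.
    print(adaln(x, 1.0, zero, ones))
    print(adaln(x, 5.0, zero, ones))
```

Channel concatenation gives every DiT layer direct access to the image content at the first frame without an extra attention path. In DiT-style models the AdaLN shift and scale are usually predicted from the timestep embedding, so a motion score would plausibly be combined with that embedding; here the weights are passed in directly to keep the example small.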
The core features of Step-Video-TI2V are controllable motion amplitude, multiple camera-movement controls, strong animation results, and support for multiple output sizes. Users can switch freely between more dynamic and more stable output according to their creative needs, and generate camera moves ranging from basic push-in/pull-out, pan, and crane up/down to complex, cinematic camera work. The model performs particularly well on animation tasks, making it well suited to scenarios such as animation creation and short-video production. It also supports image-to-video generation at multiple sizes, including landscape, portrait, and square formats, to meet the needs of different platforms.
Try it online:
https://yuewen.cn/videos
GitHub:
https://github.com/stepfun-ai/Step-Video-TI2V
GitHub (ComfyUI):
https://github.com/stepfun-ai/ComfyUI-StepVideo