Shengshu Technology releases Vidu 1.5, a video generation model that tackles “multi-subject consistency”

Author: Eve Cole | Update Time: 2025-03-06 16:00:04

More than a hundred days after Vidu went online, Shengshu Technology has launched Vidu 1.5. The company describes the release as a world-leading breakthrough in understanding diverse inputs and in solving the "consistency" problem. Below, the editor of Downcodes walks through what is new in Vidu 1.5 and how it moves visual models into the "context" era, which Shengshu says will accelerate the arrival of artificial general intelligence (AGI).

The announcement came as Vidu passed 100 days online. Shengshu Technology singles out two areas of progress in the new version: handling a wider variety of inputs and overcoming the "consistency" problem in generated video.

According to the company, the launch of Vidu 1.5 marks the start of a "context" era for visual models. Vidu has been able to generate consistent characters since its global launch, addressing a key pain point in video generation by locking a character's facial features. In September, Vidu became the first in the world to release a "Subject Consistency" function, which extended facial consistency to full-body consistency and widened the scope to any subject, including animals, objects, and virtual characters. Vidu 1.5's technical advances fall into three areas: precise control of complex subjects, natural consistency of characters' facial features and dynamic expressions, and multi-subject consistency.

Vidu 1.5 demonstrates a new "intelligent emergence" in visual models, along with strong in-context learning capabilities. This means the visual model can not only understand and imagine, but also manage memory during the generation process. Vidu 1.5 also keeps the generation speed of earlier versions and can produce a video in less than 30 seconds. Vidu follows a general-purpose design philosophy consistent with that of large language models (LLMs): it casts every task as a problem of visual input and visual output, uses a single Transformer to model variable-length inputs and outputs in a unified way, and acquires intelligence by compressing video data.
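To make the idea of "a single Transformer modelling variable-length input and output" more concrete, here is a minimal sketch of one common way such inputs can be prepared. It is not Vidu's implementation, which is not described in this article. The `Sample` class, the `PAD` value, and the integer "visual tokens" are all hypothetical names chosen for illustration. The sketch only shows that context (for example, reference images) and target video can be joined into one sequence, and that sequences of different lengths can share a batch through padding and a mask.

```python
from dataclasses import dataclass
from typing import List, Tuple

PAD = -1  # hypothetical padding id; real systems pick their own value


@dataclass
class Sample:
    # Hypothetical fields: integer ids stand in for visual tokens.
    context_tokens: List[int]  # e.g. tokens from reference images
    target_tokens: List[int]   # e.g. tokens of the video to generate


def pack(sample: Sample) -> List[int]:
    """Join context and target into one sequence for a single model."""
    return sample.context_tokens + sample.target_tokens


def pad_batch(samples: List[Sample]) -> Tuple[List[List[int]], List[List[bool]]]:
    """Pad packed sequences to equal length and build a validity mask.

    Expects a non-empty list. mask[i][j] is True where tokens[i][j]
    is real data and False where it is padding.
    """
    seqs = [pack(s) for s in samples]
    max_len = max(len(q) for q in seqs)
    tokens = [q + [PAD] * (max_len - len(q)) for q in seqs]
    mask = [[True] * len(q) + [False] * (max_len - len(q)) for q in seqs]
    return tokens, mask


if __name__ == "__main__":
    batch = [Sample([1, 2, 3], [4, 5]), Sample([6], [7])]
    tokens, mask = pad_batch(batch)
    print(tokens)
    print(mask)
```

In this sketch, a request with three reference tokens and one with a single reference token end up in the same batch; the mask tells the model which positions to ignore. How Vidu actually tokenizes and batches visual data is an open question that the article does not answer.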

Vidu 1.5 not only makes the video model more controllable, but also generates consistent results across multiple angles, multiple subjects, and multiple elements by accepting flexible combinations of inputs. Shengshu presents this as a sign of emerging visual intelligence and a step toward AGI. Vidu is no longer just a high-quality, efficient video generator: it can also bring contextual information and memory into the generation process, which the company calls a "big leap" in visual-modality intelligence. In this view, visual models will gain stronger cognitive capabilities and become an important piece of the AGI puzzle.

Try it at: www.vidu.studio

The release of Vidu 1.5 opens a new chapter in visual AI. Its stronger consistency features and fast generation should give users a noticeably better video generation experience. We look forward to Vidu making further breakthroughs and contributing more to the path toward AGI.