ByteDance and Kuaishou, the two short-video giants, are heading into a head-to-head confrontation in the field of AI.
On November 8, Jimeng AI, an AI content platform owned by ByteDance, announced that Seaweed, a video generation model developed by ByteDance, is officially open to platform users. According to ByteDance, the version of the Doubao-family video generation model Seaweed opened this time is the standard edition: it takes only 60 seconds to generate a high-quality 5-second AI video, well ahead of the 3 to 5 minutes of generation time that is typical across the domestic industry.
"Daily Economic News" reporters tested the first and latest versions of Jimeng and Keling and found that, after iteration, both products' video generation has improved in many respects and to varying degrees. Keling is more accurate in spatial layout and picture detail, and the effect of the generated content can be adjusted more flexibly and conveniently; Jimeng has the advantage in generation time and video style.

Image source: Visual China
A large-model engineer told reporters that it is difficult for video generation models to produce content in distinctly different "styles". "Beyond the technology itself, it mainly depends on the richness of the data sources."
Multiple iterations completed in a short time
With the opening of ByteDance's self-developed video generation model Seaweed, the most closely watched pair in the domestic video generation race, Jimeng and Keling, are finally competing head to head.
Both carry the "AI dream-making" ambition of understanding the physical world and, while rendering "reality", amplifying imagination as far as possible. At the same time, Jimeng and Keling each shoulder the development prospects of ByteDance and Kuaishou respectively.
In fact, both Jimeng and Keling completed several iterations in less than a year. Jimeng began internal testing of its video generation function at the end of March. Half a year later, ByteDance released two video generation models in the Doubao model family, Seaweed and Pixeldance, and invited small-scale testing through Jimeng AI and Volcano Engine. Now Seaweed is officially open to platform users.
Pan Helin, a member of the Information and Communication Economy Expert Committee of the Ministry of Industry and Information Technology, told the "Daily Economic News" reporter that the generation speed of Jimeng's new model has improved, giving users a better generation experience. "Jimeng AI is still relatively leading in the domestic video generation field."
Keling became a blockbuster upon its "birth" in June. Since release, it has gone through more than ten updates, including the launch of an image-to-video function and the release of the 1.5 model. As of now, Keling has more than 3.6 million users and has generated a cumulative 37 million videos, and it will officially launch a standalone app in the near future.
The "Daily Economic News" reporter selected five Sora video prompts officially released by OpenAI (a lady on the streets of Tokyo, an astronaut, a coastline from a drone's perspective, a 3D animated little monster, a young man reading in the clouds) and ran them separately through the first and latest versions of Jimeng and Keling, vertically comparing each model's video output across versions.
Comparing the videos produced by Jimeng's original and latest versions, the reporter found two obvious areas of improvement: first, in depicting dynamic "people and things", the capture and coherence of movement has improved markedly; second, the differentiated presentation of picture styles has also made great progress.
Taking "lady on the streets of Tokyo" as an example, the character movements produced by the first-generation Jimeng were stiff, especially in capturing leg and foot motion, and the overall picture was blurred and distorted. The iterated new version of Jimeng renders character movement naturally and smoothly, and its detailing of foot dynamics is clearer and more consistent with real-world logic.
Obvious differences between Jimeng and Keling
After iteration, both models generate more stable results with better image quality, and their smoothness and detail handling better withstand scrutiny. However, they still differ clearly in semantic understanding, in capturing and amplifying keywords, and in balancing creative imagination against creative relevance.
In a horizontal comparison, the reporter pitted the latest version of Jimeng against Keling's 1.5 model on the same five Sora prompts. Differences in semantic understanding and keyword capture make the two models' video output noticeably different.
In the "coastline from a drone's perspective" video, Jimeng rendered the "island with a lighthouse" from the prompt relatively blurrily, whereas for both Keling and Sora the island is the focal point of the scene. In depicting the "coastal highway", Jimeng's rendering also fails to conform to real-world logic.
In the "astronaut" video, Jimeng did not convey the "adventure" called for in the prompt; even after regeneration, its astronaut holding a coffee and riding a motorcycle ignored the "adventure" setting. Keling emphasizes "adventure" through the character's expressions and camera movement. However, both Jimeng and Keling largely ignored the "movie trailer" setting; by contrast, Sora's spaceman video feels more cinematic.
In the "3D animated little monster" video, Jimeng's monster looks almost identical to the character Sulley in the animated film "Monsters, Inc." Jimeng's rendering of the monster as described in the prompt is also relatively inaccurate, for instance in executing the "short-haired" setting. In addition, on artistic style, where the prompt emphasizes "lighting and texture", Jimeng's execution is weaker than Keling's.
In the "lady on the streets of Tokyo" video, Jimeng handles complex multi-subject interaction poorly compared with Keling. Its depiction of the "lady" at the center of the frame and of the surrounding space is relatively accurate, but the pedestrians in the picture are generally blurred, and those in close-up are distorted.
However, Jimeng AI has officially revealed that Pro versions of the Seaweed and Pixeldance video generation models will become available in the near future. The Pro models will optimize multi-subject interaction and the coherence of actions across multiple shots, while also tackling problems such as consistency when switching between shots.
In terms of features and experience, after several rounds of iteration Keling lets users adjust a "creative imagination versus creative relevance" parameter when generating videos, so the balance can be tuned. Keling can also be told what not to render, such as blur, collage, deformation, or animation. Its generation workflow is more flexible, and the output more adjustable.
In the reporter's tests, Jimeng's generation time was shorter: each of the five Sora prompts took no more than half a minute to generate. By contrast, generating a 10-second high-quality video with Keling's 1.5 model took more than 10 minutes.
It should be noted that the videos above were generated in the reporters' own tests; different versions and different prompt details will produce different generation results.
A battle in the field of AI video generation
For the two short-video giants ByteDance and Kuaishou, the competition in AI video generation extends far beyond each other.
For example, on November 8, Zhipu, one of the "Six Little Dragons of AI", upgraded its video generation tool Qingying. The upgraded Qingying supports video generation from images of any aspect ratio and has multi-channel generation: the same prompt or image can produce four videos at once. Qingying can also generate sound effects matched to the picture; this sound-effect feature will enter public beta this month.
Earlier, on August 31, MiniMax released its first AI high-definition video generation model, abab-video-1, which posted strong numbers in its first month. According to MiniMax's official public account, in the first month after the video model launched on Conch AI, visits to Conch AI's web version grew by more than 800%, its users spanned more than 180 countries and regions, and in September's AI product (web) rankings the product topped both the global growth list and the domestic growth list.
Wang Peng, an associate researcher at the Institute of Management of the Beijing Academy of Social Sciences, pointed out to the "Daily Economic News" reporter that AI video products at home and abroad are in a stage of rapid development: foreign technology giants such as Meta and Google are actively moving into AI video, while domestically, products such as Kuaishou's Keling and Jimeng AI are being continuously iterated and upgraded to improve user experience and commercialization capability.
On commercialization potential, a research report released by Soochow Securities in August this year estimated that, under a neutral assumption of 15% AI penetration, the potential market for China's AI video generation industry is 317.8 billion yuan; under the AI production model, the costs of producing films, long-form dramas, animation, and short dramas would fall by more than 95% compared with traditional production.
The huge potential market and the cost-cutting, efficiency-boosting "superpower" of AI video can also be glimpsed in Keling's usage data.
At the "2024 China Computer Conference" held in October, Zhang Di, vice president of Kuaishou and head of its large-model team, revealed that since its release in June this year, Kuaishou's Keling AI has gained more than 3.6 million users and has generated a cumulative 37 million videos and over 100 million images.
Pan Helin said in an interview with a "Daily Economic News" reporter that Keling is backed by Kuaishou and has traffic support, so its commercialization is moving very fast. "AI video products still need the backing of an Internet platform; only with traffic do they have commercial potential."
Similarly, ByteDance has put the commercialization of video models at the top of its task list. When the two video generation models launched in September this year, Tan Dai, president of Volcano Engine, publicly stated that the new Doubao video generation models "have been designed with commercialization in mind since launch", with use cases including e-commerce marketing, animation education, urban cultural tourism, and micro-dramas.
"AI video will show different commercialization potential on the B-side and the C-side," Wang Peng believes. For the B-side, AI video can give enterprises more efficient, lower-cost video production and distribution solutions; on the C-side, AI video can meet users' demand for personalized, high-quality video content, and can also be combined with industries such as e-commerce and advertising to achieve more precise marketing and monetization.