The field of voice interaction technology has recently seen a major breakthrough. StepFun (Jieyue Xingchen), a leading Chinese AI company, announced the open-sourcing of Step-Audio, a voice model with 130 billion parameters. The release has attracted widespread attention in the industry and is hailed as a milestone in the development of voice AI. The model is presented as the first open-source real-time voice dialogue system to integrate speech understanding with controllable speech generation, and its breadth of functionality signals that voice interaction technology is moving to a new level.
The core highlight of the open-source model is its integrated design and strong control capabilities. It not only understands the user's spoken commands accurately, but also allows fine-grained control over the speech generation process, offering a highly personalized interactive experience. This design makes voice interaction more natural and fluid, substantially improving the user experience.
In terms of language support, the model demonstrates strong multilingual capabilities: it can switch smoothly among Chinese, English, Japanese, and other languages, handling cross-language communication with ease. It also supports several dialects, such as Cantonese and Sichuanese, which brings voice interaction closer to everyday use and makes it feel more natural.
Beyond language coverage, the model offers fine-grained control over vocal emotion. Users can set the emotional tone of the generated speech, such as happy or sad, to make the AI's delivery more expressive. Speaking rate and rhythm can likewise be adjusted to suit the scenario, meeting diverse expressive needs. Notably, the model also supports more creative vocal forms such as RAP and humming, opening up new possibilities for content creation. A rough illustration of these control dimensions is sketched below.
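To make those control dimensions concrete, here is a minimal Python sketch of how such a synthesis request might be represented. The `SynthesisRequest` fields and the `synthesize` stub are hypothetical illustrations of the knobs described above (language, dialect, emotion, speed, style), not the actual Step-Audio API; the project repository documents the real inference interface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SynthesisRequest:
    """Hypothetical container for the control knobs described above;
    not the actual Step-Audio interface."""
    text: str
    language: str = "zh"           # e.g. "zh", "en", "ja"
    dialect: Optional[str] = None  # e.g. "cantonese", "sichuanese"
    emotion: str = "neutral"       # e.g. "happy", "sad"
    speed: float = 1.0             # 1.0 = normal speaking rate
    style: str = "speech"          # e.g. "speech", "rap", "humming"

def synthesize(request: SynthesisRequest) -> bytes:
    """Placeholder: wire this up to the repository's real TTS entry point."""
    raise NotImplementedError

# Example: a cheerful, slightly faster Sichuan-dialect line.
request = SynthesisRequest(
    text="今天的天气真不错！",
    dialect="sichuanese",
    emotion="happy",
    speed=1.2,
)
```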
The model also provides voice cloning. With this feature, users can create highly personalized voice assistants and even achieve a kind of "replication" or "inheritance" of a voice, opening up further application scenarios for voice interaction technology.
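For the cloning feature, a typical zero-shot workflow would be to supply a short reference recording together with its transcript, then ask the model to read new text in that voice. The sketch below only outlines that flow; `clone_voice` and its parameters are hypothetical and not taken from the Step-Audio repository.

```python
from pathlib import Path

def clone_voice(reference_wav: Path, reference_text: str,
                target_text: str, out_path: Path) -> None:
    """Hypothetical zero-shot cloning flow: a few seconds of reference
    audio plus its transcript condition the model, which then reads
    arbitrary new text in the cloned voice and writes it to out_path."""
    raise NotImplementedError  # replace with the repository's actual inference call

# Example usage (paths and strings are placeholders):
# clone_voice(Path("reference.wav"), "参考音频对应的文本",
#             "希望用克隆声音朗读的新内容", Path("cloned_output.wav"))
```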
StepFun's decision to open-source such a capable voice model should significantly accelerate technical progress and application innovation across the industry. It greatly lowers the barrier to adopting voice AI, and it points toward voice interaction becoming smarter, more natural, and more personalized, genuinely woven into people's daily lives.
Project address: https://github.com/stepfun-ai/Step-Audio/tree/main