Stability AI, the company behind the Stable Diffusion text-to-image models, has announced a major collaboration with the global semiconductor company Arm. The goal is to bring generative audio AI to mobile devices by letting the Stable Audio Open model run entirely on Arm CPUs. Users can generate sound effects, audio samples and production elements directly on the device without an Internet connection, which makes the creative process considerably faster and more convenient.

Stability AI says that as generative AI is adopted more widely by enterprises and professional creators, it becomes especially important that these models and workflows are easy to use in every creative field. This not only makes creators more efficient but also helps integrate these technologies seamlessly into visual media production, driving innovation across the industry.
To meet this growing demand, Stability AI is working to make its models run more efficiently on edge devices. When the company began adapting Stable Audio Open for mobile devices, initial tests showed that generating audio on an Arm CPU took 240 seconds. By distilling the model and building on Arm's software stack, in particular the int8 matrix multiplication kernels from Arm's KleidiAI library used through XNNPACK, the company cut the time to generate an 11-second audio clip to 8 seconds, a 30-fold speedup.
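The int8 matrix multiplication mentioned above rests on a general idea: store weights and activations as 8-bit integers, multiply them with wide integer accumulation, and rescale the result back to floating point. The sketch below illustrates that idea in plain Python with symmetric per-tensor quantization. It is a hypothetical teaching example, not KleidiAI's or XNNPACK's actual kernel or API, and the helper names (`quantize_int8`, `int8_matmul`, `quantized_matmul`) are illustrative assumptions. It also checks the reported speedup arithmetic (240 s / 8 s = 30).

```python
# Hypothetical sketch of int8 quantized matrix multiplication.
# NOT the KleidiAI/XNNPACK kernel: it only shows the general technique of
# quantizing floats to int8, multiplying with integer accumulation, then rescaling.


def quantize_int8(matrix):
    """Symmetric per-tensor quantization of a 2D list of floats to int8 values."""
    max_abs = max(abs(x) for row in matrix for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127].
    q = [[max(-127, min(127, round(x / scale))) for x in row] for row in matrix]
    return q, scale


def int8_matmul(a_q, b_q):
    """Multiply two int8 matrices; real kernels accumulate these sums in int32."""
    inner, cols = len(b_q), len(b_q[0])
    assert all(len(row) == inner for row in a_q), "inner dimensions must match"
    return [
        [sum(row[k] * b_q[k][j] for k in range(inner)) for j in range(cols)]
        for row in a_q
    ]


def quantized_matmul(a, b):
    """Approximate a @ b using int8 inputs and a float rescale at the end."""
    a_q, a_scale = quantize_int8(a)
    b_q, b_scale = quantize_int8(b)
    acc = int8_matmul(a_q, b_q)
    # The product of the two scales maps integer accumulators back to floats.
    return [[v * a_scale * b_scale for v in row] for row in acc]


def float_matmul(a, b):
    """Reference float matrix multiplication for comparison."""
    return [
        [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for row in a
    ]


if __name__ == "__main__":
    a = [[0.5, -1.2, 0.3], [2.0, 0.1, -0.7]]
    b = [[1.0, 0.4], [-0.3, 0.9], [0.8, -1.5]]
    approx = quantized_matmul(a, b)
    exact = float_matmul(a, b)
    max_err = max(abs(x - y) for ra, re in zip(approx, exact) for x, y in zip(ra, re))
    print(f"max abs error from int8 quantization: {max_err:.4f}")
    # Reported figures from the article: 240 s before, 8 s after optimization.
    print(f"speedup: {240 / 8:.0f}x")
```

The int8 result differs slightly from the float reference; production kernels accept this small accuracy loss in exchange for smaller data and faster integer arithmetic on the CPU.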
Note that users need a compatible mobile device to use this feature. Because most smartphones today use Arm-based CPUs, the technology is available to a wide range of users. Stability AI also plans to bring its image, video and 3D models to edge devices, aiming to transform how visual media is created on mobile devices and to give users a richer, more convenient creative experience.
Key points:
Stability AI partnered with Arm to launch technology that generates audio offline on mobile devices.
Through model distillation and software optimization, audio generation time dropped from 240 seconds to 8 seconds, a 30-fold speedup.
The technology works on most smartphones with Arm CPUs and will expand to other types of media creation in the future.