Downcodes editor reports: Adobe and the University of Michigan have jointly developed MultiFoley, an AI system that generates sound effects for films and videos from text prompts, reference audio, or example videos, which could greatly improve post-production efficiency. The system supports multiple input methods and can transform one sound into another, such as a cat's meow into a lion's roar. Its high-bandwidth audio output and precise video synchronization received very high ratings in user tests.
Adobe's research team and researchers from the University of Michigan recently unveiled an artificial intelligence system called MultiFoley. It generates sound effects for films and videos to assist with post-production.
MultiFoley's novelty is that it lets users create sound effects from text prompts, reference audio, or video examples. In demonstrations, the system turned a cat's meow into a lion's roar and typewriter sounds into piano notes, in each case keeping the result in sync with the video footage.
MultiFoley outputs full-bandwidth audio at 48 kHz, which the researchers attribute mainly to training on both internet videos and professional sound effects libraries. Unlike previous systems, MultiFoley integrates multiple input methods - text, audio and video references - into a single model for the first time. It works by extracting visual features at 8 frames per second and upsampling them to the 40 Hz rate at which the model represents audio, which keeps the generated audio tightly synchronized with the video.
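To make the 8 frames-per-second to 40 Hz step concrete, here is a minimal Python sketch of one way to stretch a sequence of per-frame features by a factor of five. It is an illustration only: the article does not say how MultiFoley performs this step, so the nearest-neighbour repetition used here, the function name `upsample_features`, and the one-number placeholder features are all assumptions.

```python
def upsample_features(frames, src_rate=8, dst_rate=40):
    """Repeat each per-frame feature so the sequence has dst_rate steps per second.

    Nearest-neighbour repetition is an assumption made for this sketch;
    the article does not describe the method MultiFoley actually uses.
    """
    if dst_rate % src_rate != 0:
        raise ValueError("dst_rate must be an integer multiple of src_rate")
    factor = dst_rate // src_rate  # 40 // 8 == 5
    return [frame for frame in frames for _ in range(factor)]


# Two seconds of video at 8 fps gives 16 (placeholder) feature vectors.
video_features = [[float(i)] for i in range(16)]
aligned = upsample_features(video_features)

print(len(video_features), len(aligned))  # 16 80
```

After upsampling, every 1/40-second step of the audio side has exactly one visual feature to condition on, which is what allows the two streams to be lined up step by step.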

In tests, MultiFoley performed well both at synchronizing audio with video and at matching sound effects to text descriptions. Its average synchronization accuracy was 0.8 seconds, significantly better than the delay of more than one second typical of traditional systems. In a user study, 85.8% of participants rated MultiFoley above the next-best system for semantic consistency, and 94.5% preferred its synchronization.

Although MultiFoley has shown strong potential, the research team also pointed out some current limitations. The training dataset is relatively small, which limits the variety of sound effects the system can produce, and the system has difficulty generating several sound effects that occur at the same time. The research team plans to release the source code and model soon.
Adobe has not announced plans to build MultiFoley into its products, but the technology fits well with the existing artificial intelligence features in its Premiere Pro video editing software and is expected to simplify sound design for individual creators and production companies alike.
Highlights:
- MultiFoley is an AI sound effects generation system jointly developed by Adobe and the University of Michigan. It can generate sound effects from a variety of inputs.
- The system's audio output reaches 48 kHz, and its average synchronization accuracy is 0.8 seconds, better than traditional sound effects systems.
- User studies show that MultiFoley receives high ratings for both the semantic consistency and the synchronization of its sound effects.
All in all, MultiFoley opens up new possibilities for sound effects production, and its efficiency, precision, and ease of use are expected to change how sound effects are produced in the future. We look forward to the release of its source code and model, and to its possible use in Adobe products.