Stability AI, the startup known for its AI-based image generator Stable Diffusion, has unveiled an open-source AI model for creating sounds and musical compositions, reportedly trained exclusively on free recordings.
The new generative model, named Stable Audio Open, accepts textual descriptions (e.g., "Rock beat played in a professional studio using an acoustic drum kit") and generates audio recordings up to 47 seconds long. The model was trained on approximately 486,000 samples from the Freesound and Free Music Archive libraries.
According to Stability AI, the model can be used to create drum beats, instrumental riffs, ambient noises, and "production elements" for videos, films, and TV shows. It can also "edit" existing compositions or apply the style of one song to another (e.g., blending smooth jazz with another melody).
"The key advantage of this open-source version is that users can fine-tune the model based on their own audio data," notes Stability AI in its corporate blog. "For instance, a drummer can adjust the model based on their own recordings to create new beats."
However, Stable Audio Open has its limitations. It cannot produce high-quality full-length songs, melodies, or vocals. Stability AI states that the model is not optimized for these tasks and suggests that users who need these features use the premium Stable Audio service.
Additionally, Stable Audio Open is not intended for commercial use; its terms of service prohibit this. The model also does not perform equally well across musical styles and cultures, or with descriptions in languages other than English, due to the limitations of its training data.
"The data source may lack diversity, and not all cultures are equally represented in the dataset," writes Stability AI in the model description. "The samples generated by the model will reflect these training data limitations."
Stability AI, which has long struggled to stabilize its business, recently found itself in the spotlight due to disagreements over training AI models on copyrighted works. The company's vice president of generative audio, Ed Newton-Rex, resigned in disagreement with the company's stance on "fair use" of such works. The release of Stable Audio Open is likely an attempt by Stability AI to improve its reputation while subtly promoting its paid products.
The rise in popularity of music generators, such as those from Stability AI, is drawing attention to copyright issues. In May, Sony Music, representing artists like Billy Joel, Doja Cat, and Lil Nas X, issued a warning to 700 AI companies against unauthorized use of its content for training sound generators. In March, Tennessee passed the first law in the US aimed at preventing AI misuse in music.