Sanna Wager
Heading out to enjoy karaoke with friends? Researchers at the School of Informatics, Computing, and Engineering have your back.
Assistant Professor of Intelligent Systems Engineering Minje Kim and Informatics Ph.D. candidate Sanna Wager are using artificial intelligence and data collected by the karaoke app Smule to develop a system that automatically tunes off-key vocal performances to improve the final result. By feeding nearly 5,000 samples into a deep learning system, Wager and her group trained a model that tunes a singer’s notes while maintaining that singer’s individual style.
In traditional auto-tuning, the correct pitches in the melody are provided, and the singer’s voice is snapped to those pitches. The effect can sound robotic and be musically limiting: Singers often sing off key on purpose, and although the result sounds clean, pitch correction removes those nuances in a singer’s voice, treating them as mistakes. Wager’s algorithm polishes the notes without completely correcting them, leaving more of the singer’s style intact. The process can be compared to a spell checker in a text editor.
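To illustrate the contrast described above, here is a minimal sketch, in Python, of hard pitch snapping versus a partial correction that preserves some expressive deviation. This is not the researchers’ actual deep learning model; the `strength` parameter is a hypothetical stand-in for the softer, style-preserving correction the article describes.

```python
import numpy as np

def snap_to_semitones(f0_hz, strength=1.0):
    """Shift each pitch estimate toward the nearest equal-tempered semitone.

    strength=1.0 reproduces hard, 'robotic' auto-tune;
    strength<1.0 only nudges the pitch, keeping expressive deviations.
    (Illustrative sketch only -- the actual system is a learned model.)
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    # Convert frequencies to MIDI note numbers (a logarithmic pitch scale
    # where A4 = 440 Hz corresponds to note 69).
    midi = 69 + 12 * np.log2(f0_hz / 440.0)
    target = np.round(midi)                      # nearest semitone
    corrected = midi + strength * (target - midi)
    # Convert back from MIDI note numbers to frequency in Hz.
    return 440.0 * 2 ** ((corrected - 69) / 12)

# A slightly flat A4 (440 Hz) sung at 430 Hz:
hard = snap_to_semitones([430.0], strength=1.0)  # snapped all the way to 440 Hz
soft = snap_to_semitones([430.0], strength=0.5)  # corrected only halfway
```

With full strength the flat note lands exactly on pitch; with partial strength it moves toward 440 Hz but keeps some of the original deviation, which is the spell-checker-like middle ground the article describes.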
“Sanna had been interested in working with Smule, and our mutual interest in the topic provided a good project for a summer internship,” Kim said. “They had the perfect large-scale dataset we needed for the algorithm we developed. They have the thousands of quality singing voice tracks that we needed, and she was able to train her AI model using their big real-world dataset as well as to use the technical advice from their research team to refine the algorithm.”
The tool can also be used to help singers develop their ear for proper pitch and build confidence in sharing their music with others. And because the system improves the singing rather than perfecting it, it can help new singers appreciate the role audio processing plays in professionally recorded songs and reinforce that a perfect voice isn’t necessary for singing to be enjoyable.
In its current state, the system can only process a recording after the fact, but the hope is to develop a system that can improve a singer’s performance in real time. The research also extends beyond karaoke into a larger effort in AI.
“This is the beginning of our Signals and AI Group in Engineering’s (SAIGE) larger-scale research aiming at helping people’s creative activities using AI technologies,” Kim said. “In this instance, we’re using AI to convert a musical signal into something that’s more pleasant to listen to. Whether AI can learn creativity or not is a very interesting research area these days. We don’t mean to replace a human’s creative activities with AI technology but develop one that can help people create better music. We believe that a deep learning-based automated solution to this kind of task is the right direction to maintain the subtlety of human voice and the expressive gesture.”
The research has caught the eye of multiple media outlets, including The Times, The Daily Mail, New Scientist magazine, BBC radio, and others.
“I’m very happy that my group’s research on the nuances in musical intonation provoked interest in a broader audience,” Wager said. “This motivates me to keep working on technology that provides people with more opportunities to develop as musicians.”