University of Washington AI technology lets headphone wearers pick specific sounds to hear
A team led by University of Washington (UofW) computer science researchers has created AI software for headphones that lets wearers choose which sounds they hear. Unlike noise-cancelling headphones that simply filter out everything except voices, the new neural network lets users single out specific sounds such as the chirp of a bird.
Prior headphones such as the Sony INZONE buds (available on Amazon) use DSEE Extreme, Speak-to-Chat, and AI DNN technology to improve music and speech quality while automatically letting voices through the noise cancelling when conversations begin. The UofW work builds on this by allowing listeners to pick from 20 different types of sounds to hear, such as birds chirping, the ocean, a door knock, and a toilet flush, while filtering out everything else. Called semantic hearing, this allows users to enjoy the chirping of birds at a park without hearing people talk or cars motoring by.
Currently, the UofW app uses binaural microphones to capture the real-time position of external sounds before sending the filtered sounds to the headphones. Because the software runs on a smartphone, the app can leverage more powerful CPUs than those found in headphones. However, it is only a matter of time before noise-cancelling headphones come with semantic hearing built in.
November 9, 2023
New AI noise-canceling headphone technology lets wearers pick which sounds they hear
Most anyone who’s used noise-canceling headphones knows that hearing the right noise at the right time can be vital. Someone might want to erase car horns when working indoors, but not when walking along busy streets. Yet people can’t choose what sounds their headphones cancel.
Now, a team led by researchers at the University of Washington has developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. The team is calling the system “semantic hearing.” Headphones stream captured audio to a connected smartphone, which cancels all environmental sounds. Either through voice commands or a smartphone app, headphone wearers can select which sounds they want to include from 20 classes, such as sirens, baby cries, speech, vacuum cleaners and bird chirps. Only the selected sounds will be played through the headphones.
The team presented its findings Nov. 1 at UIST ’23 in San Francisco. In the future, the researchers plan to release a commercial version of the system.
“Understanding what a bird sounds like and extracting it from all other sounds in an environment requires real-time intelligence that today’s noise canceling headphones haven’t achieved,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “The challenge is that the sounds headphone wearers hear need to sync with their visual senses. You can’t be hearing someone’s voice two seconds after they talk to you. This means the neural algorithms must process sounds in under a hundredth of a second.”
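To put the hundredth-of-a-second figure in concrete terms, here is a rough sketch of a latency budget for chunk-based audio processing. It assumes a simplified model in which delay is the time spent buffering a chunk plus the time spent processing it; the sample rate and the processing time are illustrative assumptions, not figures from the UW system, and `max_chunk_samples` is a hypothetical helper.

```python
# Illustrative latency budget. All numeric inputs are assumptions,
# not measurements from the UW system.

def max_chunk_samples(sample_rate_hz: int, budget_ms: int, processing_ms: int) -> int:
    """Largest chunk (in samples) whose buffering time plus processing
    time stays within the latency budget."""
    if processing_ms >= budget_ms:
        raise ValueError("processing alone exceeds the latency budget")
    buffering_ms = budget_ms - processing_ms
    # Integer arithmetic avoids floating-point rounding surprises.
    return sample_rate_hz * buffering_ms // 1000


if __name__ == "__main__":
    # 10 ms budget ("a hundredth of a second"), assumed 44.1 kHz audio,
    # assumed 4 ms of processing per chunk.
    print(max_chunk_samples(44_100, 10, 4))  # 264
```

Under these assumptions, only a few hundred samples can be buffered at a time, which suggests why a round trip to a cloud server would be hard to fit into the same budget.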
Because of this time crunch, the semantic hearing system must process sounds on a device such as a connected smartphone, instead of on more robust cloud servers. Additionally, because sounds from different directions arrive in people’s ears at different times, the system must preserve these delays and other spatial cues so people can still meaningfully perceive sounds in their environment.
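The arrival-time difference between the two ears can be illustrated with a small sketch. The code below estimates the delay between a left and a right channel by brute-force cross-correlation, then checks that applying the same gain to both channels leaves that delay unchanged. This is a generic signal-processing illustration, not the researchers' method; the test signal, the three-sample delay, and all function names are assumptions made for the example.

```python
# Generic illustration of an interaural delay; not the UW system's algorithm.

def delay_signal(signal, lag):
    """Shift a signal later in time by `lag` samples, zero-padding the start."""
    return [0.0] * lag + signal[: len(signal) - lag]


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.
    A positive result means the sound reaches the right channel later."""
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def apply_gain(channel, gain):
    """Scale every sample of a channel by the same factor."""
    return [gain * x for x in channel]


if __name__ == "__main__":
    # Hypothetical short click that reaches the right ear 3 samples later.
    left = [0.0] * 32
    left[10:13] = [1.0, 0.5, -0.3]
    right = delay_signal(left, 3)

    print(estimate_delay(left, right, 8))  # 3

    # The same gain on both channels changes loudness, not the delay.
    print(estimate_delay(apply_gain(left, 0.5), apply_gain(right, 0.5), 8))  # 3
```

Filtering that treated the two channels differently could shift or smear this delay, which is one way to see why preserving spatial cues is a constraint on the system's design.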
Tested in environments such as offices, streets and parks, the system was able to extract sirens, bird chirps, alarms and other target sounds, while removing all other real-world noise. When 22 participants rated the system’s audio output for the target sound, they said that on average the quality improved compared to the original recording.
In some cases, the system struggled to distinguish between sounds that share many properties, such as vocal music and human speech. The researchers note that training the models on more real-world data might improve these outcomes.
Additional co-authors on the paper were Bandhav Veluri and Malek Itani, both UW doctoral students in the Allen School; Justin Chan, who completed this research as a doctoral student in the Allen School and is now at Carnegie Mellon University; and Takuya Yoshioka, director of research at AssemblyAI.
For more information, contact [email protected].