Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning with Knowledge Distillation

Sunwoo Kim, Minje Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In realistic speech enhancement settings for end-user devices, we often encounter only a few speakers and noise types that tend to recur in the specific acoustic environment. We propose a novel personalized speech enhancement method that adapts a compact denoising model to these test-time specifics. Our goal in this test-time adaptation is to use no clean speech target from the test speaker, thus fulfilling the requirement for zero-shot learning. To compensate for the lack of clean speech, we employ the knowledge distillation framework: we distill the more advanced denoising results from an overly large teacher model and use them as the pseudo target to train the small student model. This zero-shot learning procedure circumvents the process of collecting users' clean speech, a process that users are reluctant to comply with because of privacy concerns and the technical difficulty of recording a clean voice. Experiments on various test-time conditions show that the proposed personalization method can significantly improve the compact models' performance at test time. Furthermore, since the personalized models outperform larger non-personalized baseline models, we claim that personalization achieves model compression with no loss of denoising performance. As expected, the student models underperform the state-of-the-art teacher models.
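The distillation step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `teacher_denoise`, the three-band gain model, the learning rate, and the synthetic frames are all hypothetical stand-ins chosen to keep the example small. What it shows is only the training signal: the student is fitted to the teacher's output on noisy test-time input, and no clean speech appears anywhere.

```python
import random

# Hypothetical stand-in for a large pretrained denoiser: fixed per-band gains.
TEACHER_GAINS = [0.9, 0.5, 0.1]


def teacher_denoise(frame):
    """Return the teacher's denoised estimate, used as the pseudo target."""
    return [g * x for g, x in zip(TEACHER_GAINS, frame)]


def student_denoise(gains, frame):
    """Compact student (assumed form): one learnable gain per band."""
    return [g * x for g, x in zip(gains, frame)]


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def distillation_loss(gains, frames):
    """Average student-vs-teacher error over the noisy frames."""
    total = sum(mse(student_denoise(gains, f), teacher_denoise(f)) for f in frames)
    return total / len(frames)


def adapt_student(gains, noisy_frames, lr=0.1, epochs=50):
    """Test-time adaptation by SGD on the MSE to the teacher's output."""
    gains = list(gains)
    for _ in range(epochs):
        for frame in noisy_frames:
            pseudo_target = teacher_denoise(frame)  # no clean speech needed
            estimate = student_denoise(gains, frame)
            for k in range(len(gains)):
                # Gradient of the frame MSE with respect to gain k.
                grad = 2.0 * (estimate[k] - pseudo_target[k]) * frame[k] / len(gains)
                gains[k] -= lr * grad
    return gains


random.seed(0)
# Synthetic "noisy" band magnitudes observed on the user's device.
noisy_frames = [[random.uniform(0.5, 1.5) for _ in range(3)] for _ in range(20)]

initial_gains = [1.0, 1.0, 1.0]  # generic, non-personalized student
adapted_gains = adapt_student(initial_gains, noisy_frames)

loss_before = distillation_loss(initial_gains, noisy_frames)
loss_after = distillation_loss(adapted_gains, noisy_frames)
print(loss_before > loss_after)
```

In this toy setting the student can match the teacher exactly because both share the same functional form. In the setting the abstract describes, the student is much smaller than the teacher, so it is expected to approach but not reach the teacher's quality.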

Original language: English (US)
Title of host publication: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 176-180
Number of pages: 5
ISBN (Electronic): 9781665448703
State: Published - 2021
Externally published: Yes
Event: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021 - New Paltz, United States
Duration: Oct 17 2021 → Oct 20 2021

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Volume: 2021-October
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Country/Territory: United States
City: New Paltz
Period: 10/17/21 → 10/20/21

Keywords

  • knowledge distillation
  • model compression
  • personalization
  • speech enhancement
  • zero-shot learning

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
