SCALABLE AND EFFICIENT SPEECH ENHANCEMENT USING MODIFIED COLD DIFFUSION: A RESIDUAL LEARNING APPROACH

Minje Kim, Trausti Kristjansson

Research output: Contribution to journal, Conference article, peer-review

Abstract

We introduce flexibility into the supervised learning-based speech enhancement framework to achieve scalable and efficient speech enhancement (SESE). To this end, SESE performs a series of segmented speech enhancement inference routines, each of which incrementally improves the result of the preceding one. The formulation is conceptually similar to cold diffusion, but we modify the sampling process so that each step targets an easier intermediate milestone rather than aggressively targeting clean speech. In addition, the incremental enhancement steps are trained to recover the residual between adjacent milestones, which improves the overall enhancement performance. We show that the proposed method improves on the baseline supervised model's performance, while requiring fewer diffusion steps to achieve performance comparable to that of the more complex cold diffusion-based counterpart. Furthermore, SESE's scalability can be useful in applications where moderately suppressed non-speech interference is preferred to aggressive enhancement, e.g., boosting dialog in movie soundtracks or speech enhancement in hearing aids.
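The incremental, residual-learning inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the milestones are assumed to be a linear interpolation between the noisy input and the clean speech (in the spirit of a cold-diffusion degradation), the function names `milestone`, `residual_target`, and `sese_sample` are hypothetical, and an oracle that knows the clean signal stands in for the trained per-step network.

```python
import numpy as np


def milestone(clean, noisy, t, T):
    """Hypothetical linear milestone: t == T is the noisy input, t == 0 is clean speech."""
    alpha = t / T
    return (1.0 - alpha) * clean + alpha * noisy


def residual_target(clean, noisy, t, T):
    """Assumed training target for step t: the residual from milestone t to milestone t - 1."""
    return milestone(clean, noisy, t - 1, T) - milestone(clean, noisy, t, T)


def sese_sample(noisy, residual_model, T, num_steps=None):
    """Apply num_steps incremental enhancement steps, starting from the noisy input.

    residual_model(x, t) is assumed to predict the residual that moves the
    current estimate x from milestone t to milestone t - 1.
    """
    num_steps = T if num_steps is None else num_steps
    if not 0 <= num_steps <= T:
        raise ValueError("num_steps must lie in [0, T]")
    x = noisy.copy()
    for t in range(T, T - num_steps, -1):
        x = x + residual_model(x, t)  # each step only solves the easier milestone task
    return x


# Illustration with synthetic signals and an oracle in place of a trained network.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.3 * rng.standard_normal(16000)
T = 4


def oracle(x, t):
    # Stand-in for the learned per-step model; it uses the clean signal directly.
    return residual_target(clean, noisy, t, T)


partial = sese_sample(noisy, oracle, T, num_steps=2)  # stops at an intermediate milestone
full = sese_sample(noisy, oracle, T)                  # runs all T steps
```

Under these assumptions, stopping after fewer than `T` steps returns an intermediate milestone, i.e., the noisy mixture with the interference only partially removed, which mirrors the scalability trade-off described in the abstract.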

Original language: English (US)
Pages (from-to): 1216-1220
Number of pages: 5
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
State: Published - 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: Apr 14 2024 to Apr 19 2024

Keywords

  • cold diffusion
  • model compression
  • scalability
  • Speech enhancement

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
