Performance Based Cost Functions for End-to-End Speech Separation

Shrikant Venkataramani, Ryley Higa, Paris Smaragdis

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Recent neural network strategies for source separation attempt to model audio signals by processing their waveforms directly. Mean squared error (MSE) that measures the Euclidean distance between waveforms of denoised speech and the ground-truth speech, has been a natural cost-function for these approaches. However, MSE is not a perceptually motivated measure and may result in large perceptual discrepancies. In this paper, we propose and experiment with new loss functions for end-to-end source separation. These loss functions are motivated by BSS-Eval and perceptual metrics like source to distortion ratio (SDR), source to interference ratio (SIR), source to artifact ratio (SAR) and short-time objective intelligibility ratio (STOI). This enables the flexibility to mix and match these loss functions depending upon the requirements of the task. Subjective listening tests reveal that combinations of the proposed cost functions help achieve superior separation performance as compared to stand-alone MSE and SDR costs.

Original languageEnglish (US)
Title of host publication2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages350-355
Number of pages6
ISBN (Electronic)9789881476852
DOIs
StatePublished - Mar 4 2019
Event10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Honolulu, United States
Duration: Nov 12 2018Nov 15 2018

Publication series

Name2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

Conference

Conference10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
CountryUnited States
CityHonolulu
Period11/12/1811/15/18

    Fingerprint

Keywords

  • Cost functions
  • Deep learning
  • End-to-end speech separation

ASJC Scopus subject areas

  • Information Systems

Cite this

Venkataramani, S., Higa, R., & Smaragdis, P. (2019). Performance Based Cost Functions for End-to-End Speech Separation. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings (pp. 350-355). [8659758] (2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.23919/APSIPA.2018.8659758