End-To-End Source Separation with Adaptive Front-Ends

Shrikant Venkataramani, Jonah Casebeer, Paris Smaragdis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Source separation and other audio applications have traditionally relied on short-time Fourier transforms as a front-end frequency-domain representation step. The lack of a neural-network equivalent of the forward and inverse transforms hinders the implementation of end-to-end learning systems for these applications. We develop an auto-encoder neural network that can act as an equivalent to short-time front-end transforms. We demonstrate the ability of the network to learn optimal, real-valued basis functions directly from the raw waveform of a signal and further show how it can be used as an adaptive front-end for supervised source separation. In terms of separation performance, these transforms significantly outperform their Fourier counterparts. Finally, we also propose and interpret a novel cost function for end-to-end source separation based on the source-to-distortion ratio.
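As a rough illustration of the ideas in the abstract, the following minimal NumPy sketch shows a short-time analysis/synthesis pair driven by a real-valued basis matrix, together with a plain source-to-distortion ratio measured in decibels. It is not the authors' implementation: the function names (`frame_signal`, `analysis`, `synthesis`, `sdr_db`), the frame length, the hop size, and the use of a fixed identity basis in place of a learned one are all assumptions made for illustration, and the exact form of the cost function proposed in the paper is not given in this record.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def analysis(x, basis, hop):
    """Project each frame onto the rows of `basis` (shape: n_basis x frame_len).

    In a trained front-end the basis would be learned; here it is passed in.
    """
    frames = frame_signal(x, basis.shape[1], hop)
    return frames @ basis.T  # (n_frames, n_basis)

def synthesis(coeffs, basis, hop, length):
    """Map coefficients back to frames and overlap-add them into a waveform."""
    frames = coeffs @ basis  # (n_frames, frame_len)
    y = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        y[start:start + len(frame)] += frame
    return y

def sdr_db(target, estimate, eps=1e-12):
    """Source-to-distortion ratio in dB: target energy over error energy."""
    error = target - estimate
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(error**2) + eps))

# Hypothetical usage: an identity basis with non-overlapping frames
# reconstructs the input exactly, so the SDR is very high.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
basis = np.eye(16)                      # assumed stand-in for a learned basis
coeffs = analysis(x, basis, hop=16)
x_hat = synthesis(coeffs, basis, hop=16, length=len(x))
score = sdr_db(x, x_hat)
```

In an end-to-end setting, the negative of such a ratio (or a related quantity) could serve as the training loss, so that the basis is adapted to the separation task rather than fixed in advance as with the Fourier transform.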

Original language: English (US)
Title of host publication: Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018
Editors: Michael B. Matthews
Publisher: IEEE Computer Society
Number of pages: 5
ISBN (Electronic): 9781538692189
State: Published - Jul 2 2018
Event: 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018 - Pacific Grove, United States
Duration: Oct 28 2018 to Oct 31 2018

Publication series

Name: Conference Record - Asilomar Conference on Signals, Systems and Computers
ISSN (Print): 1058-6393


Conference: 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018
Country/Territory: United States
City: Pacific Grove


  • Auto-encoders
  • adaptive transforms
  • deep learning
  • source separation

ASJC Scopus subject areas

  • Signal Processing
  • Computer Networks and Communications

