End-To-End Source Separation with Adaptive Front-Ends

Shrikant Venkataramani, Jonah Casebeer, Paris Smaragdis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. The unavailability of a neural network equivalent to forward and inverse transforms hinders the implementation of end-to-end learning systems for these applications. We develop an auto-encoder neural network that can act as an equivalent to short-time front-end transforms. We demonstrate the ability of the network to learn optimal, real-valued basis functions directly from the raw waveform of a signal and further show how it can be used as an adaptive front-end for supervised source separation. In terms of separation performance, these transforms significantly outperform their Fourier counterparts. Finally, we also propose and interpret a novel source to distortion ratio based cost function for end-to-end source separation.
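The two ideas in the abstract, a short-time front-end built from real-valued basis functions and a cost based on source-to-distortion ratio (SDR), can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`frame_signal`, `fourier_basis`, `analysis`, `sdr_db`, `neg_sdr_loss`) are hypothetical, the SDR used is the common textbook energy-ratio definition rather than the specific cost function proposed in the paper, and no training loop is shown. In an adaptive front-end the fixed `fourier_basis` matrix would be replaced by learned weights.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def fourier_basis(frame_len):
    """Fixed real-valued basis: stacked cosine and sine rows of a DFT."""
    n = np.arange(frame_len)
    k = np.arange(frame_len // 2 + 1)[:, None]
    angle = 2.0 * np.pi * k * n / frame_len
    return np.vstack([np.cos(angle), np.sin(angle)])


def analysis(x, basis, hop):
    """Short-time front-end: project every frame onto the basis functions.

    `basis` has shape (n_bases, frame_len). With `fourier_basis` this gives
    the real and imaginary parts of an unwindowed STFT; an adaptive
    front-end (assumption) would learn `basis` from data instead.
    """
    return frame_signal(x, basis.shape[1], hop) @ basis.T


def sdr_db(target, estimate, eps=1e-12):
    """Textbook SDR in dB: target energy over error energy."""
    err = target - estimate
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(err ** 2) + eps))


def neg_sdr_loss(target, estimate):
    """Negated SDR, so that minimizing the loss maximizes SDR."""
    return -sdr_db(target, estimate)
```

Negating the SDR turns a quality measure (higher is better) into a quantity a gradient-based optimizer can minimize, which is the general motivation for SDR-style costs in end-to-end training.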

Original language: English (US)
Title of host publication: Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018
Editors: Michael B. Matthews
Publisher: IEEE Computer Society
Pages: 684-688
Number of pages: 5
ISBN (Electronic): 9781538692189
DOIs: https://doi.org/10.1109/ACSSC.2018.8645535
State: Published - Feb 19 2019
Event: 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018 - Pacific Grove, United States
Duration: Oct 28 2018 → Oct 31 2018

Publication series

Name: Conference Record - Asilomar Conference on Signals, Systems and Computers
Volume: 2018-October
ISSN (Print): 1058-6393

Conference

Conference: 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018
Country: United States
City: Pacific Grove
Period: 10/28/18 → 10/31/18

Fingerprint

Source separation
Neural networks
Inverse transforms
Cost functions
Learning systems
Fourier transforms

Keywords

  • Auto-encoders
  • adaptive transforms
  • deep learning
  • source separation

ASJC Scopus subject areas

  • Signal Processing
  • Computer Networks and Communications

Cite this

Venkataramani, S., Casebeer, J., & Smaragdis, P. (2019). End-To-End Source Separation with Adaptive Front-Ends. In M. B. Matthews (Ed.), Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018 (pp. 684-688). [8645535] (Conference Record - Asilomar Conference on Signals, Systems and Computers; Vol. 2018-October). IEEE Computer Society. https://doi.org/10.1109/ACSSC.2018.8645535

End-To-End Source Separation with Adaptive Front-Ends. / Venkataramani, Shrikant; Casebeer, Jonah; Smaragdis, Paris.

Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018. ed. / Michael B. Matthews. IEEE Computer Society, 2019. p. 684-688 8645535 (Conference Record - Asilomar Conference on Signals, Systems and Computers; Vol. 2018-October).

Venkataramani, S, Casebeer, J & Smaragdis, P 2019, End-To-End Source Separation with Adaptive Front-Ends. in MB Matthews (ed.), Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018, 8645535, Conference Record - Asilomar Conference on Signals, Systems and Computers, vol. 2018-October, IEEE Computer Society, pp. 684-688, 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018, Pacific Grove, United States, 10/28/18. https://doi.org/10.1109/ACSSC.2018.8645535
Venkataramani S, Casebeer J, Smaragdis P. End-To-End Source Separation with Adaptive Front-Ends. In Matthews MB, editor, Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018. IEEE Computer Society. 2019. p. 684-688. 8645535. (Conference Record - Asilomar Conference on Signals, Systems and Computers). https://doi.org/10.1109/ACSSC.2018.8645535
Venkataramani, Shrikant ; Casebeer, Jonah ; Smaragdis, Paris. / End-To-End Source Separation with Adaptive Front-Ends. Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018. editor / Michael B. Matthews. IEEE Computer Society, 2019. pp. 684-688 (Conference Record - Asilomar Conference on Signals, Systems and Computers).
@inproceedings{070745caf4124372b4cc647adfd2faec,
title = "End-To-End Source Separation with Adaptive Front-Ends",
abstract = "Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. The unavailability of a neural network equivalent to forward and inverse transforms hinders the implementation of end-to-end learning systems for these applications. We develop an auto-encoder neural network that can act as an equivalent to short-time front-end transforms. We demonstrate the ability of the network to learn optimal, real-valued basis functions directly from the raw waveform of a signal and further show how it can be used as an adaptive front-end for supervised source separation. In terms of separation performance, these transforms significantly outperform their Fourier counterparts. Finally, we also propose and interpret a novel source to distortion ratio based cost function for end-to-end source separation.",
keywords = "Auto-encoders, adaptive transforms, deep learning, source separation",
author = "Shrikant Venkataramani and Jonah Casebeer and Paris Smaragdis",
year = "2019",
month = "2",
day = "19",
doi = "10.1109/ACSSC.2018.8645535",
language = "English (US)",
series = "Conference Record - Asilomar Conference on Signals, Systems and Computers",
publisher = "IEEE Computer Society",
pages = "684--688",
editor = "Matthews, {Michael B.}",
booktitle = "Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018",

}

TY - GEN

T1 - End-To-End Source Separation with Adaptive Front-Ends

AU - Venkataramani, Shrikant

AU - Casebeer, Jonah

AU - Smaragdis, Paris

PY - 2019/2/19

Y1 - 2019/2/19

N2 - Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. The unavailability of a neural network equivalent to forward and inverse transforms hinders the implementation of end-to-end learning systems for these applications. We develop an auto-encoder neural network that can act as an equivalent to short-time front-end transforms. We demonstrate the ability of the network to learn optimal, real-valued basis functions directly from the raw waveform of a signal and further show how it can be used as an adaptive front-end for supervised source separation. In terms of separation performance, these transforms significantly outperform their Fourier counterparts. Finally, we also propose and interpret a novel source to distortion ratio based cost function for end-to-end source separation.

AB - Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. The unavailability of a neural network equivalent to forward and inverse transforms hinders the implementation of end-to-end learning systems for these applications. We develop an auto-encoder neural network that can act as an equivalent to short-time front-end transforms. We demonstrate the ability of the network to learn optimal, real-valued basis functions directly from the raw waveform of a signal and further show how it can be used as an adaptive front-end for supervised source separation. In terms of separation performance, these transforms significantly outperform their Fourier counterparts. Finally, we also propose and interpret a novel source to distortion ratio based cost function for end-to-end source separation.

KW - Auto-encoders

KW - adaptive transforms

KW - deep learning

KW - source separation

UR - http://www.scopus.com/inward/record.url?scp=85063006584&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063006584&partnerID=8YFLogxK

U2 - 10.1109/ACSSC.2018.8645535

DO - 10.1109/ACSSC.2018.8645535

M3 - Conference contribution

AN - SCOPUS:85063006584

T3 - Conference Record - Asilomar Conference on Signals, Systems and Computers

SP - 684

EP - 688

BT - Conference Record of the 52nd Asilomar Conference on Signals, Systems and Computers, ACSSC 2018

A2 - Matthews, Michael B.

PB - IEEE Computer Society

ER -