Singing-voice separation from monaural recordings using deep recurrent neural networks

Po Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

Research output: Contribution to conference › Paper › peer-review

Abstract

Monaural source separation is important for many real-world applications. It is challenging because only single-channel information is available. In this paper, we explore using deep recurrent neural networks for singing-voice separation from monaural recordings in a supervised setting. Deep recurrent neural networks with different temporal connections are explored. We propose jointly optimizing the networks for multiple source signals by including the separation step as a nonlinear operation in the last layer. Different discriminative training objectives are further explored to enhance the source-to-interference ratio. Our proposed system achieves state-of-the-art performance on the MIR-1K dataset: a 2.30–2.48 dB GNSDR gain and a 4.32–5.42 dB GSIR gain compared to previous models.
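The abstract's key idea, folding the separation step into the network as a nonlinear last layer, can be sketched as a soft time-frequency masking layer followed by a discriminative loss. The function names, the `eps` stabilizer, and the `gamma` weight below are illustrative assumptions for this sketch, not the paper's exact implementation:

```python
import numpy as np

def tf_mask_layer(y1_tilde, y2_tilde, z, eps=1e-8):
    """Soft time-frequency masking as the network's last layer (sketch).

    y1_tilde, y2_tilde: the network's raw magnitude estimates for the
    two sources; z: the mixture magnitude spectrogram. The estimates
    are normalized into masks that sum to one at each time-frequency
    bin, so the two outputs always add back up to the mixture.
    """
    denom = np.abs(y1_tilde) + np.abs(y2_tilde) + eps  # eps avoids 0/0
    y1_hat = np.abs(y1_tilde) / denom * z
    y2_hat = np.abs(y2_tilde) / denom * z
    return y1_hat, y2_hat

def discriminative_loss(y1_hat, y2_hat, y1, y2, gamma=0.05):
    """Discriminative objective (sketch): pull each estimate toward its
    own target while pushing it away from the interfering source;
    gamma weights the discriminative (push-away) terms."""
    return (np.sum((y1_hat - y1) ** 2) - gamma * np.sum((y1_hat - y2) ** 2)
            + np.sum((y2_hat - y2) ** 2) - gamma * np.sum((y2_hat - y1) ** 2))
```

Because the masks are normalized per bin, the two reconstructed spectrograms sum exactly to the mixture, which is what lets both sources be optimized jointly through a single output layer.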

Original language: English (US)
Pages: 477-482
Number of pages: 6
State: Published - Jan 1 2014
Event: 15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan, Province of China
Duration: Oct 27 2014 - Oct 31 2014

Conference

Conference: 15th International Society for Music Information Retrieval Conference, ISMIR 2014
Country: Taiwan, Province of China
City: Taipei
Period: 10/27/14 - 10/31/14

ASJC Scopus subject areas

  • Music
  • Information Systems

