Singing-voice separation from monaural recordings using deep recurrent neural networks

Research output: Contribution to conference › Paper

Abstract

Monaural source separation is important for many real-world applications. It is challenging because only single-channel information is available. In this paper, we explore using deep recurrent neural networks for singing-voice separation from monaural recordings in a supervised setting. Deep recurrent neural networks with different temporal connections are explored. We propose jointly optimizing the networks for multiple source signals by including the separation step as a nonlinear operation in the last layer. Different discriminative training objectives are further explored to enhance the source-to-interference ratio. Our proposed system achieves state-of-the-art performance, with 2.30–2.48 dB GNSDR gain and 4.32–5.42 dB GSIR gain over previous models, on the MIR-1K dataset.
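The two ideas in the abstract — a separation step as a nonlinear operation in the last layer, and a discriminative training objective — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the soft time-frequency mask, the function names, and the weight `gamma` are assumptions for the sketch.

```python
import numpy as np

def soft_mask_layer(y1_hat, y2_hat, z, eps=1e-8):
    """Final-layer separation step: build a soft time-frequency mask from the
    two network outputs and apply it to the mixture magnitude spectrogram z."""
    m1 = np.abs(y1_hat) / (np.abs(y1_hat) + np.abs(y2_hat) + eps)
    s1 = m1 * z          # estimate of source 1 (e.g. singing voice)
    s2 = (1.0 - m1) * z  # estimate of source 2 (e.g. accompaniment)
    return s1, s2

def discriminative_loss(s1, s2, y1, y2, gamma=0.05):
    """Joint objective over both sources, minus a weighted penalty that
    discourages each estimate from resembling the other source."""
    fit = np.mean((s1 - y1) ** 2) + np.mean((s2 - y2) ** 2)
    confusion = np.mean((s1 - y2) ** 2) + np.mean((s2 - y1) ** 2)
    return fit - gamma * confusion
```

Because the two masks sum to one, the two source estimates always sum back to the mixture spectrogram, which is what makes the separation step a well-defined final layer rather than a post-processing heuristic.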

Original language: English (US)
Pages: 477-482
Number of pages: 6
State: Published - Jan 1 2014
Event: 15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan, Province of China
Duration: Oct 27 2014 - Oct 31 2014

Conference

Conference: 15th International Society for Music Information Retrieval Conference, ISMIR 2014
Country: Taiwan, Province of China
City: Taipei
Period: 10/27/14 - 10/31/14

Fingerprint

  • Recurrent neural networks
  • Source separation

ASJC Scopus subject areas

  • Music
  • Information Systems

Cite this

Huang, P. S., Kim, M., Hasegawa-Johnson, M. A., & Smaragdis, P. (2014). Singing-voice separation from monaural recordings using deep recurrent neural networks. 477-482. Paper presented at 15th International Society for Music Information Retrieval Conference, ISMIR 2014, Taipei, Taiwan, Province of China.
