Monaural singing voice separation using fusion-net with time-frequency masking

Feng Li, Kaizhi Qian, Mark Hasegawa-Johnson, Masato Akagi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Monaural singing voice separation has received much attention in recent years. In this paper, we propose Fusion-Net, a novel neural network architecture for monaural singing voice separation that combines U-Net with a residual convolutional neural network to build a much deeper architecture with summation-based skip connections. In addition, we apply time-frequency masking to improve the separation results. Finally, as post-processing, we integrate the phase spectra with the magnitude spectra to optimize the singing voice separated from the music mixture. Experimental results demonstrate that the proposed method achieves better separation performance than the previous U-Net architecture on the ccMixter database.
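To illustrate the time-frequency masking step described above, here is a minimal NumPy sketch of generic soft (ratio) masking with mixture-phase reconstruction. It is not the paper's Fusion-Net pipeline; the network that would produce the magnitude estimates is omitted, and all array shapes and values here are hypothetical toy data.

```python
import numpy as np

def soft_mask(vocal_mag, accomp_mag, eps=1e-8):
    """Soft time-frequency ratio mask built from the two estimated
    magnitude spectrograms; values lie in [0, 1]."""
    return vocal_mag / (vocal_mag + accomp_mag + eps)

def apply_mask(mask, mixture_stft):
    """Apply the mask to the mixture magnitude and reuse the mixture
    phase, yielding a complex STFT estimate of the singing voice."""
    return mask * np.abs(mixture_stft) * np.exp(1j * np.angle(mixture_stft))

# Toy example: random "spectrograms" standing in for network outputs.
rng = np.random.default_rng(0)
F, T = 513, 100  # hypothetical frequency bins x time frames
vocal_mag = rng.random((F, T))
accomp_mag = rng.random((F, T))
mixture_stft = rng.random((F, T)) + 1j * rng.random((F, T))

mask = soft_mask(vocal_mag, accomp_mag)
vocal_est = apply_mask(mask, mixture_stft)
```

An inverse STFT of `vocal_est` would then give the time-domain vocal estimate; because the mask is non-negative and the mixture magnitude is scaled rather than replaced, the estimate keeps the mixture's phase in every bin.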

Original language: English (US)
Title of host publication: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1239-1243
Number of pages: 5
ISBN (Electronic): 9781728132488
DOIs
State: Published - Nov 2019
Event: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China
Duration: Nov 18 2019 - Nov 21 2019

Publication series

Name: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country: China
City: Lanzhou
Period: 11/18/19 - 11/21/19

ASJC Scopus subject areas

  • Information Systems

