TY - GEN
T1 - Robust speaker identification using a CASA front-end
AU - Zhao, Xiaojia
AU - Shao, Yang
AU - Wang, De Liang
PY - 2011
Y1 - 2011
N2 - Speaker recognition remains a challenging task under noisy conditions. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time-frequency mask. We first show that a recently introduced speaker feature, Gammatone Frequency Cepstral Coefficient, performs substantially better than conventional speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize corrupted components indicated by the CASA mask. Both methods are effective. We further combine them into a single system depending on the detected signal-to-noise ratio (SNR). This system achieves significant performance improvements over related systems under a wide range of SNR conditions.
KW - CASA
KW - GFCC
KW - Robust speaker identification
KW - gammatone frequency cepstral coefficient
KW - ideal binary mask
UR - http://www.scopus.com/inward/record.url?scp=80051602840&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051602840&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5947596
DO - 10.1109/ICASSP.2011.5947596
M3 - Conference contribution
AN - SCOPUS:80051602840
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5468
EP - 5471
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -