TY - GEN
T1 - A computational auditory scene analysis system for robust speech recognition
AU - Srinivasan, Soundararajan
AU - Shao, Yang
AU - Jin, Zhaozhang
AU - Wang, De Liang
PY - 2006
Y1 - 2006
N2 - We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask, which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting T-F masks are used in conjunction with missing-data methods for recognition. Systematic evaluations on a speech separation challenge task show significant improvement over the baseline performance.
AB - We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask, which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting T-F masks are used in conjunction with missing-data methods for recognition. Systematic evaluations on a speech separation challenge task show significant improvement over the baseline performance.
KW - Binary time-frequency mask
KW - Computational auditory scene analysis
KW - Robust speech recognition
KW - Speech segregation
UR - http://www.scopus.com/inward/record.url?scp=40749137520&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=40749137520&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:40749137520
SN - 9781604234497
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 73
EP - 76
BT - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
PB - International Speech Communication Association
T2 - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Y2 - 17 September 2006 through 21 September 2006
ER -