TY - JOUR
T1 - Cooperative Speech Separation With a Microphone Array and Asynchronous Wearable Devices
AU - Corey, Ryan M.
AU - Mittal, Manan
AU - Sarkar, Kanad
AU - Singer, Andrew C.
N1 - This research was supported by the National Science Foundation under Grant No. 1919257 and by an appointment to the Intelligence Community Postdoctoral Research Fellowship Program at the University of Illinois Urbana-Champaign, administered by Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the Office of the Director of National Intelligence.
PY - 2022
Y1 - 2022
N2 - We consider the problem of separating speech from several talkers in background noise using a fixed microphone array and a set of wearable devices. Wearable devices can provide reliable information about speech from their wearers, but they typically cannot be used directly for multichannel source separation due to network delay, sample rate offsets, and relative motion. Instead, the wearable microphone signals are used to compute the speech presence probability for each talker at each time-frequency index. Those parameters, which are robust against small sample rate offsets and relative motion, are used to track the second-order statistics of the speech sources and background noise. The fixed array then separates the speech signals using an adaptive linear time-varying multichannel Wiener filter. The proposed method is demonstrated using real-room recordings from three human talkers with binaural earbud microphones and an eight-microphone tabletop array.
KW - asynchronous microphone array
KW - distributed microphone array
KW - speech separation
KW - wearable devices
UR - http://www.scopus.com/inward/record.url?scp=85140078657&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140078657&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-11025
DO - 10.21437/Interspeech.2022-11025
M3 - Conference article
AN - SCOPUS:85140078657
SN - 2308-457X
VL - 2022-September
SP - 5398
EP - 5402
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Y2 - 18 September 2022 through 22 September 2022
ER -