TY - GEN
T1 - SALSA-Lite
T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
AU - Nguyen, Thi Ngoc Tho
AU - Jones, Douglas L.
AU - Watcharasupat, Karn N.
AU - Phan, Huy
AU - Gan, Woon Seng
N1 - This research was supported by the Singapore Ministry of Education Academic Research Fund Tier-2, under research grant MOE2017-T2-2-060, and the Google Cloud Research Credits program with the award GCP205311440. K. N. Watcharasupat acknowledges the support from the CN Yang Scholars Programme, NTU.
PY - 2022
Y1 - 2022
AB - Polyphonic sound event localization and detection (SELD) has many practical applications in acoustic sensing and monitoring. However, the development of real-time SELD has been limited by the demanding computational requirement of most recent SELD systems. In this work, we introduce SALSA-Lite, a fast and effective feature for polyphonic SELD using microphone array inputs. SALSA-Lite is a lightweight variation of a previously proposed SALSA feature for polyphonic SELD. SALSA, which stands for Spatial Cue-Augmented Log-Spectrogram, consists of multichannel log-spectrograms stacked channelwise with the normalized principal eigenvectors of the spectrotemporally corresponding spatial covariance matrices. In contrast to SALSA, which uses eigenvector-based spatial features, SALSA-Lite uses normalized inter-channel phase differences as spatial features, allowing a 30-fold speedup compared to the original SALSA feature. Experimental results on the TAU-NIGENS Spatial Sound Events 2021 dataset showed that the SALSA-Lite feature achieved competitive performance compared to the full SALSA feature, and significantly outperformed the traditional feature set of multichannel log-mel spectrograms with generalized cross-correlation spectra. Specifically, using SALSA-Lite features increased localization-dependent F1 score and class-dependent localization recall by 15 % and 5 %, respectively, compared to using multichannel log-mel spectrograms with generalized cross-correlation spectra.
KW - Feature extraction
KW - detection
KW - microphone array
KW - sound event localization
UR - http://www.scopus.com/inward/record.url?scp=85131234113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131234113&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746132
DO - 10.1109/ICASSP43922.2022.9746132
M3 - Conference contribution
AN - SCOPUS:85131234113
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 716
EP - 720
BT - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 May 2022 through 27 May 2022
ER -