TY - GEN
T1 - Point Cloud Audio Processing
AU - Subramani, Krishna
AU - Smaragdis, Paris
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Most audio processing pipelines involve transformations that act on fixed-dimensional input representations of audio. For example, when using the Short-Time Fourier Transform (STFT), the DFT size specifies a fixed dimension for the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prohibits repurposing learned models on audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of the input representation or the sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as DFT size or the sampling rate. Additionally, we observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on a trained model's performance.
AB - Most audio processing pipelines involve transformations that act on fixed-dimensional input representations of audio. For example, when using the Short-Time Fourier Transform (STFT), the DFT size specifies a fixed dimension for the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prohibits repurposing learned models on audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of the input representation or the sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as DFT size or the sampling rate. Additionally, we observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on a trained model's performance.
KW - Point Clouds
KW - Sample Rate Invariance
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85123423237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123423237&partnerID=8YFLogxK
U2 - 10.1109/WASPAA52581.2021.9632668
DO - 10.1109/WASPAA52581.2021.9632668
M3 - Conference contribution
AN - SCOPUS:85123423237
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
SP - 31
EP - 35
BT - 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021
Y2 - 17 October 2021 through 20 October 2021
ER -