Abstract

In recent years, designing the coding and pooling structures in layered networks has been shown to be a useful method for learning high-level feature representations for visual data. Yet, such learning structures have not been extensively studied for audio signals. In this paper, we investigate different pooling strategies based on the sparse coding scheme and propose a temporal pyramid pooling method to extract discriminative and shift invariant feature representations. We demonstrate the superiority of our new feature representation over traditional features on the acoustic event classification task.

Original languageEnglish (US)
Title of host publication13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages2517-2520
Number of pages4
StatePublished - 2012
Event13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: Sep 9 2012Sep 13 2012

Publication series

Name13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume3

Other

Other13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/TerritoryUnited States
CityPortland, OR
Period9/9/129/13/12

Keywords

  • Acoustic event classification
  • Pooling
  • Sparse coding

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication

Fingerprint

Dive into the research topics of 'Pooling robust shift-invariant sparse representations of acoustic signals'. Together they form a unique fingerprint.

Cite this