A spectral clustering approach to speaker diarization

Huazhong Ning, Ming Liu, Hao Tang, Thomas S Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we present a spectral clustering approach to explore the possibility of discovering structure from audio data. To apply the Ng-Jordan-Weiss (NJW) spectral clustering algorithm to speaker diarization, we propose domain-specific solutions to three open issues of this algorithm: the choice of metric, the selection of the scaling parameter, and the estimation of the number of clusters. A postprocessing step, "Cross EM refinement", is then applied to further improve the performance of spectral learning. In experiments on audio data from Japanese Parliament Panel Discussions, this approach performs very similarly to traditional hierarchical clustering while running much faster.
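For readers unfamiliar with the NJW algorithm named in the abstract, the following is a minimal sketch of its generic form, not the authors' implementation. The abstract does not specify the paper's domain-specific choices, so everything concrete here is an assumption made for illustration: the Euclidean metric, the fixed scaling parameter `sigma`, the caller-supplied number of clusters `k`, the farthest-point k-means initialization, and the function name `njw_spectral_clustering`. The "Cross EM refinement" step is not reproduced.

```python
import numpy as np


def njw_spectral_clustering(X, k, sigma=1.0, n_iter=50, seed=0):
    """Generic NJW spectral clustering on the rows of X (n_samples x n_features).

    Assumptions (not taken from the paper): Euclidean distance, a fixed
    scaling parameter sigma, and a known number of clusters k.
    """
    rng = np.random.default_rng(seed)

    # 1. Gaussian affinity matrix with a zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # 2. Normalized affinity L = D^{-1/2} A D^{-1/2}, D = diag of row sums.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # 3. Eigenvectors of the k largest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]

    # 4. Rescale each row of the embedding to unit length.
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)

    # 5. Plain k-means on the rows of Y, with farthest-point initialization.
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist_to_nearest)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(
            np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=-1), axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy usage on synthetic 2-D points (not audio features): two separated blobs.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
                    rng.normal(5.0, 0.3, size=(20, 2))])
labels = njw_spectral_clustering(points, k=2, sigma=1.0)
```

In a diarization setting the rows of `X` would be per-segment speech features rather than toy points, and the three inputs fixed by hand above (metric, `sigma`, `k`) are exactly the open issues the abstract says the paper addresses.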

Original language: English (US)
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 2178-2181
Number of pages: 4
ISBN (Print): 9781604234497
State: Published - Jan 1 2006
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: Sep 17 2006 to Sep 21 2006

Publication series

Name: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume: 5

Other

Other: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country: United States
City: Pittsburgh, PA
Period: 9/17/06 to 9/21/06

Keywords

  • BIC
  • Cross EM refinement
  • Speaker diarization
  • Spectral clustering

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Ning, H., Liu, M., Tang, H., & Huang, T. S. (2006). A spectral clustering approach to speaker diarization. In INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP (pp. 2178-2181). (INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP; Vol. 5). International Speech Communication Association.