Detecting non-modal phonation in telephone speech

Tae Jin Yoon, Jennifer Cole, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Non-modal phonation conveys both linguistic and paralinguistic information, and is distinguished by acoustic source and filter features. Detecting non-modal phonation in speech requires reliable F0 analysis, a problem for telephone-band speech, where F0 analysis frequently fails. We demonstrate an approach to the detection of creaky phonation in telephone speech based on robust F0 and spectral analysis. Our F0 analysis relies on an autocorrelation algorithm applied to the intensity-boosted and inverse-filtered speech signal, and succeeds in regions of non-modal phonation where F0 analysis of the unfiltered signal typically fails. In addition to the extracted F0 values, spectral amplitude is measured at the first two harmonics (H1, H2) and the first three formants (A1, A2, A3). Visual and spectral inspection of the detected creaky phonation confirms the findings reported from laboratory settings. Statistical analysis using one-way ANOVA and classification using a Support Vector Machine (SVM) yield promising results that point toward further improvement of automatic detection of non-modal phonation in telephone speech.
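The two acoustic measurements named in the abstract, an autocorrelation-based F0 estimate and the amplitudes of the first two harmonics, can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the intensity boosting and inverse filtering steps, and the function names, the 50 to 400 Hz search range, the 8 kHz sample rate, and the 10% search band around each harmonic are all assumptions chosen for illustration.

```python
import numpy as np


def estimate_f0_autocorr(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its autocorrelation.

    Hypothetical helper: the search range f0_min..f0_max is an assumption.
    Returns None when the frame is silent or too short to search.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    # Full autocorrelation; keep the non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return None
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(ac) - 1)
    if lag_min < 1 or lag_max <= lag_min:
        return None
    # The lag with the strongest self-similarity is taken as the period.
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag


def harmonic_amplitudes_db(frame, sample_rate, f0):
    """Return (H1, H2) in dB: peak spectral amplitude near f0 and 2 * f0."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    def peak_db(target):
        # Search within +/- 10% of the expected harmonic (an assumption).
        band = (freqs >= 0.9 * target) & (freqs <= 1.1 * target)
        return 20.0 * np.log10(np.max(spectrum[band]) + 1e-12)

    return peak_db(f0), peak_db(2.0 * f0)


if __name__ == "__main__":
    # Synthetic 100 ms frame at 8 kHz: a 100 Hz tone plus a weaker 200 Hz tone.
    sr = 8000
    t = np.arange(800) / sr
    x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
    f0 = estimate_f0_autocorr(x, sr)
    h1, h2 = harmonic_amplitudes_db(x, sr, f0)
    print(f0, h1 - h2)
```

The difference H1 - H2 is a commonly used spectral tilt measure in voice quality work. Real telephone speech is harder than this synthetic frame, since the channel attenuates energy near the fundamental, which is the difficulty the abstract's preprocessing is meant to address.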

Original language: English (US)
Title of host publication: Proceedings of the 4th International Conference on Speech Prosody, SP 2008
Publisher: International Speech Communication Association
Number of pages: 4
ISBN (Print): 9780616220030
State: Published - 2008
Event: 4th International Conference on Speech Prosody 2008, SP 2008 - Campinas, Brazil
Duration: May 6 2008 → May 9 2008

Publication series

Name: Proceedings of the 4th International Conference on Speech Prosody, SP 2008


Other: 4th International Conference on Speech Prosody 2008, SP 2008

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Software
  • Mechanical Engineering

