Detecting depression severity from vocal prosody

Ying Yang, Catherine Fairbairn, Jeffrey F. Cohn

Research output: Contribution to journal › Article › peer-review


To investigate the relation between vocal prosody and change in depression severity over time, 57 participants from a clinical trial for treatment of depression were evaluated at seven-week intervals using a semistructured clinical interview for depression severity (Hamilton Rating Scale for Depression (HRSD)). All participants met criteria for major depressive disorder (MDD) at week one. Using both perceptual judgments by naive listeners and quantitative analyses of vocal timing and fundamental frequency, three hypotheses were tested: 1) Naive listeners can perceive the severity of depression from vocal recordings of depressed participants and interviewers. 2) Quantitative features of vocal prosody in depressed participants reveal change in symptom severity over the course of depression. 3) Interpersonal effects occur as well, such that vocal prosody in interviewers shows corresponding effects. These hypotheses were strongly supported. Together, participants' and interviewers' vocal prosody accounted for about 60 percent of the variation in depression scores and detected the ordinal range of depression severity (low, mild, and moderate-to-severe) in 69 percent of cases (kappa = 0.53). These findings suggest that analysis of vocal prosody could be a powerful tool to assist in depression screening and monitoring over the course of depressive disorder and recovery.
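The abstract reports both a raw detection rate (69 percent) and a kappa statistic (0.53). As a minimal sketch of how the two relate, the snippet below computes Cohen's kappa, which corrects observed agreement for the agreement expected by chance, from a three-class confusion matrix. The function name `cohens_kappa` and the counts in `hypothetical_counts` are illustrative assumptions only; they are not data from the study and are not meant to reproduce its figures.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    # Note: undefined (division by zero) when chance agreement equals 1.
    return (observed - expected) / (1 - expected)


# Hypothetical counts, NOT taken from the study.
# Classes in order: low, mild, moderate-to-severe.
hypothetical_counts = [
    [20, 5, 5],
    [5, 20, 5],
    [5, 5, 30],
]

accuracy = sum(hypothetical_counts[i][i] for i in range(3)) / 100
kappa = cohens_kappa(hypothetical_counts)
print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw accuracy whenever some agreement would be expected by chance alone, which is why the two figures are reported together.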

Original language: English (US)
Article number: 6365169
Pages (from-to): 142-150
Number of pages: 9
Journal: IEEE Transactions on Affective Computing
Issue number: 2
State: Published - 2013
Externally published: Yes


Keywords

  • Prosody
  • depression
  • hierarchical linear modeling (HLM)
  • interpersonal influence
  • switching pause
  • vocal fundamental frequency

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction

