A framework for evaluating multimodal music mood classification

Xiao Hu, Kahyun Choi, J. Stephen Downie

Research output: Contribution to journal › Article › peer-review


This research proposes a framework for music mood classification that uses multiple, complementary information sources: music audio, lyric text, and social tags associated with music pieces. This article presents the framework and a thorough evaluation of each of its components. Experimental results on a large data set of 18 mood categories show that systems combining lyrics and audio significantly outperformed systems using audio-only features. Automatic feature selection techniques were further shown to reduce the feature space. In addition, an examination of learning curves shows that the hybrid systems using lyrics and audio needed fewer training samples and shorter audio clips to achieve the same or better classification accuracy than systems using lyrics or audio alone. Finally, performance comparisons reveal the relative importance of audio and lyric features across mood categories.
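
The idea of a hybrid system that draws on both lyrics and audio can be illustrated with a minimal sketch. The abstract does not say how the two sources are combined or which classifier is used, so everything here is an assumption made for illustration: the features are fused by simple concatenation, the classifier is a nearest-centroid rule, and the function names (`fuse`, `fit_centroids`, `predict`), the feature values, and the two mood labels are all made up.

```python
# Illustrative sketch only: toy data and hypothetical names, not the paper's method.
from math import dist


def fuse(audio_vec, lyric_vec):
    """Early fusion: concatenate the two per-track feature vectors."""
    return list(audio_vec) + list(lyric_vec)


def fit_centroids(vectors, labels):
    """Compute the mean feature vector (centroid) for each mood label."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Made-up training tracks: (audio features, lyric features, mood label).
train = [
    ([0.9, 0.8], [0.1, 0.9], "happy"),
    ([0.8, 0.9], [0.2, 0.8], "happy"),
    ([0.2, 0.1], [0.9, 0.1], "sad"),
    ([0.1, 0.2], [0.8, 0.2], "sad"),
]

fused = [fuse(audio, lyrics) for audio, lyrics, _ in train]
labels = [mood for _, _, mood in train]
centroids = fit_centroids(fused, labels)

# Classify a new track from its fused audio + lyric vector.
print(predict(centroids, fuse([0.85, 0.75], [0.15, 0.85])))
```

Concatenation is only one possible fusion strategy; combining the outputs of separate audio and lyric classifiers would be another, and the sketch makes no claim about which one the article evaluates.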

Original language: English (US)
Pages (from-to): 273-285
Number of pages: 13
Journal: Journal of the Association for Information Science and Technology
Issue number: 2
State: Published - Feb 1 2017


  • automatic categorization
  • music
  • text processing

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Library and Information Sciences

