A framework for evaluating multimodal music mood classification

Xiao Hu, Kahyun Choi, J. Stephen Downie

Research output: Contribution to journal › Article

Abstract

This research proposes a framework for music mood classification that uses multiple, complementary information sources: music audio, lyric text, and social tags associated with music pieces. This article presents the framework and a thorough evaluation of each of its components. Experimental results on a large data set of 18 mood categories show that combining lyrics and audio significantly outperformed systems using audio-only features. Automatic feature selection techniques were further shown to reduce the feature space. In addition, an examination of learning curves shows that hybrid systems using lyrics and audio needed fewer training samples and shorter audio clips to achieve the same or better classification accuracies than systems using lyrics or audio alone. Finally, performance comparisons reveal the relative importance of audio and lyric features across mood categories.
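
To make the hybrid approach concrete, the sketch below illustrates, in Python with scikit-learn, how audio and lyric feature vectors can be concatenated, pruned with automatic feature selection, and fed to a classifier, alongside audio-only and lyrics-only baselines. This is a minimal sketch under stated assumptions: the feature dimensions, the SelectKBest/f_classif selector, the linear SVM, and the synthetic stand-in data are illustrative choices and do not reproduce the paper's exact feature sets, fusion strategy, or classifiers.

# Hedged sketch of a hybrid mood classifier that fuses audio and lyric
# features by concatenation. Dimensions, the feature selector, and the
# linear SVM are illustrative assumptions, not the paper's exact setup;
# synthetic arrays stand in for real features.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_songs, n_audio_dims, n_lyric_dims, n_moods = 500, 64, 200, 18

# Stand-ins for real inputs: spectral/timbre statistics per audio clip
# and bag-of-words or TF-IDF weights per lyric text.
audio_features = rng.normal(size=(n_songs, n_audio_dims))
lyric_features = rng.random(size=(n_songs, n_lyric_dims))
mood_labels = rng.integers(0, n_moods, size=n_songs)

# Hybrid system: concatenate the two modalities, then let automatic
# feature selection prune the enlarged feature space before classification.
hybrid_features = np.hstack([audio_features, lyric_features])
classifier = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=50),  # keep the 50 most discriminative dimensions
    LinearSVC(max_iter=5000),
)

for name, features in [("audio only", audio_features),
                       ("lyrics only", lyric_features),
                       ("audio + lyrics", hybrid_features)]:
    accuracy = cross_val_score(classifier, features, mood_labels, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {accuracy:.3f}")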

Original language: English (US)
Pages (from-to): 273-285
Number of pages: 13
Journal: Journal of the Association for Information Science and Technology
Volume: 68
Issue number: 2
DOIs: https://doi.org/10.1002/asi.23649
State: Published - Feb 1 2017


Keywords

  • automatic categorization
  • music
  • text processing

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Library and Information Sciences

Cite this

Hu, Xiao; Choi, Kahyun; Downie, J. Stephen. A framework for evaluating multimodal music mood classification. In: Journal of the Association for Information Science and Technology, Vol. 68, No. 2, 01.02.2017, p. 273-285. https://doi.org/10.1002/asi.23649
