Topic Modeling Users' Interpretations of Songs to Inform Subject Access in Music Digital Libraries

Kahyun Choi, Jin Ha Lee, Craig Willis, J. Stephen Downie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The assignment of subject metadata to music is useful for organizing and accessing digital music collections. Since manual subject annotation of large-scale music collections is labor-intensive, automatic methods are preferred. Topic modeling algorithms can be used to automatically identify latent topics from appropriate text sources. Candidate text sources such as song lyrics are often too poetic, resulting in lower-quality topics. Users' interpretations of song lyrics provide an alternative source. In this paper, we propose an automatic topic discovery system from web-mined user-generated interpretations of songs to provide subject access to a music digital library. We also propose and evaluate filtering techniques to identify high-quality topics. In our experiments, we use 24,436 popular songs that exist in both the Million Song Dataset and [...]. Topic models are generated using Latent Dirichlet Allocation (LDA). To evaluate the coherence of learned topics, we calculate the Normalized Pointwise Mutual Information (NPMI) of the top ten words in each topic based on occurrences in Wikipedia. Finally, we evaluate the resulting topics using a subset of 422 songs that have been manually assigned to six subjects. Using this system, 71% of the manually assigned subjects were correctly identified. These results demonstrate that topic modeling of song interpretations is a promising method for subject metadata enrichment in music digital libraries. It also has implications for affording similar access to collections of poetry and fiction.
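The NPMI coherence score mentioned in the abstract can be illustrated with a minimal sketch. It assumes document-level co-occurrence counts over a small tokenized reference corpus; the abstract says only that occurrences in Wikipedia were used, so the counting scheme, the function name `npmi_coherence`, and the toy corpus are illustrative assumptions and not the authors' implementation. For a word pair, NPMI is log(p(a,b) / (p(a)p(b))) divided by -log p(a,b), and a topic's score is taken here as the mean over all pairs of its top words.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of top_words (hypothetical helper).

    Probabilities are estimated as the fraction of documents that
    contain a word (or both words of a pair). This is an assumed
    counting scheme chosen for illustration.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: NPMI limit is -1
        elif p12 == 1:
            scores.append(1.0)   # both words appear in every document
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy reference corpus (an assumption; stands in for Wikipedia).
corpus = [
    ["love", "heart"],
    ["love", "heart"],
    ["war", "death"],
    ["war", "death"],
]
coherent = npmi_coherence(["love", "heart"], corpus)
mixed = npmi_coherence(["love", "heart", "war"], corpus)
```

In this toy corpus, a topic whose top words always appear together scores higher than one that mixes words from unrelated documents, which is the property that makes such a score usable for filtering out low-quality topics.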

Original language: English (US)
Title of host publication: JCDL 2015 - Proceedings of the 15th ACM/IEEE-CE Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 4
ISBN (Electronic): 9781450335942
State: Published - Jun 21 2015
Event: 15th ACM/IEEE-CE Joint Conference on Digital Libraries, JCDL 2015 - Knoxville, United States
Duration: Jun 21 2015 → Jun 25 2015

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996


Other: 15th ACM/IEEE-CE Joint Conference on Digital Libraries, JCDL 2015
Country/Territory: United States


  • interpretations of lyrics
  • music digital library
  • topic models

ASJC Scopus subject areas

  • General Engineering

