Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results

Kahyun Choi, J Stephen Downie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we investigate a data-driven vector representation, word embedding, for the task of classifying song lyrics into their semantic topics. Previous research on topic classification of song lyrics has used traditional frequency-based text representations. Empirically driven word embeddings, on the other hand, have shown sensible performance improvements in text classification tasks because of their ability to capture semantic relationships between words from big data. Because averaging the word vectors of a short text is known to work reasonably well compared with more comprehensive models that exploit word order, we adopt the averaged word vectors of the lyrics and of users' interpretations of them, both of which are generally short, as the features for this classification task. This simple approach showed a promising classification accuracy of 57%. From this, we envision the potential of data-driven approaches to feature creation, such as sequences of word vectors and doc2vec models, to improve the performance of the system.
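
The averaged-word-vector feature described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional toy embeddings, the topic labels, and the nearest-centroid classifier are hypothetical, since the abstract does not state which embedding model or classifier the authors used.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical 3-dimensional embeddings (assumption); real models use
# hundreds of dimensions learned from a large corpus.
TOY_EMBEDDINGS = {
    "love": [0.9, 0.1, 0.0],
    "heart": [0.8, 0.2, 0.1],
    "kiss": [0.7, 0.1, 0.2],
    "war": [0.0, 0.9, 0.1],
    "fight": [0.1, 0.8, 0.2],
    "soldier": [0.1, 0.9, 0.0],
}


def average_vector(text, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vectors = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(labelled_texts):
    """Average the document vectors of each topic (nearest-centroid, an assumption)."""
    grouped = defaultdict(list)
    for text, topic in labelled_texts:
        vec = average_vector(text)
        if vec is not None:
            grouped[topic].append(vec)
    return {
        topic: [sum(column) / len(vecs) for column in zip(*vecs)]
        for topic, vecs in grouped.items()
    }


def predict(text, centroids):
    """Return the topic whose centroid is most similar to the text's average vector."""
    vec = average_vector(text)
    if vec is None:
        return None
    return max(centroids, key=lambda topic: cosine(vec, centroids[topic]))


# Hypothetical training data: (short text, topic label).
train = [
    ("love heart", "romance"),
    ("kiss love", "romance"),
    ("war soldier", "conflict"),
    ("fight war", "conflict"),
]
centroids = fit_centroids(train)
print(predict("my heart wants a kiss", centroids))
```

Averaging discards word order, which is the trade-off the abstract points to: it is cheap and tends to hold up on short texts, while sequence-based or doc2vec features are named there as possible improvements.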

Original language: English (US)
Title of host publication: JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 327-328
Number of pages: 2
ISBN (Electronic): 9781450351782
DOIs: https://doi.org/10.1145/3197026.3203883
State: Published - May 23 2018
Event: 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018 - Fort Worth, United States
Duration: Jun 3 2018 → Jun 7 2018

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018
Country: United States
City: Fort Worth
Period: 6/3/18 → 6/7/18

Fingerprint

Semantics
Big data

Keywords

  • classification
  • metadata
  • song lyrics
  • subject
  • topic
  • word embedding

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Choi, K., & Downie, J. S. (2018). Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results. In JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (pp. 327-328). (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3197026.3203883

Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results. / Choi, Kahyun; Downie, J Stephen.

JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc., 2018. p. 327-328 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).

Choi, K & Downie, JS 2018, Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results. in JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, Institute of Electrical and Electronics Engineers Inc., pp. 327-328, 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018, Fort Worth, United States, 6/3/18. https://doi.org/10.1145/3197026.3203883
Choi K, Downie JS. Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results. In JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc. 2018. p. 327-328. (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). https://doi.org/10.1145/3197026.3203883
Choi, Kahyun; Downie, J Stephen. / Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results. JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 327-328 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).
@inproceedings{79a014461bb0423eab740d57985d8232,
title = "Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results",
abstract = "In this work, we investigate a data-driven vector representation, word embedding, for the task of classifying song lyrics into their semantic topics. Previous research on topic classification of song lyrics has used traditional frequency-based text representations. Empirically driven word embeddings, on the other hand, have shown sensible performance improvements in text classification tasks because of their ability to capture semantic relationships between words from big data. Because averaging the word vectors of a short text is known to work reasonably well compared with more comprehensive models that exploit word order, we adopt the averaged word vectors of the lyrics and of users' interpretations of them, both of which are generally short, as the features for this classification task. This simple approach showed a promising classification accuracy of 57{\%}. From this, we envision the potential of data-driven approaches to feature creation, such as sequences of word vectors and doc2vec models, to improve the performance of the system.",
keywords = "classification, metadata, song lyrics, subject, topic, word embedding",
author = "Kahyun Choi and Downie, {J Stephen}",
year = "2018",
month = "5",
day = "23",
doi = "10.1145/3197026.3203883",
language = "English (US)",
series = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "327--328",
booktitle = "JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries",
address = "United States",

}

TY - GEN

T1 - Exploratory Investigation of Word Embedding in Song Lyric Topic Classification

T2 - Promising Preliminary Results

AU - Choi, Kahyun

AU - Downie, J Stephen

PY - 2018/5/23

Y1 - 2018/5/23

N2 - In this work, we investigate a data-driven vector representation, word embedding, for the task of classifying song lyrics into their semantic topics. Previous research on topic classification of song lyrics has used traditional frequency-based text representations. Empirically driven word embeddings, on the other hand, have shown sensible performance improvements in text classification tasks because of their ability to capture semantic relationships between words from big data. Because averaging the word vectors of a short text is known to work reasonably well compared with more comprehensive models that exploit word order, we adopt the averaged word vectors of the lyrics and of users' interpretations of them, both of which are generally short, as the features for this classification task. This simple approach showed a promising classification accuracy of 57%. From this, we envision the potential of data-driven approaches to feature creation, such as sequences of word vectors and doc2vec models, to improve the performance of the system.

KW - classification

KW - metadata

KW - song lyrics

KW - subject

KW - topic

KW - word embedding

UR - http://www.scopus.com/inward/record.url?scp=85048891564&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048891564&partnerID=8YFLogxK

U2 - 10.1145/3197026.3203883

DO - 10.1145/3197026.3203883

M3 - Conference contribution

AN - SCOPUS:85048891564

T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries

SP - 327

EP - 328

BT - JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries

PB - Institute of Electrical and Electronics Engineers Inc.

ER -