Abstract
In this work we investigate a data-driven vector representation, word embedding, for the task of classifying song lyrics into their semantic topics. Previous research on topic classification of song lyrics has used traditional frequency-based text representations. In contrast, empirically driven word embeddings have shown sensible performance improvements in text classification tasks because of their ability to capture semantic relationships between words from large-scale data. Because averaging the word vectors of a short text is known to work reasonably well compared with more comprehensive models that use word order, we adopt the averaged word vectors of the lyrics and of users' interpretations of them, both of which are generally short, as the features for this classification task. This simple approach showed a promising classification accuracy of 57%. From this, we envision the potential of data-driven approaches to feature creation, such as sequences of word vectors and doc2vec models, to improve the performance of the system.
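The averaging step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-dimensional toy embeddings, the topic labels `romance` and `conflict`, the helper names, and the nearest-centroid classifier (the abstract does not say which classifier the authors used). In practice the vectors would come from an embedding model trained on a large corpus.

```python
from math import sqrt

# Hypothetical 3-dimensional embeddings, made up for this sketch.
TOY_EMBEDDINGS = {
    "love": [0.9, 0.1, 0.0],
    "heart": [0.8, 0.2, 0.1],
    "kiss": [0.7, 0.1, 0.2],
    "war": [0.0, 0.9, 0.1],
    "fight": [0.1, 0.8, 0.2],
    "soldier": [0.1, 0.9, 0.0],
}


def average_vector(text, embeddings):
    """Return the mean vector of the in-vocabulary tokens, or None if there are none."""
    vectors = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vectors:
        return None
    # zip(*vectors) iterates over dimensions, so each column is averaged.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(labeled_texts, embeddings):
    """Average the document vectors of each topic label (assumed classifier)."""
    grouped = {}
    for text, label in labeled_texts:
        vec = average_vector(text, embeddings)
        if vec is not None:
            grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(text, centroids, embeddings):
    """Return the label whose centroid is most similar to the text's averaged vector."""
    vec = average_vector(text, embeddings)
    if vec is None:
        return None
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training snippets standing in for lyrics or interpretations.
TRAIN = [
    ("love heart", "romance"),
    ("kiss love", "romance"),
    ("war soldier", "conflict"),
    ("fight war", "conflict"),
]
CENTROIDS = fit_centroids(TRAIN, TOY_EMBEDDINGS)
```

Calling `predict("my heart and a kiss", CENTROIDS, TOY_EMBEDDINGS)` averages only the vectors for `heart` and `kiss`, since the other tokens are out of vocabulary, and compares the result with each topic centroid. Word order plays no role, which is the simplification the abstract contrasts with sequence-based and doc2vec features.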
Original language | English (US) |
---|---|
Title of host publication | JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 327-328 |
Number of pages | 2 |
ISBN (Electronic) | 9781450351782 |
DOIs | https://doi.org/10.1145/3197026.3203883 |
State | Published - May 23 2018 |
Event | 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018 - Fort Worth, United States Duration: Jun 3 2018 → Jun 7 2018 |
Publication series
Name | Proceedings of the ACM/IEEE Joint Conference on Digital Libraries |
---|---|
ISSN (Print) | 1552-5996 |
Other
Other | 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018 |
---|---|
Country | United States |
City | Fort Worth |
Period | 6/3/18 → 6/7/18 |
Keywords
- classification
- metadata
- song lyrics
- subject
- topic
- word embedding
ASJC Scopus subject areas
- Engineering(all)
Cite this
Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results. / Choi, Kahyun; Downie, J Stephen.
JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc., 2018. p. 327-328 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Exploratory Investigation of Word Embedding in Song Lyric Topic Classification
T2 - Promising Preliminary Results
AU - Choi, Kahyun
AU - Downie, J Stephen
PY - 2018/5/23
Y1 - 2018/5/23
KW - classification
KW - metadata
KW - song lyrics
KW - subject
KW - topic
KW - word embedding
UR - http://www.scopus.com/inward/record.url?scp=85048891564&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048891564&partnerID=8YFLogxK
U2 - 10.1145/3197026.3203883
DO - 10.1145/3197026.3203883
M3 - Conference contribution
AN - SCOPUS:85048891564
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 327
EP - 328
BT - JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries
PB - Institute of Electrical and Electronics Engineers Inc.
ER -