TY - GEN
T1 - Unsupervised sparse vector densification for short text similarity
AU - Song, Yangqiu
AU - Roth, Dan
N1 - Publisher Copyright:
© 2015 Association for Computational Linguistics.
PY - 2015
Y1 - 2015
N2 - Sparse representations of text such as bag-of-words models or extended explicit semantic analysis (ESA) representations are commonly used in many NLP applications. However, for short texts, the similarity between two such sparse vectors is not accurate due to the small term overlap. While there have been multiple proposals for dense representations of words, measuring similarity between short texts (sentences, snippets, paragraphs) requires combining these token-level similarities. In this paper, we propose to combine ESA representations and word2vec representations as a way to generate denser representations and, consequently, a better similarity measure between short texts. We study three densification mechanisms that involve aligning sparse representations via many-to-many, many-to-one, and one-to-one mappings. We then show the effectiveness of these mechanisms on measuring similarity between short texts.
AB - Sparse representations of text such as bag-of-words models or extended explicit semantic analysis (ESA) representations are commonly used in many NLP applications. However, for short texts, the similarity between two such sparse vectors is not accurate due to the small term overlap. While there have been multiple proposals for dense representations of words, measuring similarity between short texts (sentences, snippets, paragraphs) requires combining these token-level similarities. In this paper, we propose to combine ESA representations and word2vec representations as a way to generate denser representations and, consequently, a better similarity measure between short texts. We study three densification mechanisms that involve aligning sparse representations via many-to-many, many-to-one, and one-to-one mappings. We then show the effectiveness of these mechanisms on measuring similarity between short texts.
UR - http://www.scopus.com/inward/record.url?scp=84959902045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959902045&partnerID=8YFLogxK
U2 - 10.3115/v1/n15-1138
DO - 10.3115/v1/n15-1138
M3 - Conference contribution
AN - SCOPUS:84959902045
T3 - NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
SP - 1275
EP - 1280
BT - NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Y2 - 31 May 2015 through 5 June 2015
ER -