TY - GEN
T1 - Linking Tweets to News
T2 - 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013
AU - Guo, Weiwei
AU - Li, Hao
AU - Ji, Heng
AU - Diab, Mona
PY - 2013
Y1 - 2013
N2 - Many current Natural Language Processing (NLP) techniques work well assuming a large context of text as input data. However, they become ineffective when applied to short texts such as Twitter feeds. To overcome the issue, we want to find a newswire document related to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts. The contribution of the paper is twofold: (1) we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; (2) in contrast to previous research, which focuses on lexical features within the short texts (text-to-word information), we propose a graph-based latent variable model that models the inter-short-text correlations (text-to-text information). This is motivated by the observation that a tweet usually covers only one aspect of an event. We show that using tweet-specific features (hashtags) and news-specific features (named entities) as well as temporal constraints, we are able to extract text-to-text correlations and thus complete the semantic picture of a short text. Our experiments show that the new model significantly improves over baselines on three evaluation metrics in the new task.
AB - Many current Natural Language Processing (NLP) techniques work well assuming a large context of text as input data. However, they become ineffective when applied to short texts such as Twitter feeds. To overcome the issue, we want to find a newswire document related to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts. The contribution of the paper is twofold: (1) we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; (2) in contrast to previous research, which focuses on lexical features within the short texts (text-to-word information), we propose a graph-based latent variable model that models the inter-short-text correlations (text-to-text information). This is motivated by the observation that a tweet usually covers only one aspect of an event. We show that using tweet-specific features (hashtags) and news-specific features (named entities) as well as temporal constraints, we are able to extract text-to-text correlations and thus complete the semantic picture of a short text. Our experiments show that the new model significantly improves over baselines on three evaluation metrics in the new task.
UR - http://www.scopus.com/inward/record.url?scp=84907016550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907016550&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84907016550
SN - 9781937284503
T3 - ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP - 239
EP - 249
BT - Long Papers
PB - Association for Computational Linguistics (ACL)
Y2 - 4 August 2013 through 9 August 2013
ER -