Linking Tweets to News: A framework to enrich short text data in social media

Weiwei Guo, Hao Li, Heng Ji, Mona Diab

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Many current Natural Language Processing (NLP) techniques work well when given a large context of text as input. However, they become ineffective when applied to short texts such as Twitter feeds. To overcome this issue, we want to find a newswire document related to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts. The contribution of the paper is two-fold: (1) we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; (2) in contrast to previous research, which focuses on lexical features within the short texts (text-to-word information), we propose a graph-based latent variable model that models the correlations between short texts (text-to-text information). This is motivated by the observation that a tweet usually covers only one aspect of an event. We show that by using tweet-specific features (hashtags) and news-specific features (named entities), as well as temporal constraints, we are able to extract text-to-text correlations and thus complete the semantic picture of a short text. Our experiments show significant improvement of our new model over baselines on three evaluation metrics in the new task.
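
As a rough illustration of the text-to-text idea only, and not the authors' graph-based latent variable model, the sketch below links two short texts when they share a hashtag and were posted within a time window. The function name `link_short_texts`, the record fields, the one-day window, and the sample tweets are all hypothetical choices made for this example.

```python
from datetime import datetime, timedelta
from itertools import combinations


def link_short_texts(texts, window=timedelta(days=1)):
    """Return index pairs of texts that share a hashtag and are close in time.

    Illustrative assumption: each text is a dict with a set under "hashtags"
    and a datetime under "time". This is not the model from the paper.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        shares_hashtag = bool(a["hashtags"] & b["hashtags"])
        close_in_time = abs(a["time"] - b["time"]) <= window
        if shares_hashtag and close_in_time:
            edges.append((i, j))
    return edges


# Hypothetical sample data, invented for this sketch.
tweets = [
    {"hashtags": {"#acl2013"}, "time": datetime(2013, 8, 4, 9, 0)},
    {"hashtags": {"#acl2013", "#nlp"}, "time": datetime(2013, 8, 4, 15, 0)},
    {"hashtags": {"#acl2013"}, "time": datetime(2013, 8, 9, 10, 0)},
]

print(link_short_texts(tweets))  # [(0, 1)]
```

Only the first two tweets are linked: the third shares the hashtag but falls outside the one-day window, which is the role a temporal constraint would play here.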

Original language: English (US)
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 239-249
Number of pages: 11
ISBN (Print): 9781937284503
State: Published - 2013
Externally published: Yes
Event: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Sofia, Bulgaria
Duration: Aug 4 2013 to Aug 9 2013

Publication series

Name: ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Volume: 1

Other

Other: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013
Country/Territory: Bulgaria
City: Sofia
Period: 8/4/13 to 8/9/13

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
