Harnessing web page directories for large-scale classification of tweets

Arkaitz Zubiaga, Heng Ji

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

Classification is paramount for the optimal processing of tweets, but the performance of classifiers is hindered by the need for large sets of training data that encompass the diversity of contents one can find on Twitter. In this paper, we introduce an inexpensive way of labeling large sets of tweets, which can be easily regenerated or updated when needed. We use human-edited web page directories to infer categories from URLs contained in tweets. By experimenting with a large set of more than 5 million tweets categorized accordingly, we show that our proposed model for tweet classification can achieve 82% accuracy, performing only 12.2% worse than for web page classification.
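The labeling idea in the abstract, inferring a tweet's category from the URLs it contains, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `DIRECTORY` mapping, the function names, and the choice to match on the bare domain are all assumptions made for illustration, and the sketch assumes URLs are already expanded rather than shortened.

```python
import re
from urllib.parse import urlparse

# Hypothetical toy directory (domain -> category). A real human-edited
# directory would be far larger; these entries are made up for illustration.
DIRECTORY = {
    "espn.com": "Sports",
    "nature.com": "Science",
    "allrecipes.com": "Home",
}

URL_PATTERN = re.compile(r"https?://\S+")


def normalize_domain(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").rstrip(".,;:!?)")
    return host[4:] if host.startswith("www.") else host


def label_tweet(text, directory=DIRECTORY):
    """Return the category of the first URL whose domain is listed, else None."""
    for url in URL_PATTERN.findall(text):
        category = directory.get(normalize_domain(url))
        if category is not None:
            return category
    return None


def build_training_set(tweets, directory=DIRECTORY):
    """Keep only tweets that received a label, with URLs stripped from the text."""
    labeled = []
    for text in tweets:
        category = label_tweet(text, directory)
        if category is not None:
            labeled.append((URL_PATTERN.sub("", text).strip(), category))
    return labeled
```

Tweets without a URL, or whose URLs are not listed in the directory, are simply left out of the training set, so the labeled set can be regenerated whenever the directory or the tweet collection changes.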

Original language: English (US)
Title of host publication: WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web
Publisher: Association for Computing Machinery
Pages: 225-226
Number of pages: 2
ISBN (Print): 9781450320382
State: Published - 2013
Externally published: Yes
Event: 22nd International Conference on World Wide Web, WWW 2013 - Rio de Janeiro, Brazil
Duration: May 13 2013 to May 17 2013

Keywords

  • Classification
  • Distant
  • Large-scale
  • Tweets

ASJC Scopus subject areas

  • Computer Networks and Communications
