Harnessing web page directories for large-scale classification of tweets

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Classification is paramount for the optimal processing of tweets, although classifier performance is hindered by the need for large sets of training data that encompass the diversity of content found on Twitter. In this paper, we introduce an inexpensive way of labeling large sets of tweets, which can be easily regenerated or updated when needed. We use human-edited web page directories to infer categories from URLs contained in tweets. By experimenting with a large set of more than 5 million tweets categorized accordingly, we show that our proposed model for tweet classification can achieve 82% accuracy, performing only 12.2% worse than web page classification.
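
The labeling idea in the abstract (inferring a tweet's category from the URLs it contains) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `DIRECTORY` mapping, the host-level lookup, the removal of URLs from the training text, and all function names are assumptions made for illustration. It also assumes that shortened links have already been expanded to their target URLs.

```python
import re
from urllib.parse import urlparse

# Hypothetical excerpt of a human-edited web directory: host -> category.
# A real directory would hold many more entries and finer-grained paths.
DIRECTORY = {
    "espn.com": "Sports",
    "nature.com": "Science",
}

URL_PATTERN = re.compile(r"https?://\S+")


def host_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def label_tweet(text, directory=DIRECTORY):
    """Return the category of the first URL whose host is listed, else None."""
    for url in URL_PATTERN.findall(text):
        category = directory.get(host_of(url))
        if category is not None:
            return category
    return None


def build_training_set(tweets, directory=DIRECTORY):
    """Keep only tweets that receive a label, as (text, category) pairs.

    URLs are stripped from the text (an assumption of this sketch) so that
    a classifier trained on the pairs cannot simply memorize the host.
    """
    examples = []
    for text in tweets:
        category = label_tweet(text, directory)
        if category is not None:
            cleaned = " ".join(URL_PATTERN.sub(" ", text).split())
            examples.append((cleaned, category))
    return examples
```

Because the labels come from a lookup rather than from manual annotation, rerunning `build_training_set` on a fresh batch of tweets or an updated directory regenerates the training set at little cost, which is the property the abstract emphasizes.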

Original language: English (US)
Title of host publication: WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web
Publisher: Association for Computing Machinery
Pages: 225-226
Number of pages: 2
ISBN (Print): 9781450320382
State: Published - 2013
Externally published: Yes
Event: WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web - Rio de Janeiro, Brazil
Duration: May 13 2013 to May 17 2013

Publication series

Name: WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web

Conference

Conference: WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web
Country/Territory: Brazil
City: Rio de Janeiro
Period: 5/13/13 to 5/17/13

Keywords

  • Classification
  • Distant
  • Large-scale
  • Tweets

ASJC Scopus subject areas

  • Computer Networks and Communications
