Topic initiator detection on the world wide web

Xin Jin, Scott Spangler, Rui Ma, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper we introduce a new Web mining and search technique - Topic Initiator Detection (TID) on the Web. Given a topic query on the Internet and the resulting collection of time-stamped web documents which contain the query keywords, the task of TID is to automatically return which web document (or its author) initiated the topic or was the first to discuss about the topic. To deal with the TID problem, we design a system framework and propose algorithm InitRank (Initiator Ranking) to rank the web documents by their possibility to be the topic initiator. We first extract features from the web documents and design several topic initiator indicators. Then, we propose a TCL graph which integrates the Time, Content and Link information and design an optimization framework over the graph to compute InitRank. Experiments show that compared with baseline methods, such as direct time sorting, well-known link based ranking algorithms PageRank and HITS, InitRank achieves the best overall performance with high effectiveness and robustness. In case studies, we successfully detected (1) the first web document related to a famous rumor of an Australia product banned in USA and (2) the pre-release of IBM and Google Cloud Computing collaboration before the official announcement.

Original languageEnglish (US)
Title of host publicationProceedings of the 19th International Conference on World Wide Web, WWW '10
Pages481-490
Number of pages10
DOIs
StatePublished - 2010
Event19th International World Wide Web Conference, WWW2010 - Raleigh, NC, United States
Duration: Apr 26 2010Apr 30 2010

Publication series

NameProceedings of the 19th International Conference on World Wide Web, WWW '10

Other

Other19th International World Wide Web Conference, WWW2010
Country/TerritoryUnited States
CityRaleigh, NC
Period4/26/104/30/10

Keywords

  • information retrieval
  • ranking
  • topic initiator
  • web mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Topic initiator detection on the world wide web'. Together they form a unique fingerprint.

Cite this