The splog detection task and a solution based on temporal and link properties

Yu Ru Lin, Wen Yen Chen, Xiaolin Shi, Richard Sia, Xiaodan Song, Yun Chi, Koji Hino, Hari Sundaram, Jun Tatemura, Belle Tseng

Research output: Contribution to journal › Conference article › peer-review

Abstract

Spam blogs (splogs) have become a major problem in the increasingly popular blogosphere. Splogs are detrimental because they corrupt the quality of retrieved information and waste tremendous network and storage resources. We study several research issues in splog detection. First, we identify characteristics of splogs that distinguish them from web spam and email spam. Second, in addition to tasks based on the traditional IR evaluation framework, we propose a new online task that captures these characteristics. The new task introduces a novel time-sensitive detection evaluation that indicates how quickly a detector can identify splogs. Third, we propose a splog detection algorithm that combines traditional content features with temporal and link regularity features that are unique to blogs. Finally, we develop an annotation tool to generate ground truth on a sampled subset of the TREC-Blog dataset. We conduct experiments on both the offline (traditional) splog detection task and our proposed online task. Experiments based on the annotated ground truth set show excellent results on both tasks.
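The abstract does not say how temporal regularity is measured. As an illustration only, and not the authors' feature, the sketch below scores a blog by the Shannon entropy of its binned gaps between consecutive posts, on the assumption that machine-generated posting schedules tend to produce lower entropy than human ones. The function name `interval_entropy`, the one-hour bin width, and the sample timestamps are all hypothetical.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=3600):
    """Shannon entropy (bits) of binned gaps between consecutive posts.

    Hypothetical regularity feature: lower values mean a more regular
    posting schedule. This is an assumption for illustration, not the
    feature defined in the paper.
    """
    ordered = sorted(timestamps)
    # Gaps between consecutive posts, in seconds.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    # Group gaps into fixed-width bins and count each bin.
    counts = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Made-up posting times, in seconds since an arbitrary start.
regular = [i * 7200 for i in range(10)]  # one post every two hours
irregular = [0, 1800, 9000, 30000, 33000, 90000, 200000]

print(interval_entropy(regular), interval_entropy(irregular))
```

A score like this would be only one input to a detector. The abstract describes combining temporal and link regularity features with traditional content features.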

Original language: English (US)
Journal: NIST Special Publication
State: Published - Dec 1 2006
Externally published: Yes
Event: 15th Text REtrieval Conference, TREC 2006 - Gaithersburg, MD, United States
Duration: Nov 14 2006 to Nov 17 2006

ASJC Scopus subject areas

  • Engineering(all)
