Splog detection using self-similarity analysis on blog temporal dynamics

Yu Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura, Belle L. Tseng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper focuses on spam blog (splog) detection. Blogs are a highly popular new-media mechanism for social communication. The presence of splogs degrades blog search results and wastes network resources. Our approach exploits unique blog temporal dynamics to detect splogs. There are three key ideas in our splog detection framework. First, we represent the blog temporal dynamics using self-similarity matrices defined on the histogram intersection similarity measure of the time, content, and link attributes of posts. Second, we show via a novel visualization that the blog temporal characteristics reveal attribute correlation, which depends on the type of blog (normal blog or splog). Third, we propose the use of temporal structural properties computed from self-similarity matrices across different attributes. In a splog detector, these novel features are combined with content-based features. We extract a content-based feature vector from different parts of the blog (URLs, post content, etc.). The dimensionality of the feature vector is reduced by Fisher linear discriminant analysis. We have tested an SVM-based splog detector using the proposed features on real-world datasets, with excellent results (90% accuracy).
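
The first key idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each post is reduced to a normalized word histogram by whitespace tokenization, and it covers only the content attribute. The function names are hypothetical, and the abstract does not give the exact histogram construction used in the paper.

```python
from collections import Counter


def normalized_histogram(tokens):
    """Map each token to its relative frequency (assumed representation)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: c / total for tok, c in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalized histograms.

    Yields 1.0 for identical histograms and 0.0 for disjoint ones.
    """
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def self_similarity_matrix(posts):
    """Pairwise similarity of a blog's posts, taken in posting order.

    Entry [i][j] compares post i with post j; the matrix is symmetric.
    """
    # Whitespace tokenization is an assumption made for this sketch.
    hists = [normalized_histogram(p.split()) for p in posts]
    n = len(hists)
    return [
        [histogram_intersection(hists[i], hists[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    posts = [
        "buy cheap pills online",
        "buy cheap pills online",
        "photos from our hiking trip",
    ]
    for row in self_similarity_matrix(posts):
        print(["%.2f" % value for value in row])
```

A blog that republishes near-identical posts would show large off-diagonal values in such a matrix, which is the kind of temporal regularity the proposed features are meant to capture. The same construction could be applied to histograms built from post times or links.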

Original language: English (US)
Title of host publication: AIRWeb 2007 - Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web
Pages: 1-8
Number of pages: 8
State: Published - 2007
Externally published: Yes
Event: AIRWeb 2007 - 3rd International Workshop on Adversarial Information Retrieval on the Web - Banff, AB, Canada
Duration: May 8 2007 → May 8 2007

Publication series

Name: ACM International Conference Proceeding Series
Volume: 215

Other

Other: AIRWeb 2007 - 3rd International Workshop on Adversarial Information Retrieval on the Web
Country/Territory: Canada
City: Banff, AB
Period: 5/8/07 → 5/8/07

Keywords

  • Blogs
  • Regularity
  • Self-similarity
  • Spam
  • Splog detection
  • Temporal dynamics
  • Topology

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
