DirichletRank: Solving the zero-one gap problem of PageRank

Xuanhui Wang, Tao Tao, Jian Tao Sun, Azadeh Shakery, Chengxiang Zhai

Research output: Contribution to journalArticle

Abstract

Link-based ranking algorithms are among the most important techniques to improve web search. In particular, the PageRank algorithm has been successfully used in the Google search engine and has been attracting much attention recently. However, we find that PageRank has a zero-one gap problem which, to the best of our knowledge, has not been addressed in any previous work. This problem can be potentially exploited to spam PageRank results and make the state-of-the-art link-based antispamming techniques ineffective. The zero-one gap problem arises as a result of the current ad hoc way of computing transition probabilities in the random surfing model. We therefore propose a novel DirichletRank algorithm which calculates these probabilities using Bayesian estimation with a Dirichlet prior. DirichletRank is a variant of PageRank, but does not have the problem of zero-one gap and can be analytically shown substantially more resistant to some link spams than PageRank. Experiment results on TREC data show that DirichletRank can achieve better retrieval accuracy than PageRank due to its more reasonable allocation of transition probabilities. More importantly, experiments on the TREC dataset and another real web dataset from the Webgraph project show that, compared with the original PageRank, DirichletRank is more stable under link perturbation and is significantly more robust against both manually identified web spams and several simulated link spams. DirichletRank can be computed as efficiently as PageRank, and thus is scalable to large-scale web applications.

Original languageEnglish (US)
Article number10
JournalACM Transactions on Information Systems
Volume26
Issue number2
DOIs
StatePublished - Mar 1 2008

Keywords

  • DirichletRank
  • Link analysis
  • PageRank
  • Spamming
  • Zero-one gap

ASJC Scopus subject areas

  • Information Systems
  • Business, Management and Accounting(all)
  • Computer Science Applications

Fingerprint Dive into the research topics of 'DirichletRank: Solving the zero-one gap problem of PageRank'. Together they form a unique fingerprint.

  • Cite this