Fast and flexible top-k similarity search on large networks

Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, Juanzi Li, Walter Luyten, Marie Francine Moens

Research output: Contribution to journalArticlepeer-review

Abstract

Similarity search is a fundamental problem in network analysis and can be applied in many applications, such as collaborator recommendation in coauthor networks, friend recommendation in social networks, and relation prediction in medical information networks. In this article, we propose a sampling-based method using random paths to estimate the similarities based on both common neighbors and structural contexts efficiently in very large homogeneous or heterogeneous information networks. We give a theoretical guarantee that the sampling size depends on the error-bound ϵ, the confidence level (1 - δ), and the path length T of each random walk. We perform an extensive empirical study on a Tencent microblogging network of 1,000,000,000 edges. We show that our algorithm can return top-k similar vertices for any vertex in a network 300× faster than the state-of-the-art methods.We develop a prototype system of recommending similar authors to demonstrate the effectiveness of our method.

Original languageEnglish (US)
Article number13
JournalACM Transactions on Information Systems
Volume36
Issue number2
DOIs
StatePublished - Aug 2017
Externally publishedYes

Keywords

  • Heterogeneous information network
  • Random path
  • Similarity search
  • Social network
  • Vertex similarity

ASJC Scopus subject areas

  • Information Systems
  • General Business, Management and Accounting
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Fast and flexible top-k similarity search on large networks'. Together they form a unique fingerprint.

Cite this