Delta-SimRank computing on MapReduce

Liangliang Cao, Hyun Duk Kim, Min Hsuan Tsai, Brian Cho, Zhen Li, Indranil Gupta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a widely used measure of the similarity between two nodes based on network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, and link prediction, it suffers from high computational complexity and heavy space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks that grow over time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is shown to fit the nature of distributed computing and can be implemented efficiently using Google's MapReduce paradigm. Delta-SimRank effectively reduces the computational cost and also benefits applications with non-static network structures. Our experimental results on four real-world networks show that Delta-SimRank is much more efficient than the distributed SimRank algorithm, with a speed-up of up to 30 times in the best case.
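The SimRank intuition above can be sketched on a single machine. The following is a minimal illustration, not the paper's MapReduce implementation: the function names, the decay factor `c=0.8`, and the toy graph are assumptions made for this example. `simrank` applies the standard Jeh and Widom iteration to every node pair. `delta_simrank` relies on the linearity of that iteration and propagates only the change between consecutive iterations, accumulating it into the scores; the paper's exact formulation of Delta-SimRank may differ from this sketch.

```python
from collections import defaultdict
from itertools import product


def build_graph(edges):
    """Return (nodes, in-neighbors, out-neighbors) for unique directed edges."""
    nodes = sorted({n for edge in edges for n in edge})
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)
    return nodes, preds, succs


def simrank(edges, c=0.8, iters=5):
    """Standard SimRank: recompute every node-pair score in each iteration."""
    nodes, preds, _ = build_graph(edges)
    s = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iters):
        nxt = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                nxt[a, b] = 1.0
            elif preds[a] and preds[b]:
                total = sum(s[x, y] for x in preds[a] for y in preds[b])
                nxt[a, b] = c * total / (len(preds[a]) * len(preds[b]))
            else:
                nxt[a, b] = 0.0
        s = nxt
    return s


def delta_simrank(edges, c=0.8, iters=5, eps=0.0):
    """Delta-style update (illustrative): propagate only non-zero changes."""
    nodes, preds, succs = build_graph(edges)
    s = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    # Initial change: the diagonal scores s(a, a) = 1.
    delta = {(n, n): 1.0 for n in nodes}
    for _ in range(iters):
        nxt = defaultdict(float)
        # "Map" step: each changed pair emits contributions to successor pairs.
        for (x, y), d in delta.items():
            if abs(d) <= eps:
                continue  # pairs whose change is negligible send nothing
            for a in succs[x]:
                for b in succs[y]:
                    if a != b:  # diagonal scores stay fixed at 1
                        # "Reduce" step: contributions to the same pair are summed.
                        nxt[a, b] += c * d / (len(preds[a]) * len(preds[b]))
        for pair, d in nxt.items():
            s[pair] += d
        delta = nxt
    return s


# Hypothetical toy graph, used only to exercise the two functions.
EDGES = [
    ("Univ", "ProfA"), ("Univ", "ProfB"),
    ("ProfA", "StudentA"), ("StudentA", "Univ"),
    ("ProfB", "StudentB"), ("StudentB", "ProfB"),
]

full = simrank(EDGES, iters=6)
incremental = delta_simrank(EDGES, iters=6)
```

With `eps=0.0` the two functions compute the same scores up to floating-point rounding. Raising `eps` skips pairs whose change is small, trading accuracy for less work per iteration, which is the kind of saving that makes a delta-style update attractive when each iteration is a distributed job.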

Original language: English (US)
Title of host publication: Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining
Subtitle of host publication: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference
Pages: 28-35
Number of pages: 8
DOIs: https://doi.org/10.1145/2351316.2351321
State: Published - Sep 28 2012
Event: 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference - Beijing, China
Duration: Aug 12 2012 → Aug 12 2012

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference
Country: China
City: Beijing
Period: 8/12/12 → 8/12/12

Keywords

  • Delta-SimRank
  • Distributed computing
  • SimRank

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Cao, L., Kim, H. D., Tsai, M. H., Cho, B., Li, Z., & Gupta, I. (2012). Delta-SimRank computing on MapReduce. In Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference (pp. 28-35). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/2351316.2351321