Delta-simrank computing on mapreduce

Liangliang Cao, Hyun Duk Kim, Min Hsuan Tsai, Brian Cho, Zhen Li, Indranil Gupta

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real world networks show that Delta-SimRank is much more efficient than the distributed Sim- Rank algorithm, and leads to up to 30 times speed-up in the best case1.

Original languageEnglish (US)
Title of host publicationProceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining
Subtitle of host publicationAlgorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference
Pages28-35
Number of pages8
DOIs
StatePublished - Sep 28 2012
Event1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference - Beijing, China
Duration: Aug 12 2012Aug 12 2012

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference
CountryChina
CityBeijing
Period8/12/128/12/12

Fingerprint

Distributed computer systems
Computational complexity
Costs

Keywords

  • Delta-simrank
  • Distributed computing
  • Simrank

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Cao, L., Kim, H. D., Tsai, M. H., Cho, B., Li, Z., & Gupta, I. (2012). Delta-simrank computing on mapreduce. In Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference (pp. 28-35). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/2351316.2351321

Delta-simrank computing on mapreduce. / Cao, Liangliang; Kim, Hyun Duk; Tsai, Min Hsuan; Cho, Brian; Li, Zhen; Gupta, Indranil.

Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference. 2012. p. 28-35 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Cao, L, Kim, HD, Tsai, MH, Cho, B, Li, Z & Gupta, I 2012, Delta-simrank computing on mapreduce. in Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 28-35, 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference, Beijing, China, 8/12/12. https://doi.org/10.1145/2351316.2351321
Cao L, Kim HD, Tsai MH, Cho B, Li Z, Gupta I. Delta-simrank computing on mapreduce. In Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference. 2012. p. 28-35. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/2351316.2351321
Cao, Liangliang ; Kim, Hyun Duk ; Tsai, Min Hsuan ; Cho, Brian ; Li, Zhen ; Gupta, Indranil. / Delta-simrank computing on mapreduce. Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference. 2012. pp. 28-35 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
@inproceedings{2dd87632cfe349fe87c1773da602cc67,
title = "Delta-simrank computing on mapreduce",
abstract = "Based on the intuition that {"}two objects are similar if they are related to similar objects{"}, SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real world networks show that Delta-SimRank is much more efficient than the distributed Sim- Rank algorithm, and leads to up to 30 times speed-up in the best case1.",
keywords = "Delta-simrank, Distributed computing, Simrank",
author = "Liangliang Cao and Kim, {Hyun Duk} and Tsai, {Min Hsuan} and Brian Cho and Zhen Li and Indranil Gupta",
year = "2012",
month = "9",
day = "28",
doi = "10.1145/2351316.2351321",
language = "English (US)",
isbn = "9781450315470",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
pages = "28--35",
booktitle = "Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining",

}

TY - GEN

T1 - Delta-simrank computing on mapreduce

AU - Cao, Liangliang

AU - Kim, Hyun Duk

AU - Tsai, Min Hsuan

AU - Cho, Brian

AU - Li, Zhen

AU - Gupta, Indranil

PY - 2012/9/28

Y1 - 2012/9/28

N2 - Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real world networks show that Delta-SimRank is much more efficient than the distributed Sim- Rank algorithm, and leads to up to 30 times speed-up in the best case1.

AB - Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real world networks show that Delta-SimRank is much more efficient than the distributed Sim- Rank algorithm, and leads to up to 30 times speed-up in the best case1.

KW - Delta-simrank

KW - Distributed computing

KW - Simrank

UR - http://www.scopus.com/inward/record.url?scp=84866617119&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866617119&partnerID=8YFLogxK

U2 - 10.1145/2351316.2351321

DO - 10.1145/2351316.2351321

M3 - Conference contribution

AN - SCOPUS:84866617119

SN - 9781450315470

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 28

EP - 35

BT - Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining

ER -