TY - GEN
T1 - Delta-simrank computing on mapreduce
AU - Cao, Liangliang
AU - Kim, Hyun Duk
AU - Tsai, Min Hsuan
AU - Cho, Brian
AU - Li, Zhen
AU - Gupta, Indranil
PY - 2012
Y1 - 2012
N2 - Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real world networks show that Delta-SimRank is much more efficient than the distributed Sim- Rank algorithm, and leads to up to 30 times speed-up in the best case1.
AB - Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, link prediction, etc., it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit the applications with non-static network structures. Our experimental results on four real world networks show that Delta-SimRank is much more efficient than the distributed Sim- Rank algorithm, and leads to up to 30 times speed-up in the best case1.
KW - Delta-simrank
KW - Distributed computing
KW - Simrank
UR - http://www.scopus.com/inward/record.url?scp=84866617119&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866617119&partnerID=8YFLogxK
U2 - 10.1145/2351316.2351321
DO - 10.1145/2351316.2351321
M3 - Conference contribution
AN - SCOPUS:84866617119
SN - 9781450315470
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 28
EP - 35
BT - Proceedings of 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining
T2 - 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine-12 - Held in Conjunction with SIGKDD Conference
Y2 - 12 August 2012 through 12 August 2012
ER -