TY - GEN
T1 - Fast computation of SimRank for static and dynamic information networks
AU - Li, Cuiping
AU - Han, Jiawei
AU - He, Guoming
AU - Jin, Xin
AU - Sun, Yizhou
AU - Yu, Yintao
AU - Wu, Tianyi
PY - 2010
Y1 - 2010
N2 - Information networks are ubiquitous in many applications and analysis on such networks has attracted significant attention in the academic communities. One of the most important aspects of information network analysis is to measure similarity between nodes in a network. SimRank is a simple and influential measure of this kind, based on a solid theoretical "random surfer" model. Existing work computes SimRank similarity scores in an iterative mode. We argue that the iterative method can be infeasible and inefficient when, as in many real-world scenarios, the networks change dynamically and frequently. We envision non-iterative method to bridge the gap. It allows users not only to update the similarity scores incrementally, but also to derive similarity scores for an arbitrary subset of nodes. To enable the non-iterative computation, we propose to rewrite the SimRank equation into a non-iterative form by using the Kronecker product and vectorization operators. Based on this, we develop a family of novel approximate SimRank computation algorithms for static and dynamic information networks, and give their corresponding theoretical justification and analysis. The non-iterative method supports efficient processing of various node analysis including similarity tracking and centrality tracking on evolving information networks. The effectiveness and efficiency of our proposed methods are evaluated on synthetic and real data sets.
AB - Information networks are ubiquitous in many applications and analysis on such networks has attracted significant attention in the academic communities. One of the most important aspects of information network analysis is to measure similarity between nodes in a network. SimRank is a simple and influential measure of this kind, based on a solid theoretical "random surfer" model. Existing work computes SimRank similarity scores in an iterative mode. We argue that the iterative method can be infeasible and inefficient when, as in many real-world scenarios, the networks change dynamically and frequently. We envision non-iterative method to bridge the gap. It allows users not only to update the similarity scores incrementally, but also to derive similarity scores for an arbitrary subset of nodes. To enable the non-iterative computation, we propose to rewrite the SimRank equation into a non-iterative form by using the Kronecker product and vectorization operators. Based on this, we develop a family of novel approximate SimRank computation algorithms for static and dynamic information networks, and give their corresponding theoretical justification and analysis. The non-iterative method supports efficient processing of various node analysis including similarity tracking and centrality tracking on evolving information networks. The effectiveness and efficiency of our proposed methods are evaluated on synthetic and real data sets.
KW - Graph
KW - Information network
KW - SimRank
KW - Similarity measure
UR - http://www.scopus.com/inward/record.url?scp=77952275931&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952275931&partnerID=8YFLogxK
U2 - 10.1145/1739041.1739098
DO - 10.1145/1739041.1739098
M3 - Conference contribution
AN - SCOPUS:77952275931
SN - 9781605589459
T3 - Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
SP - 465
EP - 476
BT - Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
T2 - 13th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 2010
Y2 - 22 March 2010 through 26 March 2010
ER -