TY - JOUR
T1 - On Link-based similarity join
AU - Sun, Liwen
AU - Cheng, Reynold
AU - Li, Xiang
AU - Cheung, David W.
AU - Han, Jiawei
N1 - Funding Information:
Reynold Cheng, Liwen Sun, and Xiang Li were supported by the Research Grants Council of Hong Kong (RGC Project HKU 711309E). We would like to thank the reviewers for their insightful comments.
PY - 2011/8
Y1 - 2011/8
N2 - Graphs can be found in applications like social networks, bibliographic networks, and biological databases. Understanding the relationship, or links, among graph nodes enables applications such as link prediction, recommendation, and spam detection. In this paper, we propose link-based similarity join (LS-join), which extends the similarity join operator to link-based measures. Given two sets of nodes in a graph, the LS-join returns all pairs of nodes that are highly similar to each other, with respect to an e-function. The e-function generalizes common measures like Personalized PageRank (PPR) and SimRank (SR). We study an efficient LS-join algorithm on a large graph. We further improve our solutions for PPR and SR, which involve expensive randomwalk operations. We validate our solutions by performing extensive experiments on three real graph datasets.
AB - Graphs can be found in applications like social networks, bibliographic networks, and biological databases. Understanding the relationship, or links, among graph nodes enables applications such as link prediction, recommendation, and spam detection. In this paper, we propose link-based similarity join (LS-join), which extends the similarity join operator to link-based measures. Given two sets of nodes in a graph, the LS-join returns all pairs of nodes that are highly similar to each other, with respect to an e-function. The e-function generalizes common measures like Personalized PageRank (PPR) and SimRank (SR). We study an efficient LS-join algorithm on a large graph. We further improve our solutions for PPR and SR, which involve expensive randomwalk operations. We validate our solutions by performing extensive experiments on three real graph datasets.
UR - http://www.scopus.com/inward/record.url?scp=84863753182&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863753182&partnerID=8YFLogxK
U2 - 10.14778/3402707.3402712
DO - 10.14778/3402707.3402712
M3 - Conference article
AN - SCOPUS:84863753182
SN - 2150-8097
VL - 4
SP - 714
EP - 725
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 11
T2 - 37th International Conference on Very Large Data Bases, VLDB 2011
Y2 - 29 August 2011 through 3 September 2011
ER -