TY - GEN
T1 - Understanding graph sampling algorithms for social network analysis
AU - Wang, Tianyi
AU - Chen, Yang
AU - Zhang, Zengbin
AU - Xu, Tianyin
AU - Jin, Long
AU - Hui, Pan
AU - Deng, Beixing
AU - Li, Xing
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
N2 - Being able to keep the graph scale small while capturing the properties of the original social graph, graph sampling provides an efficient, yet inexpensive solution for social network analysis. The challenge is how to create a small, but representative sample out of the massive social graph with millions or even billions of nodes. Several sampling algorithms have been proposed in previous studies, but there lacks fair evaluation and comparison among them. In this paper, we analyze the state-of art graph sampling algorithms and evaluate their performance on some widely recognized graph properties on directed graphs using large-scale social network datasets. We evaluate not only the commonly used node degree distribution, but also clustering coefficient, which quantifies how well connected are the neighbors of a node in a graph. Through the comparison we have found that none of the algorithms is able to obtain satisfied sampling results in both of these properties, and the performance of each algorithm differs much in different kinds of datasets.
AB - Being able to keep the graph scale small while capturing the properties of the original social graph, graph sampling provides an efficient, yet inexpensive solution for social network analysis. The challenge is how to create a small, but representative sample out of the massive social graph with millions or even billions of nodes. Several sampling algorithms have been proposed in previous studies, but there lacks fair evaluation and comparison among them. In this paper, we analyze the state-of art graph sampling algorithms and evaluate their performance on some widely recognized graph properties on directed graphs using large-scale social network datasets. We evaluate not only the commonly used node degree distribution, but also clustering coefficient, which quantifies how well connected are the neighbors of a node in a graph. Through the comparison we have found that none of the algorithms is able to obtain satisfied sampling results in both of these properties, and the performance of each algorithm differs much in different kinds of datasets.
UR - http://www.scopus.com/inward/record.url?scp=80052424453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052424453&partnerID=8YFLogxK
U2 - 10.1109/ICDCSW.2011.34
DO - 10.1109/ICDCSW.2011.34
M3 - Conference contribution
AN - SCOPUS:80052424453
SN - 9780769543864
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 123
EP - 128
BT - Proceedings - 31st International Conference on Distributed Computing Systems Workshops, ICDCSW 2011
T2 - 31st International Conference on Distributed Computing Systems Workshops, ICDCSW 2011
Y2 - 20 June 2011 through 24 June 2011
ER -