TY - GEN
T1 - Centralities in large networks
T2 - 11th SIAM International Conference on Data Mining, SDM 2011
AU - Kang, U.
AU - Papadimitriou, Spiros
AU - Sun, Jimeng
AU - Tong, Hanghang
PY - 2011
Y1 - 2011
N2 - Node centrality measures are important in a large number of graph applications, from search and ranking to social and biological network analysis. In this paper we study node centrality for very large graphs, up to billions of nodes and edges. Various definitions for centrality have been proposed, ranging from very simple (e.g., node degree) to more elaborate. However, measuring centrality in billion-scale graphs poses several challenges. Many of the "traditional" definitions such as closeness and betweenness were not designed with scalability in mind. Therefore, it is very difficult, if not impossible, to compute them both accurately and efficiently. In this paper, we propose centrality measures suitable for very large graphs, as well as scalable methods to effectively compute them. More specifically, we propose effective closeness and LINERANK which are designed for billion-scale graphs. We also develop algorithms to compute the proposed centrality measures in MAPREDUCE, a modern paradigm for large-scale, distributed data processing. We present extensive experimental results on both synthetic and real datasets, which demonstrate the scalability of our approach to very large graphs, as well as interesting findings and anomalies.
AB - Node centrality measures are important in a large number of graph applications, from search and ranking to social and biological network analysis. In this paper we study node centrality for very large graphs, up to billions of nodes and edges. Various definitions for centrality have been proposed, ranging from very simple (e.g., node degree) to more elaborate. However, measuring centrality in billion-scale graphs poses several challenges. Many of the "traditional" definitions such as closeness and betweenness were not designed with scalability in mind. Therefore, it is very difficult, if not impossible, to compute them both accurately and efficiently. In this paper, we propose centrality measures suitable for very large graphs, as well as scalable methods to effectively compute them. More specifically, we propose effective closeness and LINERANK which are designed for billion-scale graphs. We also develop algorithms to compute the proposed centrality measures in MAPREDUCE, a modern paradigm for large-scale, distributed data processing. We present extensive experimental results on both synthetic and real datasets, which demonstrate the scalability of our approach to very large graphs, as well as interesting findings and anomalies.
UR - http://www.scopus.com/inward/record.url?scp=84866464719&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866464719&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972818.11
DO - 10.1137/1.9781611972818.11
M3 - Conference contribution
AN - SCOPUS:84866464719
SN - 9780898719925
T3 - Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011
SP - 119
EP - 130
BT - Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011
PB - Society for Industrial and Applied Mathematics Publications
Y2 - 28 April 2011 through 30 April 2011
ER -