TY - GEN
T1 - Top-K aggregation queries over large networks
AU - Yan, Xifeng
AU - He, Bin
AU - Zhu, Feida
AU - Han, Jiawei
N1 - Copyright:
Copyright 2010 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
N2 - Searching and mining large graphs today is critical to a variety of application domains, ranging from personalized recommendation in social networks, to searches for functional associations in biological pathways. In these domains, there is a need to perform aggregation operations on large-scale networks. Unfortunately the existing implementation of aggregation operations on relational databases does not guarantee superior performance in network space, especially when it involves edge traversals and joins of gigantic tables. In this paper, we investigate the neighborhood aggregation queries: Find nodes that have top-k highest aggregate values over their h-hop neighbors. While these basic queries are common in a wide range of search and recommendation tasks, surprisingly they have not been studied systematically. We developed a Local Neighborhood Aggregation framework, called LONA, to answer them efficiently. LONA exploits two properties unique in network space: First, the aggregate value for the neighboring nodes should be similar in most cases; Second, given the distribution of attribute values, it is possible to estimate the upper-bound value of aggregates. These two properties inspire the development of novel pruning techniques, forward pruning using differential index and backward pruning using partial distribution. Empirical results show that LONA could outperform the baseline algorithm up to 10 times in real-life large networks.
AB - Searching and mining large graphs today is critical to a variety of application domains, ranging from personalized recommendation in social networks, to searches for functional associations in biological pathways. In these domains, there is a need to perform aggregation operations on large-scale networks. Unfortunately the existing implementation of aggregation operations on relational databases does not guarantee superior performance in network space, especially when it involves edge traversals and joins of gigantic tables. In this paper, we investigate the neighborhood aggregation queries: Find nodes that have top-k highest aggregate values over their h-hop neighbors. While these basic queries are common in a wide range of search and recommendation tasks, surprisingly they have not been studied systematically. We developed a Local Neighborhood Aggregation framework, called LONA, to answer them efficiently. LONA exploits two properties unique in network space: First, the aggregate value for the neighboring nodes should be similar in most cases; Second, given the distribution of attribute values, it is possible to estimate the upper-bound value of aggregates. These two properties inspire the development of novel pruning techniques, forward pruning using differential index and backward pruning using partial distribution. Empirical results show that LONA could outperform the baseline algorithm up to 10 times in real-life large networks.
UR - http://www.scopus.com/inward/record.url?scp=77952765716&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952765716&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2010.5447863
DO - 10.1109/ICDE.2010.5447863
M3 - Conference contribution
AN - SCOPUS:77952765716
SN - 9781424454440
T3 - Proceedings - International Conference on Data Engineering
SP - 377
EP - 380
BT - 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
T2 - 26th IEEE International Conference on Data Engineering, ICDE 2010
Y2 - 1 March 2010 through 6 March 2010
ER -