TY - GEN
T1 - gIceberg
T2 - 29th International Conference on Data Engineering, ICDE 2013
AU - Li, Nan
AU - Guan, Ziyu
AU - Ren, Lijie
AU - Wu, Jian
AU - Han, Jiawei
AU - Yan, Xifeng
PY - 2013
Y1 - 2013
N2 - Traditional multi-dimensional data analysis techniques such as iceberg cube cannot be directly applied to graphs for finding interesting or anomalous vertices due to the lack of dimensionality in graphs. In this paper, we introduce the concept of graph icebergs that refer to vertices for which the concentration (aggregation) of an attribute in their vicinities is abnormally high. Intuitively, these vertices shall be "close" to the attribute of interest in the graph space. Based on this intuition, we propose a novel framework, called gIceberg, which performs aggregation using random walks, rather than traditional SUM and AVG aggregate functions. This proposed framework scores vertices by their different levels of interestingness and finds important vertices that meet a user-specified threshold. To improve scalability, two aggregation strategies, forward and backward aggregation, are proposed with corresponding optimization techniques and bounds. Experiments on both real-world and synthetic large graphs demonstrate that gIceberg is effective and scalable.
AB - Traditional multi-dimensional data analysis techniques such as iceberg cube cannot be directly applied to graphs for finding interesting or anomalous vertices due to the lack of dimensionality in graphs. In this paper, we introduce the concept of graph icebergs that refer to vertices for which the concentration (aggregation) of an attribute in their vicinities is abnormally high. Intuitively, these vertices shall be "close" to the attribute of interest in the graph space. Based on this intuition, we propose a novel framework, called gIceberg, which performs aggregation using random walks, rather than traditional SUM and AVG aggregate functions. This proposed framework scores vertices by their different levels of interestingness and finds important vertices that meet a user-specified threshold. To improve scalability, two aggregation strategies, forward and backward aggregation, are proposed with corresponding optimization techniques and bounds. Experiments on both real-world and synthetic large graphs demonstrate that gIceberg is effective and scalable.
UR - http://www.scopus.com/inward/record.url?scp=84881326235&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84881326235&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2013.6544894
DO - 10.1109/ICDE.2013.6544894
M3 - Conference contribution
AN - SCOPUS:84881326235
SN - 9781467349086
T3 - Proceedings - International Conference on Data Engineering
SP - 1021
EP - 1032
BT - ICDE 2013 - 29th International Conference on Data Engineering
Y2 - 8 April 2013 through 11 April 2013
ER -