gIceberg: Towards iceberg analysis in large graphs

Nan Li, Ziyu Guan, Lijie Ren, Jian Wu, Jiawei Han, Xifeng Yan

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Traditional multi-dimensional data analysis techniques such as iceberg cube cannot be directly applied to graphs for finding interesting or anomalous vertices due to the lack of dimensionality in graphs. In this paper, we introduce the concept of graph icebergs that refer to vertices for which the concentration (aggregation) of an attribute in their vicinities is abnormally high. Intuitively, these vertices shall be "close" to the attribute of interest in the graph space. Based on this intuition, we propose a novel framework, called gIceberg, which performs aggregation using random walks, rather than traditional SUM and AVG aggregate functions. This proposed framework scores vertices by their different levels of interestingness and finds important vertices that meet a user-specified threshold. To improve scalability, two aggregation strategies, forward and backward aggregation, are proposed with corresponding optimization techniques and bounds. Experiments on both real-world and synthetic large graphs demonstrate that gIceberg is effective and scalable.
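The idea of scoring a vertex by how concentrated an attribute is in its vicinity can be illustrated with a small sketch. The abstract only says that aggregation uses random walks, so everything else here is an assumption made for illustration: the walk restarts at its source with a fixed probability, the score is computed by a simple fixed-point iteration, and the names `iceberg_scores`, `graph_icebergs`, `restart`, and `iters` are hypothetical. It is not the paper's forward or backward aggregation algorithm and includes none of its optimizations or bounds.

```python
def iceberg_scores(adj, has_attr, restart=0.15, iters=100):
    """Score each vertex by a restart-walk aggregate of an attribute.

    Assumption (not from the abstract): a walker at v stops with
    probability `restart`, otherwise moves to a uniformly random neighbor.
    The score of v is the probability that the walk stops on a vertex
    carrying the attribute, so scores lie in [0, 1].
    """
    score = {v: 0.0 for v in adj}
    for _ in range(iters):
        new = {}
        for v, nbrs in adj.items():
            # Treat an isolated vertex as having a self-loop.
            targets = nbrs or [v]
            spread = sum(score[u] for u in targets) / len(targets)
            own = 1.0 if v in has_attr else 0.0
            new[v] = restart * own + (1.0 - restart) * spread
        score = new
    return score


def graph_icebergs(adj, has_attr, threshold, restart=0.15, iters=100):
    """Return the vertices whose score meets the user-specified threshold."""
    scores = iceberg_scores(adj, has_attr, restart, iters)
    return {v for v, s in scores.items() if s >= threshold}


# Toy undirected path a - b - c - d, with the attribute only on "a".
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
icebergs = graph_icebergs(path, {"a"}, threshold=0.15)
```

On this toy path the scores fall off with distance from `a`, which matches the intuition that iceberg vertices are "close" to the attribute in the graph space. Raising `threshold` shrinks the returned set.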

Original language: English (US)
Title of host publication: ICDE 2013 - 29th International Conference on Data Engineering
Pages: 1021-1032
Number of pages: 12
State: Published - 2013
Event: 29th International Conference on Data Engineering, ICDE 2013 - Brisbane, QLD, Australia
Duration: Apr 8 2013 to Apr 11 2013

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 29th International Conference on Data Engineering, ICDE 2013
Country/Territory: Australia
City: Brisbane, QLD
Period: 4/8/13 to 4/11/13

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
