TY - JOUR
T1 - Mining top-K large structural patterns in a massive network
AU - Zhu, Feida
AU - Qu, Qiang
AU - Lo, David
AU - Yan, Xifeng
AU - Han, Jiawei
AU - Yu, Philip S.
N1 - Funding Information:
The work was supported in part by U.S. National Science Foundation grants IIS-0905215 and IIS-1017362, and by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA).
PY - 2011/8
Y1 - 2011/8
N2 - With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider- Mine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1-ε. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call "spiders". With the spider structure, our approach adopts a probabilistic mining framework to find the top-k largest patterns by (i) identifying an affordable set of promising growth paths toward large patterns, (ii) generating large patterns with much lower combinatorial complexity, and finally (iii) greatly reducing the cost of graph isomorphism tests with a new graph pattern representation by a multi-set of spiders. Extensive experimental studies on both synthetic and real data sets show that our algorithm outperforms existing methods.
AB - With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider- Mine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1-ε. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call "spiders". With the spider structure, our approach adopts a probabilistic mining framework to find the top-k largest patterns by (i) identifying an affordable set of promising growth paths toward large patterns, (ii) generating large patterns with much lower combinatorial complexity, and finally (iii) greatly reducing the cost of graph isomorphism tests with a new graph pattern representation by a multi-set of spiders. Extensive experimental studies on both synthetic and real data sets show that our algorithm outperforms existing methods.
UR - http://www.scopus.com/inward/record.url?scp=84863730213&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863730213&partnerID=8YFLogxK
U2 - 10.14778/3402707.3402720
DO - 10.14778/3402707.3402720
M3 - Conference article
AN - SCOPUS:84863730213
SN - 2150-8097
VL - 4
SP - 807
EP - 818
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 11
T2 - 37th International Conference on Very Large Data Bases, VLDB 2011
Y2 - 29 August 2011 through 3 September 2011
ER -