TY - CHAP
T1 - Mining large information networks by graph summarization
AU - Chen, Chen
AU - Lin, Cindy Xide
AU - Fredrikson, Matt
AU - Christodorescu, Mihai
AU - Yan, Xifeng
AU - Han, Jiawei
N1 - Publisher Copyright:
© Springer Science+Business Media, LLC 2010. All rights reserved.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
N2 - Graphs are prevalent in many domains such as bioinformatics, social networks, Web, and cybersecurity. Graph pattern mining has become an important tool in the management and analysis of complexly structured data, where example applications include indexing, clustering, and classification. Existing graph mining algorithms have achieved great success by exploiting various properties in the pattern space. Unfortunately, due to the fundamental role subgraph isomorphism plays in these methods, they may all enter into a pitfall when the cost to enumerate a huge set of isomorphic embeddings blows up, especially in large graphs. The solution we propose for this problem resorts to reduction on the data space. For each graph, we build a summary of it and mine this shrunk graph instead. Compared to other data reduction techniques that either reduce the number of transactions or compress between transactions, this new framework, called Summarize-Mine, suggests a third path by compressing within transactions. Summarize-Mine is effective in cutting down the size of graphs, thus decreasing the embedding enumeration cost. However, compression might lose patterns at the same time. We address this issue by generating randomized summaries and repeating the process for multiple rounds, where the main idea is that true patterns are unlikely to miss from all rounds. We provide strict probabilistic guarantees on pattern loss likelihood. Experiments on real malware trace data show that Summarize-Mine is very efficient, which can find interesting malware fingerprints that were not revealed previously.
AB - Graphs are prevalent in many domains such as bioinformatics, social networks, Web, and cybersecurity. Graph pattern mining has become an important tool in the management and analysis of complexly structured data, where example applications include indexing, clustering, and classification. Existing graph mining algorithms have achieved great success by exploiting various properties in the pattern space. Unfortunately, due to the fundamental role subgraph isomorphism plays in these methods, they may all enter into a pitfall when the cost to enumerate a huge set of isomorphic embeddings blows up, especially in large graphs. The solution we propose for this problem resorts to reduction on the data space. For each graph, we build a summary of it and mine this shrunk graph instead. Compared to other data reduction techniques that either reduce the number of transactions or compress between transactions, this new framework, called Summarize-Mine, suggests a third path by compressing within transactions. Summarize-Mine is effective in cutting down the size of graphs, thus decreasing the embedding enumeration cost. However, compression might lose patterns at the same time. We address this issue by generating randomized summaries and repeating the process for multiple rounds, where the main idea is that true patterns are unlikely to miss from all rounds. We provide strict probabilistic guarantees on pattern loss likelihood. Experiments on real malware trace data show that Summarize-Mine is very efficient, which can find interesting malware fingerprints that were not revealed previously.
UR - http://www.scopus.com/inward/record.url?scp=84919835680&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919835680&partnerID=8YFLogxK
U2 - 10.1007/978-1-4419-6515-8_18
DO - 10.1007/978-1-4419-6515-8_18
M3 - Chapter
AN - SCOPUS:84919835680
SN - 9781441965141
VL - 9781441965158
SP - 475
EP - 501
BT - Link Mining
PB - Springer
ER -