TY - GEN
T1 - Efficient processing of streaming graphs for evolution-aware clustering
AU - Yuan, Mindi
AU - Wu, Kun Lung
AU - Jacques-Silva, Gabriela
AU - Lu, Yi
PY - 2013
Y1 - 2013
N2 - The clustering of vertices often evolves with time in a streaming graph, where graph update events are given as a stream of edge (vertex) insertions and deletions. Although a sliding window in stream processing naturally captures some cluster evolution, it alone may not be adequate, especially if the window size is large and the clustering within the windowed stream is unstable. Prior graph clustering approaches are mostly insensitive to clustering evolution. In this paper, we present an efficient approach to processing streaming graphs for evolution-aware clustering (EAC) of vertices. We incrementally manage individual connected components as clusters subject to a constraint on the maximal cluster size. For each cluster, we keep the relative recency of edges in a sorted order and favor more recent edges in clustering. We evaluate the effectiveness of EAC and compare it with a previous state-of-the-art evolution-insensitive clustering (EIC) approach. The results show that EAC is both effective and efficient in capturing evolution in a streaming graph. Moreover, we implement EAC as a streaming graph operator on IBM's InfoSphere Streams, a large-scale distributed middleware for stream processing, and show snapshots of the user cluster evolution in a streaming Twitter mention graph.
AB - The clustering of vertices often evolves with time in a streaming graph, where graph update events are given as a stream of edge (vertex) insertions and deletions. Although a sliding window in stream processing naturally captures some cluster evolution, it alone may not be adequate, especially if the window size is large and the clustering within the windowed stream is unstable. Prior graph clustering approaches are mostly insensitive to clustering evolution. In this paper, we present an efficient approach to processing streaming graphs for evolution-aware clustering (EAC) of vertices. We incrementally manage individual connected components as clusters subject to a constraint on the maximal cluster size. For each cluster, we keep the relative recency of edges in a sorted order and favor more recent edges in clustering. We evaluate the effectiveness of EAC and compare it with a previous state-of-the-art evolution-insensitive clustering (EIC) approach. The results show that EAC is both effective and efficient in capturing evolution in a streaming graph. Moreover, we implement EAC as a streaming graph operator on IBM's InfoSphere Streams, a large-scale distributed middleware for stream processing, and show snapshots of the user cluster evolution in a streaming Twitter mention graph.
KW - Clustering streaming graphs
KW - Evolution-aware clustering
UR - http://www.scopus.com/inward/record.url?scp=84889584979&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889584979&partnerID=8YFLogxK
U2 - 10.1145/2505515.2505750
DO - 10.1145/2505515.2505750
M3 - Conference contribution
AN - SCOPUS:84889584979
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 319
EP - 328
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Y2 - 27 October 2013 through 1 November 2013
ER -