TY - GEN
T1 - Progressive clustering of networks using structure-connected order of traversal
AU - Bortner, Dustin
AU - Han, Jiawei
PY - 2010
Y1 - 2010
N2 - Network clustering enables us to view a complex network at the macro level, by grouping its nodes into units whose characteristics and interrelationships are easier to analyze and understand. State-of-the-art network partitioning methods are unable to identify hubs and outliers. A recently proposed algorithm, SCAN, overcomes this difficulty. However, it requires a minimum similarity parameter ε but provides no automated way to find it. Thus, it must be rerun for each ε value and does not capture the variety or hierarchy of clusters. We propose a new algorithm, SCOT (or Structure-Connected Order of Traversal), that produces a length n sequence containing all possible ε-clusterings. We propose a new algorithm, HintClus (or Hierarchy-Induced Network Clustering), to hierarchically cluster the network by finding only best cluster boundaries (not agglomerative). Results on model-based synthetic network data and real data show that SCOT's execution time is comparable to SCAN, that HintClus runs in negligible time, and that HintClus produces sensible clusters in the presence of noise.
AB - Network clustering enables us to view a complex network at the macro level, by grouping its nodes into units whose characteristics and interrelationships are easier to analyze and understand. State-of-the-art network partitioning methods are unable to identify hubs and outliers. A recently proposed algorithm, SCAN, overcomes this difficulty. However, it requires a minimum similarity parameter ε but provides no automated way to find it. Thus, it must be rerun for each ε value and does not capture the variety or hierarchy of clusters. We propose a new algorithm, SCOT (or Structure-Connected Order of Traversal), that produces a length n sequence containing all possible ε-clusterings. We propose a new algorithm, HintClus (or Hierarchy-Induced Network Clustering), to hierarchically cluster the network by finding only best cluster boundaries (not agglomerative). Results on model-based synthetic network data and real data show that SCOT's execution time is comparable to SCAN, that HintClus runs in negligible time, and that HintClus produces sensible clusters in the presence of noise.
UR - http://www.scopus.com/inward/record.url?scp=77952773506&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952773506&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2010.5447895
DO - 10.1109/ICDE.2010.5447895
M3 - Conference contribution
AN - SCOPUS:77952773506
SN - 9781424454440
T3 - Proceedings - International Conference on Data Engineering
SP - 653
EP - 656
BT - 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
T2 - 26th IEEE International Conference on Data Engineering, ICDE 2010
Y2 - 1 March 2010 through 6 March 2010
ER -