TY - GEN
T1 - Incremental spectral clustering with application to monitoring of evolving blog communities
AU - Ning, Huazhong
AU - Xu, Wei
AU - Chi, Yun
AU - Gong, Yihong
AU - Huang, Thomas
PY - 2007
Y1 - 2007
N2 - In recent years, spectral clustering method has gained attentions because of its superior performance compared to other traditional clustering algorithms such as K-means algorithm. The existing spectral clusteririg algorithms are all off-line algorithms, i.e., they can not incrementally update the clustering result given a small change of the data set. However, the capability of incrementally updating is essential to some applications such as real time monitoring of the evolving communities of websphere or blogsphere. Unlike traditional stream data, these applications require incremental algorithms to handle not only insertion/deletion of data points but also similarity changes between existing items. This paper extends the standard spectral clustering to such evolving data by introducing the incidence vector/matrix to represent two kinds of dynamics in the same framework and by incrementally updating the eigenvalue system. Our incremental algorithm, initialized by a standard spectral clustering, continuously and efficiently updates the eigenvalue system and generates instant cluster labels, as the data set is evolving. The algorithm is applied to a blog data set. Compared with recomputation of the solution by standard spectral clustering, it achieves similar accuracy but with much lower computational cost. Close inspection into the blog content shows that the incremental approach can discover not only the stable blog communities but also the evolution of the individual multi-topic blogs.
AB - In recent years, spectral clustering method has gained attentions because of its superior performance compared to other traditional clustering algorithms such as K-means algorithm. The existing spectral clusteririg algorithms are all off-line algorithms, i.e., they can not incrementally update the clustering result given a small change of the data set. However, the capability of incrementally updating is essential to some applications such as real time monitoring of the evolving communities of websphere or blogsphere. Unlike traditional stream data, these applications require incremental algorithms to handle not only insertion/deletion of data points but also similarity changes between existing items. This paper extends the standard spectral clustering to such evolving data by introducing the incidence vector/matrix to represent two kinds of dynamics in the same framework and by incrementally updating the eigenvalue system. Our incremental algorithm, initialized by a standard spectral clustering, continuously and efficiently updates the eigenvalue system and generates instant cluster labels, as the data set is evolving. The algorithm is applied to a blog data set. Compared with recomputation of the solution by standard spectral clustering, it achieves similar accuracy but with much lower computational cost. Close inspection into the blog content shows that the incremental approach can discover not only the stable blog communities but also the evolution of the individual multi-topic blogs.
KW - Incidence vector/matrix
KW - Incremental clustering
KW - Spectral clustering
KW - Web-blogs
UR - http://www.scopus.com/inward/record.url?scp=70449129633&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449129633&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972771.24
DO - 10.1137/1.9781611972771.24
M3 - Conference contribution
AN - SCOPUS:70449129633
SN - 9780898716306
T3 - Proceedings of the 7th SIAM International Conference on Data Mining
SP - 261
EP - 272
BT - Proceedings of the 7th SIAM International Conference on Data Mining
PB - Society for Industrial and Applied Mathematics Publications
T2 - 7th SIAM International Conference on Data Mining
Y2 - 26 April 2007 through 28 April 2007
ER -