A Framework for Clustering Evolving Data Streams

Charu C. Aggarwal, Philip S. Yu, Jiawei Han, Jianyong Wang

Research output: Chapter in Book/Report/Conference proceeding › Chapter


This chapter discusses a framework for clustering evolving data streams. Clustering is difficult in the data stream domain because the large volumes of data arriving in a stream render most traditional algorithms too inefficient. In recent years, a few one-pass clustering algorithms have been developed for the data stream problem. Although such methods address the scalability issues of the clustering problem, they are generally blind to the evolution of the data and do not address the following issues: (1) the quality of the clusters is poor when the data evolves considerably over time; (2) a data stream clustering algorithm requires much greater functionality in discovering and exploring clusters over different portions of the stream. The widely used practice of viewing data stream clustering algorithms as a class of one-pass clustering algorithms is not very useful from an application point of view. The chapter discusses a fundamentally different philosophy for data stream clustering, which is guided by application-centered requirements. It divides the clustering process into an online component, which periodically stores detailed summary statistics, and an offline component, which uses only these summary statistics. The problems of efficient choice, storage, and use of this statistical data for a fast data stream turn out to be quite tricky. To address them, the concepts of a pyramidal time frame in conjunction with a micro-clustering approach are used.
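
The online/offline split described above can be illustrated with a small sketch. This is not the chapter's implementation: the abstract gives no data layout, so the class `MicroCluster`, its fields, and the helper `snapshot_order` are hypothetical names chosen for illustration. The sketch assumes each micro-cluster keeps additive summary statistics (count, per-dimension sum, per-dimension sum of squares), so that an offline step can subtract an older stored snapshot from a newer one to recover a summary of only the points that arrived in between. It also assumes one simple way to organize snapshots at geometrically spaced granularities, in the spirit of a pyramidal time frame.

```python
class MicroCluster:
    """Additive summary of the points absorbed so far (hypothetical layout)."""

    def __init__(self, dim):
        self.dim = dim
        self.n = 0                       # number of points absorbed
        self.linear_sum = [0.0] * dim    # per-dimension sum of values
        self.square_sum = [0.0] * dim    # per-dimension sum of squared values

    def absorb(self, point):
        """Online step: fold one arriving point into the summary."""
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x

    def copy(self):
        """Snapshot of the current statistics, e.g. for periodic storage."""
        snap = MicroCluster(self.dim)
        snap.n = self.n
        snap.linear_sum = list(self.linear_sum)
        snap.square_sum = list(self.square_sum)
        return snap

    def minus(self, older):
        """Offline step: summary of the points that arrived after `older`."""
        diff = MicroCluster(self.dim)
        diff.n = self.n - older.n
        diff.linear_sum = [a - b for a, b in zip(self.linear_sum, older.linear_sum)]
        diff.square_sum = [a - b for a, b in zip(self.square_sum, older.square_sum)]
        return diff

    def centroid(self):
        """Mean of the summarized points; requires at least one point."""
        return [s / self.n for s in self.linear_sum]


def snapshot_order(t, alpha=2):
    """Largest i such that alpha**i divides clock time t (t >= 1).

    Assumption for illustration: snapshots of higher order are taken less
    often, so older history is retained at coarser granularity.
    """
    order = 0
    while t % alpha == 0:
        t //= alpha
        order += 1
    return order


# Usage: summarize a stream, store a snapshot, then isolate recent points.
mc = MicroCluster(2)
mc.absorb([1.0, 2.0])
mc.absorb([3.0, 4.0])
snapshot = mc.copy()            # stored by the online component
mc.absorb([5.0, 6.0])
recent = mc.minus(snapshot)     # offline: only points after the snapshot
```

Because the statistics are sums, the offline component never needs the raw stream: any time horizon bounded by two stored snapshots can be analyzed from the difference of their summaries.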

Original language: English (US)
Title of host publication: Proceedings 2003 VLDB Conference
Subtitle of host publication: 29th International Conference on Very Large Databases (VLDB)
Number of pages: 12
ISBN (Electronic): 9780127224428
State: Published - Jan 1 2003
Externally published: Yes

ASJC Scopus subject areas

  • General Computer Science

