Abstract
This chapter presents HPStream, a new framework for high-dimensional projected clustering of data streams. It finds projected clusters in particular subsets of the dimensions by maintaining condensed representations of the clusters over time. The algorithm produces better-quality clusters than full-dimensional data stream clustering algorithms. The chapter evaluates the algorithm on a number of real and synthetic data sets; in each case, HPStream is found to be more effective than the full-dimensional CluStream algorithm. High-dimensional projected clustering of data streams opens a new direction for exploration in stream data mining. With this methodology, projected clustering can be treated as a preprocessing step that may enable more effective methods for stream classification, similarity, evolution, and outlier analysis.
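
The idea of a condensed cluster representation with a per-cluster subset of dimensions can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the class name `ClusterSummary`, the `decay` parameter, and the rule of keeping the `k` lowest-variance dimensions are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusterSummary:
    """Additive per-dimension statistics for one cluster (hypothetical name)."""
    dims: int
    n: float = 0.0
    linear_sum: List[float] = field(default_factory=list)
    squared_sum: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with zeroed statistics when none are supplied.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dims
        if not self.squared_sum:
            self.squared_sum = [0.0] * self.dims

    def add(self, point: Sequence[float], decay: float = 1.0) -> None:
        # Assumption: older statistics are down-weighted by `decay`
        # (1.0 means no fading) before the new point is absorbed.
        self.n = self.n * decay + 1.0
        for j, x in enumerate(point):
            self.linear_sum[j] = self.linear_sum[j] * decay + x
            self.squared_sum[j] = self.squared_sum[j] * decay + x * x

    def variance(self, j: int) -> float:
        # Spread of the cluster along dimension j, from the summary alone.
        mean = self.linear_sum[j] / self.n
        return max(self.squared_sum[j] / self.n - mean * mean, 0.0)

    def projected_dims(self, k: int) -> List[int]:
        # Assumption: keep the k dimensions along which the cluster is tightest.
        tightest = sorted(range(self.dims), key=self.variance)[:k]
        return sorted(tightest)


summary = ClusterSummary(dims=3)
for p in [(1.0, 10.0, 5.0), (1.1, -10.0, 5.1), (0.9, 0.0, 4.9)]:
    summary.add(p)
chosen = summary.projected_dims(2)
```

In this toy stream the points agree closely on the first and third dimensions and scatter widely on the second, so the summary selects the first and third as the cluster's projected dimensions without storing any raw points.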
Original language | English (US) |
---|---|
Title of host publication | Proceedings 2004 VLDB Conference |
Subtitle of host publication | The 30th International Conference on Very Large Databases (VLDB) |
Publisher | Elsevier |
Pages | 852-863 |
Number of pages | 12 |
ISBN (Electronic) | 9780120884698 |
State | Published - Jan 1 2004 |
Externally published | Yes |
ASJC Scopus subject areas
- General Computer Science