On high dimensional projected clustering of data streams

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Gabrielle Dawn Allen

Research output: Contribution to journal › Article › peer-review

Abstract

The data stream problem has been studied extensively in recent years because stream data has become very easy to collect. The nature of stream data makes it essential to use algorithms that require only one pass over the data. Recently, single-scan stream analysis methods have been proposed in this context. However, much stream data is high-dimensional, and high-dimensional data is inherently more difficult to cluster, classify, and search by similarity. Recent research discusses methods for projected clustering over high-dimensional data sets. These methods, however, are difficult to generalize to data streams because of their complexity and the large volume of data streams. In this paper, we propose a new high-dimensional, projected data stream clustering method called HPStream. The method incorporates a fading cluster structure and a projection-based clustering methodology. It is incrementally updatable, is highly scalable in both the number of dimensions and the size of the data streams, and achieves better clustering quality than previous stream clustering methods. Our performance study with both real and synthetic data sets demonstrates the efficiency and effectiveness of the proposed framework and implementation methods.
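The abstract names two ingredients, a fading cluster structure and projection-based clustering, without defining them on this page. The sketch below is an illustrative assumption of how the two ideas could combine in a single pass; it is not HPStream as specified in the paper. The exponential decay form `2**(-decay * dt)`, the rule of keeping the `k` tightest dimensions per cluster, the `max_dist` threshold for opening a new cluster, and the names `FadingCluster`, `projected_distance`, and `process_point` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class FadingCluster:
    """Hypothetical fading cluster summary (an assumption, not the paper's exact
    structure): decayed weight plus per-dimension decayed sums of x and x**2."""
    dim: int
    decay: float                                    # assumed rate in 2**(-decay * dt)
    weight: float = 0.0
    last_t: float = 0.0
    ls: List[float] = field(default_factory=list)   # decayed linear sums
    ss: List[float] = field(default_factory=list)   # decayed squared sums

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim
            self.ss = [0.0] * self.dim

    def fade_to(self, t: float) -> None:
        # Scale every statistic by the same factor so older points count less.
        factor = 2.0 ** (-self.decay * (t - self.last_t))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.last_t = t

    def add(self, x: List[float], t: float) -> None:
        # One-pass incremental update: fade first, then absorb the new point.
        self.fade_to(t)
        self.weight += 1.0
        for j, v in enumerate(x):
            self.ls[j] += v
            self.ss[j] += v * v

    def centroid(self) -> List[float]:
        return [v / self.weight for v in self.ls]

    def spreads(self) -> List[float]:
        # Per-dimension standard deviation of the decayed members.
        return [math.sqrt(max(0.0, s / self.weight - (l / self.weight) ** 2))
                for l, s in zip(self.ls, self.ss)]

    def projected_dims(self, k: int) -> List[int]:
        # Assumed rule: keep the k dimensions along which the cluster is tightest.
        s = self.spreads()
        return sorted(range(self.dim), key=lambda j: s[j])[:k]


def projected_distance(x: List[float], c: FadingCluster, dims: List[int]) -> float:
    # Mean absolute deviation over the cluster's own projected dimensions only.
    # No fading is needed here: uniform scaling leaves the centroid unchanged.
    cen = c.centroid()
    return sum(abs(x[j] - cen[j]) for j in dims) / len(dims)


def process_point(x: List[float], t: float, clusters: List[FadingCluster],
                  k: int, decay: float, max_dist: float) -> FadingCluster:
    """Assign x to the nearest cluster in that cluster's projected subspace, or
    open a new cluster if none lies within the hypothetical threshold max_dist."""
    best, best_d = None, math.inf
    for c in clusters:
        d = projected_distance(x, c, c.projected_dims(k))
        if d < best_d:
            best, best_d = c, d
    if best is None or best_d > max_dist:
        best = FadingCluster(dim=len(x), decay=decay, last_t=t)
        clusters.append(best)
    best.add(x, t)
    return best
```

Because fading multiplies the weight and both sums by one common factor, the centroid and per-dimension spreads are unchanged by decay alone; fading only shifts how much a newly arriving point counts relative to older ones. That is what keeps the summary incrementally updatable at O(d) cost per point in this sketch.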

Original language: English (US)
Pages (from-to): 251-273
Number of pages: 23
Journal: Data Mining and Knowledge Discovery
Volume: 10
Issue number: 3
State: Published - May 1 2005

Keywords

  • Data streams
  • High dimensional
  • Projected clustering

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
