On coresets for k-means and k-median clustering

Sariel Har-Peled, Soham Mazumdar

Research output: Contribution to journalConference articlepeer-review

Abstract

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in ℝd, one can compute a weighted set S ⊆ P, of size O(kε-d log n), such that one can compute the k-median/means clustering on S instead of on P, and get an (1 + ε)-approximation. As a result, we improve the fastest known algorithms for (1 + ε)-approximate k-means and k-median. Our algorithms have linear running time for a fixed k and ε. In addition, we can maintain the (1 + ε)-approximate k-median or k-means clustering of a stream when points are being only inserted, using polylogarithmic space and update time.

Original languageEnglish (US)
Pages (from-to)291-300
Number of pages10
JournalConference Proceedings of the Annual ACM Symposium on Theory of Computing
DOIs
StatePublished - Jan 1 2004
EventProceedings of the 36th Annual ACM Symposium on Theory of Computing - Chicago, IL, United States
Duration: Jun 13 2004Jun 15 2004

Keywords

  • Clustering
  • Coreset
  • Streaming
  • k-means
  • k-median

ASJC Scopus subject areas

  • Software

Cite this