In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in ℝd, one can compute a weighted set S ⊆ P, of size O(kε-d log n), such that one can compute the k-median/means clustering on S instead of on P, and get an (1 + ε)-approximation. As a result, we improve the fastest known algorithms for (1 + ε)-approximate k-means and k-median. Our algorithms have linear running time for a fixed k and ε. In addition, we can maintain the (1 + ε)-approximate k-median or k-means clustering of a stream when points are being only inserted, using polylogarithmic space and update time.
|Original language||English (US)|
|Number of pages||10|
|Journal||Conference Proceedings of the Annual ACM Symposium on Theory of Computing|
|State||Published - Jan 1 2004|
|Event||Proceedings of the 36th Annual ACM Symposium on Theory of Computing - Chicago, IL, United States|
Duration: Jun 13 2004 → Jun 15 2004
ASJC Scopus subject areas