Abstract
In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in ℝd, one can compute a weighted set S ⊆ P, of size O(kε-d log n), such that one can compute the k-median/means clustering on S instead of on P, and get an (1 + ε)-approximation. As a result, we improve the fastest known algorithms for (1 + ε)-approximate k-means and k-median. Our algorithms have linear running time for a fixed k and ε. In addition, we can maintain the (1 + ε)-approximate k-median or k-means clustering of a stream when points are being only inserted, using polylogarithmic space and update time.
Original language | English (US) |
---|---|
Pages (from-to) | 291-300 |
Number of pages | 10 |
Journal | Conference Proceedings of the Annual ACM Symposium on Theory of Computing |
DOIs | |
State | Published - 2004 |
Event | Proceedings of the 36th Annual ACM Symposium on Theory of Computing - Chicago, IL, United States Duration: Jun 13 2004 → Jun 15 2004 |
Keywords
- Clustering
- Coreset
- Streaming
- k-means
- k-median
ASJC Scopus subject areas
- Software