## Abstract

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in ℝ^{d}, one can compute a weighted set S ⊆ P, of size O(kε^{-d} log n), such that one can compute the k-median/means clustering on S instead of on P, and get an (1 + ε)-approximation. As a result, we improve the fastest known algorithms for (1 + ε)-approximate k-means and k-median. Our algorithms have linear running time for a fixed k and ε. In addition, we can maintain the (1 + ε)-approximate k-median or k-means clustering of a stream when points are being only inserted, using polylogarithmic space and update time.

Original language | English (US) |
---|---|

Pages (from-to) | 291-300 |

Number of pages | 10 |

Journal | Conference Proceedings of the Annual ACM Symposium on Theory of Computing |

DOIs | |

State | Published - 2004 |

Event | Proceedings of the 36th Annual ACM Symposium on Theory of Computing - Chicago, IL, United States Duration: Jun 13 2004 → Jun 15 2004 |

## Keywords

- Clustering
- Coreset
- Streaming
- k-means
- k-median

## ASJC Scopus subject areas

- Software