Moment-based uniform deviation bounds for κ-means and friends

Matus Jan Telgarsky, Sanjoy Dasgupta

Research output: Contribution to journalArticlepeer-review

Abstract

Suppose κ centers are fit to m points by heuristically minimizing the κ-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with p ≥ 4 bounded moments; in particular, the difference between the sample cost and distribution cost decays with m and p as mmin{-1/4,-1/2+2/p}. The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of κ-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for κ-means instances possessing some cluster structure.

Original languageEnglish (US)
JournalAdvances in Neural Information Processing Systems
StatePublished - 2013
Externally publishedYes

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Moment-based uniform deviation bounds for κ-means and friends'. Together they form a unique fingerprint.

Cite this