Abstract

Clustering is a general term for techniques that, given a set of objects, aim to group those that are closer to one another than to the rest, according to a chosen notion of closeness. It is an unsupervised-learning problem, since the objects are not externally labeled by category. Much effort has been expended on finding natural mathematical definitions of closeness and then developing and evaluating algorithms in these terms. Many have argued that no domain-independent mathematical notion of similarity exists and that similarity is instead context-dependent; categories are perhaps natural in that people can evaluate them when they see them. Some have dismissed the problem of unsupervised learning in favor of supervised learning, saying that unsupervised learning is not a powerful natural phenomenon. Yet most learning is unsupervised: we largely learn how to think through categories by observing the world in its unlabeled state. Drawing on universal information theory, we ask whether there are universal approaches to unsupervised clustering.

Chapter title: Universal Clustering
Original language: English (US)
Title of host publication: Information-Theoretic Methods in Data Science
Publisher: Cambridge University Press
Pages: 263-301
Number of pages: 39
ISBN (Electronic): 9781108616799
ISBN (Print): 9781108427135
State: Published - Jan 1 2021

Keywords

  • clustering
  • supervised learning
  • universal clustering
  • unsupervised learning

ASJC Scopus subject areas

  • General Engineering
  • General Computer Science
  • General Social Sciences
  • General Mathematics
