Abstract
Clustering is a general term for techniques that, given a set of objects, aim to select those that are closer to one another than to the rest, according to a chosen notion of closeness. It is an unsupervised-learning problem since objects are not externally labeled by category. Much effort has been expended on finding natural mathematical definitions of closeness and then developing and evaluating algorithms in these terms. Many have argued that there is no domain-independent mathematical notion of similarity and that similarity is instead context-dependent; categories are perhaps natural in that people can evaluate them when they see them. Some have dismissed the problem of unsupervised learning in favor of supervised learning, saying it is not a powerful natural phenomenon. Yet, most learning is unsupervised. We largely learn how to think through categories by observing the world in its unlabeled state. Drawing on universal information theory, we ask whether there are universal approaches to unsupervised clustering.
Original language | English (US) |
---|---|
Title of host publication | Information-Theoretic Methods in Data Science |
Publisher | Cambridge University Press |
Pages | 263-301 |
Number of pages | 39 |
ISBN (Electronic) | 9781108616799 |
ISBN (Print) | 9781108427135 |
State | Published - Jan 1 2021 |
Keywords
- clustering
- supervised learning
- universal clustering
- unsupervised learning
ASJC Scopus subject areas
- General Engineering
- General Computer Science
- General Social Sciences
- General Mathematics