On a theory of nonparametric pairwise similarity for clustering: Connecting clustering to classification

Yingzhen Yang, Feng Liang, Shuicheng Yan, Zhangyang Wang, Thomas S Huang

Research output: Contribution to journal › Conference article

Abstract

Pairwise clustering methods partition the data space into clusters according to the pairwise similarity between data points. The success of pairwise clustering therefore depends largely on the pairwise similarity function defined over the data points, for which kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework that bridges the gap between clustering and multi-class classification. The framework learns an unsupervised nonparametric classifier from each data partition and searches for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. We consider two nonparametric classifiers in this framework, namely the nearest neighbor classifier and the plug-in classifier. Modeling the underlying data distribution by nonparametric kernel density estimation, we show that the generalization error bounds for both unsupervised nonparametric classifiers reduce to sums of nonparametric pairwise similarity terms between the data points, which can be used directly for clustering. Under the uniform distribution, the nonparametric similarity terms induced by both unsupervised classifiers take a well-known form of kernel similarity. We also prove that the generalization error bound for the unsupervised plug-in classifier is asymptotically equal to the weighted volume of the cluster boundary [1] for Low Density Separation, a widely used criterion for semi-supervised learning and clustering. Based on the nonparametric pairwise similarity derived from the plug-in classifier, we propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability, whose superiority is demonstrated by the experimental results.
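For intuition only, the sketch below (Python) illustrates the generic Gaussian kernel similarity that the derived nonparametric similarity terms reduce to under the uniform distribution, and how a sum of pairwise similarity terms can score candidate partitions in the spirit of Low Density Separation. The Gaussian kernel, the bandwidth, and the cross-cluster cost are illustrative assumptions; this is not the paper's actual generalization error bound or clustering algorithm.

```python
# Illustrative sketch only: the paper derives specific nonparametric pairwise
# similarity terms from generalization error bounds of unsupervised classifiers.
# Here we only show (a) the well-known Gaussian kernel similarity those terms
# reduce to under the uniform distribution, and (b) a toy cross-cluster cost
# used as a Low Density Separation style surrogate, not the paper's bound.
import numpy as np


def gaussian_similarity(X, bandwidth=1.0):
    """Pairwise kernel similarity w_ij = exp(-||x_i - x_j||^2 / (2 h^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def partition_cost(X, labels, bandwidth=1.0):
    """Sum of pairwise similarities between points assigned to different clusters.

    Smaller is better: a partition whose clusters are separated by low-density
    regions accumulates little cross-cluster similarity.
    """
    W = gaussian_similarity(X, bandwidth)
    different = labels[:, None] != labels[None, :]
    return W[different].sum() / 2.0  # count each unordered pair once


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated blobs in 2-D.
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
    good = np.array([0] * 20 + [1] * 20)      # the natural partition
    bad = rng.integers(0, 2, size=40)          # a random partition
    print(partition_cost(X, good), "<", partition_cost(X, bad))
```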

Original language: English (US)
Pages (from-to): 145-153
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 1
Issue number: January
State: Published - Jan 1 2014
Event: 28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014 - Montreal, Canada
Duration: Dec 8 2014 - Dec 13 2014

Fingerprint

Classifiers
Supervised learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

On a theory of nonparametric pairwise similarity for clustering: Connecting clustering to classification. / Yang, Yingzhen; Liang, Feng; Yan, Shuicheng; Wang, Zhangyang; Huang, Thomas S.

In: Advances in Neural Information Processing Systems, Vol. 1, No. January, 01.01.2014, p. 145-153.

Research output: Contribution to journal › Conference article

@article{01b0648f6a6a4c869bf4219a8ab3c745,
title = "On a theory of nonparametric pairwise similarity for clustering: Connecting clustering to classification",
author = "Yingzhen Yang and Feng Liang and Shuicheng Yan and Zhangyang Wang and Huang, {Thomas S}",
year = "2014",
month = "1",
day = "1",
language = "English (US)",
volume = "1",
pages = "145--153",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",
number = "January",

}

TY - JOUR

T1 - On a theory of nonparametric pairwise similarity for clustering

T2 - Connecting clustering to classification

AU - Yang, Yingzhen

AU - Liang, Feng

AU - Yan, Shuicheng

AU - Wang, Zhangyang

AU - Huang, Thomas S

PY - 2014/1/1

Y1 - 2014/1/1

UR - http://www.scopus.com/inward/record.url?scp=84937926318&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937926318&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84937926318

VL - 1

SP - 145

EP - 153

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

IS - January

ER -