### Abstract

Pairwise clustering methods partition the data space into clusters by the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class classification. This pairwise clustering framework learns an unsupervised nonparametric classifier from each data partition, and search for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. We consider two nonparametric classifiers in this framework, i.e. the nearest neighbor classifier and the plug-in classifier. Modeling the underlying data distribution by nonparametric kernel density estimation, the generalization error bounds for both unsupervised nonparametric classifiers are the sum of nonparametric pairwise similarity terms between the data points for the purpose of clustering. Under uniform distribution, the nonparametric similarity terms induced by both unsupervised classifiers exhibit a well known form of kernel similarity. We also prove that the generalization error bound for the unsupervised plugin classifier is asymptotically equal to the weighted volume of cluster boundary [1] for Low Density Separation, a widely used criteria for semi-supervised learning and clustering. Based on the derived nonparametric pairwise similarity using the plug-in classifier, we propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability, whose superiority is evidenced by the experimental results.

Original language | English (US) |
---|---|

Pages (from-to) | 145-153 |

Number of pages | 9 |

Journal | Advances in Neural Information Processing Systems |

Volume | 1 |

Issue number | January |

State | Published - Jan 1 2014 |

Event | 28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014 - Montreal, Canada Duration: Dec 8 2014 → Dec 13 2014 |

### Fingerprint

### ASJC Scopus subject areas

- Computer Networks and Communications
- Information Systems
- Signal Processing

### Cite this

*Advances in Neural Information Processing Systems*,

*1*(January), 145-153.

**On a theory of nonparametric pairwise similarity for clustering : Connecting clustering to classification.** / Yang, Yingzhen; Liang, Feng; Yan, Shuicheng; Wang, Zhangyang; Huang, Thomas S.

Research output: Contribution to journal › Conference article

*Advances in Neural Information Processing Systems*, vol. 1, no. January, pp. 145-153.

}

TY - JOUR

T1 - On a theory of nonparametric pairwise similarity for clustering

T2 - Connecting clustering to classification

AU - Yang, Yingzhen

AU - Liang, Feng

AU - Yan, Shuicheng

AU - Wang, Zhangyang

AU - Huang, Thomas S

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Pairwise clustering methods partition the data space into clusters by the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class classification. This pairwise clustering framework learns an unsupervised nonparametric classifier from each data partition, and search for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. We consider two nonparametric classifiers in this framework, i.e. the nearest neighbor classifier and the plug-in classifier. Modeling the underlying data distribution by nonparametric kernel density estimation, the generalization error bounds for both unsupervised nonparametric classifiers are the sum of nonparametric pairwise similarity terms between the data points for the purpose of clustering. Under uniform distribution, the nonparametric similarity terms induced by both unsupervised classifiers exhibit a well known form of kernel similarity. We also prove that the generalization error bound for the unsupervised plugin classifier is asymptotically equal to the weighted volume of cluster boundary [1] for Low Density Separation, a widely used criteria for semi-supervised learning and clustering. Based on the derived nonparametric pairwise similarity using the plug-in classifier, we propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability, whose superiority is evidenced by the experimental results.

AB - Pairwise clustering methods partition the data space into clusters by the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class classification. This pairwise clustering framework learns an unsupervised nonparametric classifier from each data partition, and search for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. We consider two nonparametric classifiers in this framework, i.e. the nearest neighbor classifier and the plug-in classifier. Modeling the underlying data distribution by nonparametric kernel density estimation, the generalization error bounds for both unsupervised nonparametric classifiers are the sum of nonparametric pairwise similarity terms between the data points for the purpose of clustering. Under uniform distribution, the nonparametric similarity terms induced by both unsupervised classifiers exhibit a well known form of kernel similarity. We also prove that the generalization error bound for the unsupervised plugin classifier is asymptotically equal to the weighted volume of cluster boundary [1] for Low Density Separation, a widely used criteria for semi-supervised learning and clustering. Based on the derived nonparametric pairwise similarity using the plug-in classifier, we propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability, whose superiority is evidenced by the experimental results.

UR - http://www.scopus.com/inward/record.url?scp=84937926318&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937926318&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84937926318

VL - 1

SP - 145

EP - 153

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

IS - January

ER -