Emergent community agglomeration from data set geometry

Chenchao Zhao, Jun S. Song

Research output: Contribution to journal › Article

Abstract

In the statistical learning language, samples are snapshots of random vectors drawn from some unknown distribution. Such vectors usually reside in a high-dimensional Euclidean space, and thus the "curse of dimensionality" often undermines the power of learning methods, including community detection and clustering algorithms, that rely on Euclidean geometry. This paper presents the idea of effective dissimilarity transformation (EDT) on empirical dissimilarity hyperspheres and studies its effects using synthetic and gene expression data sets. Iterating the EDT turns a static data distribution into a dynamical process purely driven by the empirical data set geometry and adaptively ameliorates the curse of dimensionality, partly through changing the topology of a Euclidean feature space ℝⁿ into a compact hypersphere Sⁿ. The EDT often improves the performance of hierarchical clustering via the automatic grouping information emerging from global interactions of data points. The EDT is not restricted to hierarchical clustering, and other learning methods based on pairwise dissimilarity should also benefit from the many desirable properties of EDT.

Original language: English (US)
Article number: 042307
Journal: Physical Review E
Volume: 95
Issue number: 4
State: Published - Apr 13 2017

ASJC Scopus subject areas

  • Statistical and Nonlinear Physics
  • Statistics and Probability
  • Condensed Matter Physics
