Abstract

This paper proposes a new algorithm for identifying patterns within data, based on data depth. Such clustering analysis has enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most of them are not affine invariant, so they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed without changing a parameter. It is also robust to noise because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter varies. An experimental comparison with leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and matches or exceeds their robustness to noise. The robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
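
The affine invariance claimed above can be illustrated with a small sketch. The abstract does not say which depth function DBCA uses, so the example below assumes Mahalanobis depth, one common affine-invariant choice, restricted to 2-D points. The function names, the toy data, and the matrix `A` are all hypothetical, and the code shows only the invariance property, not the DBCA algorithm itself.

```python
import math


def mahalanobis_depth(points, x):
    """Mahalanobis depth of the 2-D point x with respect to a list of 2-D points.

    Assumed definition: depth = 1 / (1 + squared Mahalanobis distance to the mean).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mx, x[1] - my
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)


def affine(p, a, b):
    """Apply the affine map p -> a @ p + b to a 2-D point."""
    return (a[0][0] * p[0] + a[0][1] * p[1] + b[0],
            a[1][0] * p[0] + a[1][1] * p[1] + b[1])


# Toy data: five points near the unit square plus one far-away point.
data = [(0.0, 0.0), (1.0, 0.2), (0.9, 1.1), (0.1, 0.8), (0.5, 0.5), (6.0, 6.0)]

# Hypothetical nonsingular transform (determinant 6.5) and translation.
A = [[2.0, 1.0], [-0.5, 3.0]]
b = (10.0, -4.0)
transformed = [affine(p, A, b) for p in data]

depths = [mahalanobis_depth(data, p) for p in data]
depths_t = [mahalanobis_depth(transformed, q) for q in transformed]

# Depth values agree before and after the transform, up to rounding.
print(all(math.isclose(u, v, rel_tol=1e-9) for u, v in zip(depths, depths_t)))
```

Because every point keeps its depth under the transform, any rule that thresholds or ranks by depth would need no retuning after rotation, scaling, or translation. A Euclidean distance threshold, by contrast, would have to change with the scale of the data.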

Original language: English (US)
Title of host publication: 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016
Editors: Matthias Renz, Mohamed Ali, Shawn Newsam, Siva Ravada, Goce Trajcevski
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450345897
State: Published - Oct 31 2016
Event: 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016 - Burlingame, United States
Duration: Oct 31 2016 to Nov 3 2016

Publication series

Name: GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems

Keywords

  • Affine invariant clustering
  • Cluster analysis
  • Data depth
  • Density-based clustering analysis

ASJC Scopus subject areas

  • Earth-Surface Processes
  • Computer Science Applications
  • Modeling and Simulation
  • Computer Graphics and Computer-Aided Design
  • Information Systems
