Data depth based clustering analysis

Myeong Hun Jeong, Yaping Cai, Clair Julia Sullivan, Shaowen Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes a new algorithm, based on data depth, for identifying patterns within data. Such clustering analysis has enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most of them are not affine invariant, so they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed, without changing a parameter. It is also robust to noise because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter varies. An experimental comparison with leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and matches or exceeds their robustness to noise. Robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
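To make the two properties claimed in the abstract concrete, the following is a minimal sketch of how a depth function can rank points by centrality and stay unchanged under an affine transformation. It is not the DBCA algorithm: the abstract does not say which depth function the paper uses, so Mahalanobis depth is assumed here purely as a standard affine-invariant example. The function names `mahalanobis_depth` and `affine`, the sample data, and the transformation matrix are all hypothetical.

```python
import random


def mahalanobis_depth(points):
    """Return the Mahalanobis depth of each 2-D point w.r.t. the whole sample.

    Assumption: Mahalanobis depth, 1 / (1 + d^2), stands in for whatever
    depth function the paper actually uses.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    depths = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        depths.append(1.0 / (1.0 + d2))
    return depths


def affine(points, a, b):
    """Apply x -> A x + b, with A given as ((a11, a12), (a21, a22))."""
    return [
        (a[0][0] * x + a[0][1] * y + b[0], a[1][0] * x + a[1][1] * y + b[1])
        for x, y in points
    ]


rng = random.Random(0)
sample = [(rng.gauss(0, 1), rng.gauss(0, 2)) for _ in range(200)]
sample.append((15.0, 15.0))  # one deliberately planted outlier

# A hypothetical non-singular transformation: shear, scale, and shift.
moved = affine(sample, ((2.0, 1.0), (-0.5, 3.0)), (10.0, -4.0))

before = mahalanobis_depth(sample)
after = mahalanobis_depth(moved)

# Depth values agree up to floating-point error after the transformation.
max_diff = max(abs(u - v) for u, v in zip(before, after))
# The planted outlier receives the lowest depth (highest outlyingness).
outlier_is_shallowest = before.index(min(before)) == len(sample) - 1
print(max_diff, outlier_is_shallowest)
```

By contrast, the pairwise Euclidean distances that a fixed-radius method such as DBSCAN relies on do change under the same transformation, which is why such methods typically need their parameters retuned.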

Original language: English (US)
Title of host publication: 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016
Editors: Matthias Renz, Mohamed Ali, Shawn Newsam, Matthias Renz, Siva Ravada, Goce Trajcevski
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450345897
DOI: https://doi.org/10.1145/2996913.2996984
State: Published - Oct 31 2016
Event: 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016 - Burlingame, United States
Duration: Oct 31 2016 to Nov 3 2016

Publication series

Name: GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems

Other

Other: 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016
Country: United States
City: Burlingame
Period: 10/31/16 to 11/3/16

Keywords

  • Affine invariant clustering
  • Cluster analysis
  • Data depth
  • Density-based clustering analysis

ASJC Scopus subject areas

  • Earth-Surface Processes
  • Computer Science Applications
  • Modeling and Simulation
  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Jeong, M. H., Cai, Y., Sullivan, C. J., & Wang, S. (2016). Data depth based clustering analysis. In M. Renz, M. Ali, S. Newsam, M. Renz, S. Ravada, & G. Trajcevski (Eds.), 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016 [29] (GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems). Association for Computing Machinery. https://doi.org/10.1145/2996913.2996984

@inproceedings{db0498b7ebd6493d9cc50f42ae82a489,
title = "Data depth based clustering analysis",
abstract = "This paper proposes a new algorithm, based on data depth, for identifying patterns within data. Such clustering analysis has enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most of them are not affine invariant, so they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed, without changing a parameter. It is also robust to noise because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter varies. An experimental comparison with leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and matches or exceeds their robustness to noise. Robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.",
keywords = "Affine invariant clustering, Cluster analysis, Data depth, Density-based clustering analysis",
author = "Jeong, {Myeong Hun} and Yaping Cai and Sullivan, {Clair Julia} and Shaowen Wang",
year = "2016",
month = "10",
day = "31",
doi = "10.1145/2996913.2996984",
language = "English (US)",
series = "GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems",
publisher = "Association for Computing Machinery",
editor = "Matthias Renz and Mohamed Ali and Shawn Newsam and Matthias Renz and Siva Ravada and Goce Trajcevski",
booktitle = "24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016",

}

TY - GEN
T1 - Data depth based clustering analysis
AU - Jeong, Myeong Hun
AU - Cai, Yaping
AU - Sullivan, Clair Julia
AU - Wang, Shaowen
PY - 2016/10/31
Y1 - 2016/10/31
AB - This paper proposes a new algorithm, based on data depth, for identifying patterns within data. Such clustering analysis has enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most of them are not affine invariant, so they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed, without changing a parameter. It is also robust to noise because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter varies. An experimental comparison with leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and matches or exceeds their robustness to noise. Robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
KW - Affine invariant clustering
KW - Cluster analysis
KW - Data depth
KW - Density-based clustering analysis
UR - http://www.scopus.com/inward/record.url?scp=85011036688&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85011036688&partnerID=8YFLogxK
U2 - 10.1145/2996913.2996984
DO - 10.1145/2996913.2996984
M3 - Conference contribution
AN - SCOPUS:85011036688
T3 - GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
BT - 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016
A2 - Renz, Matthias
A2 - Ali, Mohamed
A2 - Newsam, Shawn
A2 - Renz, Matthias
A2 - Ravada, Siva
A2 - Trajcevski, Goce
PB - Association for Computing Machinery
ER -