TY - GEN
T1 - Data depth based clustering analysis
AU - Jeong, Myeong Hun
AU - Cai, Yaping
AU - Sullivan, Clair J.
AU - Wang, Shaowen
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/10/31
Y1 - 2016/10/31
N2 - This paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant. Therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed, without changing a parameter. It is also robust to noise because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter is varied. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and matches or exceeds the robustness to noise of DBSCAN and HDBSCAN. The robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
AB - This paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant. Therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed, without changing a parameter. It is also robust to noise because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter is varied. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and matches or exceeds the robustness to noise of DBSCAN and HDBSCAN. The robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
KW - Affine invariant clustering
KW - Cluster analysis
KW - Data depth
KW - Density-based clustering analysis
UR - http://www.scopus.com/inward/record.url?scp=85011036688&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85011036688&partnerID=8YFLogxK
U2 - 10.1145/2996913.2996984
DO - 10.1145/2996913.2996984
M3 - Conference contribution
AN - SCOPUS:85011036688
T3 - GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
BT - 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016
A2 - Renz, Matthias
A2 - Ali, Mohamed
A2 - Newsam, Shawn
A2 - Ravada, Siva
A2 - Trajcevski, Goce
PB - Association for Computing Machinery
T2 - 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2016
Y2 - 31 October 2016 through 3 November 2016
ER -