TY - GEN
T1 - Grouping game players using parallelized K-Means on supercomputers
AU - Cai, Y. Dora
AU - Ratan, Rabindra
AU - Shen, Cuihua
AU - Alameda, Jay
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/7/26
Y1 - 2015/7/26
N2 - Grouping game players based on their online behaviors has attracted a lot of attention recently. However, due to the huge volume and extreme complexity in online game data collections, grouping players is a challenging task. This study has applied parallelized K-Means on Gordon, a supercomputer hosted at San Diego Supercomputer Center, to meet the computational challenge on this task. By using the parallelization functions supported by R, this study was able to cluster 120,000 game players into eight non-overlapping groups and speed up the clustering process by one to four times under the two- to eight-degree of parallelization. This study has systematically examined a number of factors which may affect the quality of the clusters and/or the performance of the clustering processes; those factors include degree of parallelism, number of clusters, data dimensions, and variable combinations. This study invented a method to identify the optimal clustering schema, which can choose the most discriminative features and create an appropriate number of clusters in K-Means clustering. Besides demonstrating the effectiveness of parallelized K-Means in grouping game players, this study also highlights some lessons learned for using K-Means on very large datasets and some experience on applying parallel processing techniques in intensive data analysis.
AB - Grouping game players based on their online behaviors has attracted a lot of attention recently. However, due to the huge volume and extreme complexity in online game data collections, grouping players is a challenging task. This study has applied parallelized K-Means on Gordon, a supercomputer hosted at San Diego Supercomputer Center, to meet the computational challenge on this task. By using the parallelization functions supported by R, this study was able to cluster 120,000 game players into eight non-overlapping groups and speed up the clustering process by one to four times under the two- to eight-degree of parallelization. This study has systematically examined a number of factors which may affect the quality of the clusters and/or the performance of the clustering processes; those factors include degree of parallelism, number of clusters, data dimensions, and variable combinations. This study invented a method to identify the optimal clustering schema, which can choose the most discriminative features and create an appropriate number of clusters in K-Means clustering. Besides demonstrating the effectiveness of parallelized K-Means in grouping game players, this study also highlights some lessons learned for using K-Means on very large datasets and some experience on applying parallel processing techniques in intensive data analysis.
KW - Cluster Analysis
KW - K-Means
KW - Parallel Processing
KW - Performance Evaluation
UR - http://www.scopus.com/inward/record.url?scp=84942765562&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84942765562&partnerID=8YFLogxK
U2 - 10.1145/2792745.2792755
DO - 10.1145/2792745.2792755
M3 - Conference contribution
AN - SCOPUS:84942765562
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the XSEDE 2015 Conference
PB - Association for Computing Machinery
T2 - 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015
Y2 - 26 July 2015 through 30 July 2015
ER -