Grouping game players using parallelized K-Means on supercomputers

Y. Dora Cai, Rabindra Ratan, Cuihua Shen, Jay Alameda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Grouping game players based on their online behaviors has attracted a lot of attention recently. However, due to the huge volume and extreme complexity of online game data collections, grouping players is a challenging task. This study applied parallelized K-Means on Gordon, a supercomputer hosted at the San Diego Supercomputer Center, to meet the computational challenge of this task. By using the parallelization functions supported by R, this study was able to cluster 120,000 game players into eight non-overlapping groups and speed up the clustering process by one to four times under two- to eight-degree parallelization. This study systematically examined a number of factors that may affect the quality of the clusters and/or the performance of the clustering processes; those factors include degree of parallelism, number of clusters, data dimensions, and variable combinations. This study devised a method to identify the optimal clustering schema, which can choose the most discriminative features and create an appropriate number of clusters in K-Means clustering. Besides demonstrating the effectiveness of parallelized K-Means in grouping game players, this study also highlights lessons learned from using K-Means on very large datasets and experience in applying parallel processing techniques to intensive data analysis.
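The abstract does not say how the K-Means work was split across workers, and the study itself used R. As an illustration only, the Python sketch below shows one common decomposition: the data is divided into chunks, each worker computes per-cluster sums and counts for its chunk, and the partial results are merged to update the centroids. The function names (`assign_chunk`, `parallel_kmeans`) and the parameters (`workers`, `iterations`, `seed`) are hypothetical and do not come from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_chunk(chunk, centroids):
    """Map step: per-cluster coordinate sums and counts for one data chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        nearest = min(range(k), key=lambda j: sq_dist(point, centroids[j]))
        counts[nearest] += 1
        for d, value in enumerate(point):
            sums[nearest][d] += value
    return sums, counts


def parallel_kmeans(points, k, workers=4, iterations=20, seed=0):
    """Lloyd's algorithm with the assignment step split across workers.

    Assumption: `points` is a list of equal-length numeric tuples.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    # Fixed strided chunks, one per worker (the "degree of parallelism").
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            partials = list(pool.map(lambda c: assign_chunk(c, centroids), chunks))
            # Reduce step: merge partial sums and counts into new centroids.
            new_centroids = []
            for j in range(k):
                count = sum(part[1][j] for part in partials)
                if count == 0:
                    new_centroids.append(centroids[j])  # empty cluster: keep old centroid
                    continue
                dim = len(centroids[j])
                new_centroids.append(
                    [sum(part[0][j][d] for part in partials) / count for d in range(dim)]
                )
            if new_centroids == centroids:  # no centroid moved: converged
                break
            centroids = new_centroids
    labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
    return centroids, labels
```

A call such as `parallel_kmeans(points, k=8, workers=4)` returns the centroids and one cluster label per point. Threads are used here only to keep the sketch self-contained; in pure Python they do not give real CPU parallelism, so a process pool or a cluster backend would be needed to see speedups of the kind the abstract reports.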

Original language: English (US)
Title of host publication: Proceedings of the XSEDE 2015 Conference
Subtitle of host publication: Scientific Advancements Enabled by Enhanced Cyberinfrastructure
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450337205
DOI: https://doi.org/10.1145/2792745.2792755
State: Published - Jul 26 2015
Event: 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015 - St. Louis, United States
Duration: Jul 26 2015 to Jul 30 2015

Publication series

Name: ACM International Conference Proceeding Series
Volume: 2015-July

Other

Other: 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015
Country: United States
City: St. Louis
Period: 7/26/15 to 7/30/15

Fingerprint

Supercomputers
Processing

Keywords

  • Cluster Analysis
  • K-Means
  • Parallel Processing
  • Performance Evaluation

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Cai, Y. D., Ratan, R., Shen, C., & Alameda, J. (2015). Grouping game players using parallelized K-Means on supercomputers. In Proceedings of the XSEDE 2015 Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure [a10] (ACM International Conference Proceeding Series; Vol. 2015-July). Association for Computing Machinery. https://doi.org/10.1145/2792745.2792755

@inproceedings{afd9772a22ff45b48446a62d11628217,
title = "Grouping game players using parallelized K-Means on supercomputers",
keywords = "Cluster Analysis, K-Means, Parallel Processing, Performance Evaluation",
author = "Cai, {Y. Dora} and Rabindra Ratan and Cuihua Shen and Jay Alameda",
year = "2015",
month = "7",
day = "26",
doi = "10.1145/2792745.2792755",
language = "English (US)",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "Proceedings of the XSEDE 2015 Conference",

}

TY - GEN

T1 - Grouping game players using parallelized K-Means on supercomputers

AU - Cai, Y. Dora

AU - Ratan, Rabindra

AU - Shen, Cuihua

AU - Alameda, Jay

PY - 2015/7/26

Y1 - 2015/7/26

KW - Cluster Analysis

KW - K-Means

KW - Parallel Processing

KW - Performance Evaluation

UR - http://www.scopus.com/inward/record.url?scp=84942765562&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942765562&partnerID=8YFLogxK

U2 - 10.1145/2792745.2792755

DO - 10.1145/2792745.2792755

M3 - Conference contribution

AN - SCOPUS:84942765562

T3 - ACM International Conference Proceeding Series

BT - Proceedings of the XSEDE 2015 Conference

PB - Association for Computing Machinery

ER -