Grouping game players using parallelized K-Means on supercomputers

Y. Dora Cai, Rabindra Ratan, Cuihua Shen, Jay Alameda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Grouping game players based on their online behaviors has attracted considerable attention recently. However, due to the huge volume and extreme complexity of online game data collections, grouping players is a challenging task. This study applied parallelized K-Means on Gordon, a supercomputer hosted at the San Diego Supercomputer Center, to meet the computational challenge of this task. By using the parallelization functions supported by R, this study was able to cluster 120,000 game players into eight non-overlapping groups and speed up the clustering process by one to four times under two- to eight-degree parallelization. This study systematically examined a number of factors that may affect the quality of the clusters and/or the performance of the clustering process; those factors include degree of parallelism, number of clusters, data dimensions, and variable combinations. This study invented a method to identify the optimal clustering schema, which can choose the most discriminative features and create an appropriate number of clusters in K-Means clustering. Besides demonstrating the effectiveness of parallelized K-Means in grouping game players, this study also highlights some lessons learned in using K-Means on very large datasets and some experience in applying parallel processing techniques to intensive data analysis.
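
The abstract does not say how the K-Means runs were parallelized. As a minimal sketch of one common approach, and not necessarily the one used in the study, independent random restarts can be run on separate workers, keeping the run with the lowest within-cluster sum of squares. The example is in Python rather than R, and the names `kmeans_once` and `best_of_restarts` are hypothetical helpers introduced only for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, seed, max_iter=100):
    """One run of Lloyd's algorithm from a seeded random start.

    Returns (within-cluster sum of squares, centers, labels).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers are k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so the run has converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    wss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return wss, centers, labels


def best_of_restarts(points, k, seeds, executor):
    """Run one K-Means restart per seed on the executor and keep the best run."""
    futures = [executor.submit(kmeans_once, points, k, seed) for seed in seeds]
    return min((f.result() for f in futures), key=lambda run: run[0])
```

Passing the executor in keeps the sketch independent of the parallel backend: a `ThreadPoolExecutor` is enough to exercise the structure, while a `concurrent.futures.ProcessPoolExecutor` would be the usual choice for CPU-bound restarts. The degree of parallelism discussed in the abstract corresponds here to the executor's `max_workers` setting.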

Original language: English (US)
Title of host publication: Proceedings of the XSEDE 2015 Conference
Subtitle of host publication: Scientific Advancements Enabled by Enhanced Cyberinfrastructure
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450337205
State: Published - Jul 26 2015
Event: 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015 - St. Louis, United States
Duration: Jul 26 2015 to Jul 30 2015

Publication series

Name: ACM International Conference Proceeding Series
Volume: 2015-July

Other

Other: 4th Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2015
Country/Territory: United States
City: St. Louis
Period: 7/26/15 to 7/30/15

Keywords

  • Cluster Analysis
  • K-Means
  • Parallel Processing
  • Performance Evaluation

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
