TY - GEN
T1 - A parallel implementation of K-means clustering on GPUs
AU - Farivar, Reza
AU - Rebolledo, Daniel
AU - Chan, Ellick
AU - Campbell, Roy
PY - 2008
Y1 - 2008
N2 - Graphics Processing Units (GPU) have recently been the subject of attention in research as an efficient coprocessor for implementing many classes of highly parallel applications. The GPUs design is engineered for graphics applications, where many independent SIMD workloads are simultaneously dispatched to processing elements. While parallelism has been explored in the context of traditional CPU threads and SIMD processing elements, the principles involved in dividing the steps of a parallel algorithm for execution on GPU architectures remains a significant challenge. In this paper, we introduce a first step towards building an efficient GPU-based parallel implementation of a commonly used clustering algorithm called K-Means on an NVIDIA G80 PCI express graphics board using the CUDA processing extensions. Clustering algorithms are important for search, data mining, spam and intrusion detection applications. Modern desktop machines commonly include desktop search software that can be greatly enhanced by these advances, while low-power machines such as laptops can reduce power consumption by utilizing the video chip for these clustering and indexing operations. Our preliminary results show over a 13x performance improvement compared to a baseline 3 GHz Intel Pentium(R) based PC running the same algorithm with an average spec G80 graphics card, the NVIDIA 8600GT. The low cost of these video cards (less than $100 market price as of 2008), and the high performance gains suggest that our approach is both practical and economical for common applications.
AB - Graphics Processing Units (GPU) have recently been the subject of attention in research as an efficient coprocessor for implementing many classes of highly parallel applications. The GPUs design is engineered for graphics applications, where many independent SIMD workloads are simultaneously dispatched to processing elements. While parallelism has been explored in the context of traditional CPU threads and SIMD processing elements, the principles involved in dividing the steps of a parallel algorithm for execution on GPU architectures remains a significant challenge. In this paper, we introduce a first step towards building an efficient GPU-based parallel implementation of a commonly used clustering algorithm called K-Means on an NVIDIA G80 PCI express graphics board using the CUDA processing extensions. Clustering algorithms are important for search, data mining, spam and intrusion detection applications. Modern desktop machines commonly include desktop search software that can be greatly enhanced by these advances, while low-power machines such as laptops can reduce power consumption by utilizing the video chip for these clustering and indexing operations. Our preliminary results show over a 13x performance improvement compared to a baseline 3 GHz Intel Pentium(R) based PC running the same algorithm with an average spec G80 graphics card, the NVIDIA 8600GT. The low cost of these video cards (less than $100 market price as of 2008), and the high performance gains suggest that our approach is both practical and economical for common applications.
UR - http://www.scopus.com/inward/record.url?scp=62749178093&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=62749178093&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:62749178093
SN - 1601320841
SN - 9781601320841
T3 - Proceedings of the 2008 International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2008
SP - 340
EP - 345
BT - Proceedings of the 2008 International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2008
T2 - 2008 International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2008
Y2 - 14 July 2008 through 17 July 2008
ER -