Graphics Processing Units (GPU) have recently been the subject of attention in research as an efficient coprocessor for implementing many classes of highly parallel applications. The GPUs design is engineered for graphics applications, where many independent SIMD workloads are simultaneously dispatched to processing elements. While parallelism has been explored in the context of traditional CPU threads and SIMD processing elements, the principles involved in dividing the steps of a parallel algorithm for execution on GPU architectures remains a significant challenge. In this paper, we introduce a first step towards building an efficient GPU-based parallel implementation of a commonly used clustering algorithm called K-Means on an NVIDIA G80 PCI express graphics board using the CUDA processing extensions. Clustering algorithms are important for search, data mining, spam and intrusion detection applications. Modern desktop machines commonly include desktop search software that can be greatly enhanced by these advances, while low-power machines such as laptops can reduce power consumption by utilizing the video chip for these clustering and indexing operations. Our preliminary results show over a 13x performance improvement compared to a baseline 3 GHz Intel Pentium(R) based PC running the same algorithm with an average spec G80 graphics card, the NVIDIA 8600GT. The low cost of these video cards (less than $100 market price as of 2008), and the high performance gains suggest that our approach is both practical and economical for common applications.