Real-time 3D sound localization is an important technology for applications such as camera steering systems, robot audition, and gunshot direction detection. Extending sound localization to 3D adds a spatial dimension but also significantly increases the computational requirements: a real-time system must continuously process large volumes of data for each possible 3D direction and acoustic frequency range. These demands outpace the compute capability of current CPUs. This paper develops a real-time implementation of 3D sound localization on Graphics Processing Units (GPUs). Massively parallel GPU architectures are shown to be well suited for 3D sound localization. We optimize several aspects of the GPU implementation to improve performance, including the number of threads per thread block, register allocation per thread, and memory data layout. Experiments indicate that our GPU implementation achieves speedups of 501X and 130X over single-threaded and multi-threaded CPU implementations, respectively, thus enabling real-time operation of 3D sound localization.