The non-equispaced fast Fourier transform (NFFT) has attracted significant interest for its applications in tomography and remote sensing, where visualization and image reconstruction involve non-equispaced data. Here we present an efficient implementation of a high-accuracy NFFT on an NVIDIA GPU (Graphics Processing Unit). We focus on the convolution step of the NFFT because it is the most time-consuming part of the algorithm. To use on-chip memory efficiently, we use pre-computed compressed datasets that avoid write conflicts. We measured performance against CUNFFT [1], an existing GPU implementation. In single precision, our implementation is 4X faster on a random dataset and 2X faster on a radial dataset. Compared with the CPU version, we measured a peak speedup of 78X. Finally, to illustrate the potential of NFFT for tomography visualization, we evaluated the forward NFFT on a three-dimensional (3D) object constructed from the atoms of a nanoparticle.
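The write conflict arises when the convolution is organized point by point. Each non-equispaced sample adds its weighted contribution into several neighbouring cells of an oversampled grid, so parallel threads handling nearby samples race on the same grid cells. The NumPy sketch below contrasts this scatter form with a pre-computed compressed (CSR-style) table that maps each grid cell to the samples that contribute to it. With that table, one thread per cell can accumulate its own value without colliding with any other thread. The Gaussian `window`, the width parameter `m`, and the helper names are hypothetical choices for illustration on a 1D periodic grid. This is a sketch of the general idea, not the authors' GPU kernel or data layout.

```python
import numpy as np

def window(d, n, m):
    # Hypothetical Gaussian window; the shape parameter b is chosen only for illustration.
    b = m / np.pi
    return np.exp(-((n * d) ** 2) / b)

def spread_scatter(x, c, n, m):
    # Point-parallel form: each sample j writes into 2m grid cells.
    # On a GPU, threads for nearby samples would write the same cells (write conflict).
    grid = np.zeros(n, dtype=complex)
    for j in range(len(x)):
        u = int(np.floor(n * x[j]))
        for l in range(u - m + 1, u + m + 1):
            grid[l % n] += c[j] * window(x[j] - l / n, n, m)
    return grid

def build_csr(x, n, m):
    # Pre-compute (once per set of sample positions) a compressed cell -> (sample, weight) table.
    cells, pts, wts = [], [], []
    for j in range(len(x)):
        u = int(np.floor(n * x[j]))
        for l in range(u - m + 1, u + m + 1):
            cells.append(l % n)
            pts.append(j)
            wts.append(window(x[j] - l / n, n, m))
    cells = np.array(cells)
    order = np.argsort(cells, kind="stable")
    cells, pts, wts = cells[order], np.array(pts)[order], np.array(wts)[order]
    # offsets[k]:offsets[k + 1] is the slice of entries that belong to grid cell k.
    offsets = np.searchsorted(cells, np.arange(n + 1))
    return offsets, pts, wts

def spread_gather(offsets, pts, wts, c, n):
    # Cell-parallel form: on a GPU, one thread per cell k writes only grid[k], so no conflicts.
    grid = np.zeros(n, dtype=complex)
    for k in range(n):
        s, e = offsets[k], offsets[k + 1]
        grid[k] = np.sum(wts[s:e] * c[pts[s:e]])
    return grid
```

Both forms compute the same grid up to floating-point summation order. The table only has to be rebuilt when the sample positions change, which is what makes pre-computing it worthwhile when the same geometry is reused.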