The focus of this paper is the implementation of a 3D Navier-Stokes solver on GPUs using the CUDA programming architecture. The solver uses the fractional step method to discretize the governing equations. It was first validated by computing the 3D lid-driven cavity flow of a Newtonian fluid in a cube and comparing the results with those available in the literature. The code has since been extended to compute non-Newtonian flow in the lid-driven cubic cavity, using the power-law (Ostwald-de Waele) model as the nonlinear stress-strain constitutive relation. The code has been implemented on NVIDIA GPUs, and, depending on the size of the problem, a significant speedup is obtained for both Newtonian and non-Newtonian flow. The results demonstrate the ability of CUDA on GPUs to deliver high computing performance for large-scale scientific problems in which a large part of the code can be parallelized.
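To make the power-law constitutive model concrete, here is a minimal sketch of how the apparent viscosity $\mu = K\,\dot\gamma^{\,n-1}$ could be evaluated at one grid point from the local velocity gradient. It uses the common definition of the scalar shear rate $\dot\gamma = \sqrt{2 S_{ij} S_{ij}}$ with $S_{ij} = \tfrac{1}{2}(\partial u_i/\partial x_j + \partial u_j/\partial x_i)$. The function name `power_law_viscosity` and the cutoff `gamma_min` are hypothetical choices for illustration, not taken from the paper. The cutoff is an assumed regularization that keeps $\mu$ finite for shear-thinning fluids ($n < 1$) where the shear rate vanishes. Because each point is evaluated independently, this kind of computation maps naturally onto one GPU thread per grid cell.

```python
import math


def power_law_viscosity(grad, K, n, gamma_min=1e-12):
    """Apparent viscosity mu = K * gamma_dot**(n - 1) of a power-law fluid.

    grad[i][j] holds du_i/dx_j at one grid point (3x3 nested list).
    gamma_min is a hypothetical lower cutoff on the shear rate that keeps
    mu finite for n < 1 where the flow is locally at rest.
    """
    total = 0.0
    for i in range(3):
        for j in range(3):
            s_ij = 0.5 * (grad[i][j] + grad[j][i])  # strain-rate tensor S_ij
            total += s_ij * s_ij
    gamma_dot = math.sqrt(2.0 * total)  # scalar shear rate sqrt(2 S_ij S_ij)
    gamma_dot = max(gamma_dot, gamma_min)  # assumed regularization
    return K * gamma_dot ** (n - 1.0)


# Simple shear u = 2*y: only du/dy = 2 is nonzero, so the shear rate is 2.
grad = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(power_law_viscosity(grad, K=1.0, n=0.5))  # shear-thinning: 2**-0.5, about 0.7071
print(power_law_viscosity(grad, K=1.0, n=1.0))  # Newtonian limit: 1.0
```

Setting $n = 1$ recovers a constant viscosity $K$, which is why the same solver can treat the Newtonian validation case and the non-Newtonian case.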