Path-based Analysis (PBA) is an important step in the design closure flow for reducing slack pessimism. However, PBA is extremely time-consuming. Recent years have seen many parallel PBA algorithms, but most of them are architecturally constrained by the CPU parallelism and do not scale beyond a few threads. To overcome this challenge, we propose in this paper a new fast and accurate PBA algorithm by harnessing the power of graphics processing unit (GPU). We introduce GPU-efficient data structures, high-performance kernels, and efficient CPU-GPU task decomposition strateiges, to accelerate PBA to a new performance milestone. Experimental results show that our method can speed up the state-of-the-art algorithm by 543× on a design of 1.6 million gates with exact accuracy. At the extreme, our method of 1 CPU and 1 GPU outperforms the state-of-the-art algorithm of 40 CPUs by 25-45×.