In this paper, we describe the first implementation and performance of a fast O(N3logN) hierarchical backprojection algorithm for cone beam CT with a circular trajectory1,developed on a modern Graphics Processing Unit (GPU). The resulting tomographic backprojection system for 3D cone beam geometry combines speedup through algorithmic improvements provided by the hierarchical backprojection algorithm with speedup from a massively parallel hardware accelerator. For data parameters typical in diagnostic CT and using a mid-range GPU card, we report reconstruction speeds of up to 360 frames per second, and relative speedup of almost 6× compared to conventional backprojection on the same hardware. The significance of these results is twofold. First, they demonstrate that the reduction in operation counts demonstrated previously for the FHBP algorithm can be translated to a comparable run-time improvement in a massively parallel hardware implementation, while preserving stringent diagnostic image quality. Second, the dramatic speedup and throughput numbers achieved indicate the feasibility of systems based on this technology, which achieve real-time 3D reconstruction for state-of-the art diagnostic CT scanners with small footprint, high-reliability, and affordable cost.