Context-sensitive pointer analysis algorithms with full "heapcloning" are powerful but are widely considered to be too expensive to include in production compilers. This paper shows, for the first time, that a context-sensitive, field-sensitive algorithm with fullheap cloning (by acyclic call paths) can indeed be both scalable and extremely fast in practice. Overall, the algorithm is able to analyze programs in the range of 100K-200K lines of C code in 1-3 seconds,takes less than 5% of the time it takes for GCC to compile the code (which includes no whole-program analysis), and scales well across five orders of magnitude of code size. It is also able to analyze the Linux kernel (about 355K linesof code) in 3.1 seconds. The paper describes the major algorithmic and engineering design choices that are required to achieve these results, including (a) using flow-insensitive and unification-basedanalysis, which are essential to avoid exponential behavior in practice;(b) sacrificing context-sensitivity within strongly connected components of the call graph; and (c) carefully eliminating several kinds of O(N 2) behaviors (largely without affecting precision). The techniques used for (b) and (c) eliminated several major bottlenecks to scalability, and both are generalizable to other context-sensitive algorithms. We show that the engineering choices collectively reduce analysis time by factors of up to 10x-15xin our larger programs, and have found that the savings grow strongly with program size. Finally, we briefly summarize results demonstrating the precision of the analysis.