3D human pose estimation from monocular images has become a heated area in computer vision recently. For years, most deep neural network based practices have adopted either an end-to-end approach, or a two-stage approach. An end-to-end network typically estimates 3D human poses directly from 2D input images, but it suffers from the shortage of 3D human pose data. It is also obscure to know if the inaccuracy stems from limited visual understanding or 2D-to-3D mapping. Whereas a two-stage directly lifts those 2D keypoint outputs to the 3D space, after utilizing an existing network for 2D keypoint detections. However, they tend to ignore some useful contextual hints from the 2D raw image pixels. In this paper, we introduce a two-stage architecture that can eliminate the main disadvantages of both these approaches. During the first stage we use an existing state-of-the-art detector to estimate 2D poses. To add more contextual information to help lifting 2D poses to 3D poses, we propose 3D Part Affinity Fields (3D-PAFs). We use 3D-PAFs to infer 3D limb vectors, and combine them with 2D poses to regress the 3D coordinates. We trained and tested our proposed framework on Human3.6M, the most popular 3D human pose benchmark dataset. Our approach achieves the state-of-the-art performance, which proves that with right selections of contextual information, a simple regression model can be very powerful in estimating 3D poses.