We focus on policy search in reinforcement learning problems over continuous spaces, where the value is defined by infinite-horizon discounted reward accumulation. This is the canonical setting proposed by Bellman. Policy search, specifically the policy gradient (PG) method, scales gracefully to continuous spaces and allows for deep network parametrizations; however, it is empirically known to be volatile, and its finite-time behavior is not well understood. A major source of this gap is that unbiased ascent directions are elusive, and hence only asymptotic convergence to stationarity can be shown via links to ordinary differential equations. In this work, we propose a new variant of PG methods that uses a random rollout horizon for the Monte-Carlo estimation of the policy gradient, which we establish yields an unbiased policy search direction. Furthermore, we conduct a global convergence analysis from a nonconvex optimization perspective: (i) we first recover the asymptotic convergence to stationary-point policies established in the literature through an alternative supermartingale argument; (ii) we provide the iteration complexity, i.e., the convergence rate, of policy gradient in the infinite-horizon setting, showing that it exhibits rates comparable to those of the stochastic gradient method in the nonconvex regime under both diminishing and constant stepsize rules. Numerical experiments on the inverted pendulum demonstrate the validity of our results.
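To make the random-horizon idea concrete, the following minimal Python sketch illustrates one standard way such an unbiased estimator can be constructed; the toy scalar environment, the Gaussian policy, and the choice of geometric horizons with success probability 1 - gamma are our illustrative assumptions, not the exact algorithm analyzed in the paper. The construction rests on the identity that for T ~ Geom(1 - gamma) supported on {0, 1, 2, ...}, P(T >= t) = gamma^t, so an undiscounted reward sum truncated at the random horizon T matches the discounted sum in expectation.

```python
import numpy as np

# Toy scalar environment with quadratic cost as reward (an illustrative
# assumption; the paper's setting is a general continuous-space MDP).
class ToyEnv:
    def reset(self, rng):
        self.s = rng.normal()
        return self.s

    def step(self, a, rng):
        self.s = 0.9 * self.s + a + 0.1 * rng.normal()
        reward = -(self.s ** 2 + 0.1 * a ** 2)
        return self.s, reward

def sample_action(theta, s, rng, sigma=0.5):
    """Gaussian policy: a ~ N(theta * s, sigma^2)."""
    return theta * s + sigma * rng.normal()

def grad_log_pi(theta, s, a, sigma=0.5):
    """d/dtheta of log N(a; theta*s, sigma^2) = (a - theta*s) * s / sigma^2."""
    return (a - theta * s) * s / sigma ** 2

def pg_estimate(theta, gamma, rng):
    """One unbiased estimate of the infinite-horizon discounted policy gradient.

    Stage 1: roll out H ~ Geom(1 - gamma) steps, so (s_H, a_H) is drawn from
    the discounted state-visitation measure. Stage 2: estimate Q(s_H, a_H) by
    an undiscounted reward sum truncated at an independent T ~ Geom(1 - gamma);
    since P(T >= t) = gamma^t, this matches the discounted Q in expectation.
    """
    env = ToyEnv()
    H = rng.geometric(1.0 - gamma) - 1  # numpy's geometric starts at 1
    s = env.reset(rng)
    for _ in range(H):
        a = sample_action(theta, s, rng)
        s, _ = env.step(a, rng)
    s_H = s
    a_H = sample_action(theta, s_H, rng)

    T = rng.geometric(1.0 - gamma) - 1  # second, independent random horizon
    q_hat, a = 0.0, a_H
    for _ in range(T + 1):
        s, r = env.step(a, rng)
        q_hat += r
        a = sample_action(theta, s, rng)
    # 1 / (1 - gamma) rescales for the normalized visitation measure.
    return grad_log_pi(theta, s_H, a_H) * q_hat / (1.0 - gamma)

# Stochastic gradient ascent on the discounted value, one rollout per step.
rng = np.random.default_rng(0)
theta, gamma, step = 0.0, 0.95, 1e-3
for _ in range(2000):
    theta += step * pg_estimate(theta, gamma, rng)
```

Under these assumptions, each call to pg_estimate is itself an unbiased ascent direction; averaging several independent estimates per update reduces variance but is not required for unbiasedness.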