This paper addresses model reduction for a Markov chain on a large state space. A simulation-based framework is introduced to perform state aggregation of the Markov chain based on observations of a single sample path. The Kullback-Leibler (K-L) divergence rate is employed as a metric for the distance between two stationary Markov chains. Model reduction with respect to this metric is cast as an infinite-horizon average-cost optimal control problem, so that an optimal policy corresponds to an optimal partition of the state space with respect to the K-L divergence rate. The optimal control problem is simplified in an approximate dynamic programming (ADP) framework: first the policy space is relaxed, and on this basis a parameterization of the set of optimal policies is introduced. This enables a stochastic approximation approach to computing the best policy within a given parameterized class. The algorithm can be implemented using a single sample path of the Markov chain. Convergence is established using the ODE method. Examples illustrate the theoretical results and show remarkably low variance and fast convergence.
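As a concrete illustration of the metric underlying the abstract, the K-L divergence rate between two stationary Markov chains with transition matrices \(P\) and \(Q\) is \(\sum_i \pi_i \sum_j P_{ij}\log(P_{ij}/Q_{ij})\), where \(\pi\) is the stationary distribution of \(P\). The sketch below is illustrative only and assumes the transition matrices are known and irreducible; the paper's algorithm works from a single sample path rather than from explicit matrices, and the function names here are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of an irreducible chain: the left
    eigenvector of P for eigenvalue 1, normalized to sum to one."""
    w, v = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(w - 1.0))
    pi = np.real(v[:, idx])
    return pi / pi.sum()

def kl_divergence_rate(P, Q):
    """K-L divergence rate sum_i pi_i sum_j P_ij log(P_ij / Q_ij),
    with pi the stationary distribution of P; terms with P_ij = 0
    contribute zero."""
    pi = stationary_distribution(P)
    n = P.shape[0]
    rate = 0.0
    for i in range(n):
        for j in range(n):
            if P[i, j] > 0:
                rate += pi[i] * P[i, j] * np.log(P[i, j] / Q[i, j])
    return rate

# Example: the rate vanishes iff the two chains coincide on the
# support of P, and is positive otherwise.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
Q = np.array([[0.8, 0.2], [0.3, 0.7]])
```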