We develop a method for aggregating large Markov chains into smaller, representative Markov chains. Markov chains are viewed as weighted directed graphs, and similar nodes (and edges) are aggregated using a deterministic annealing approach. The notions of representativeness of the aggregated graphs and of similarity between nodes are based on a newly proposed metric that quantifies connectivity in the underlying graph. Specifically, we develop notions of distance between subchains of a Markov chain and provide easily verifiable conditions that determine whether a given Markov chain is nearly decomposable, that is, conditions under which the deterministic annealing approach can be used to identify subchains with high probability. We show that the aggregated Markov chain preserves certain dynamics of the original chain. In particular, we provide explicit bounds on the ℓ1 norm of the error between the aggregated stationary distribution of the original Markov chain and the stationary distribution of the aggregated Markov chain, which extends longstanding foundational results (Simon and Ando, 1961).
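To make the ℓ1 error above concrete, the following minimal sketch compares the two quantities on a small, nearly decomposable chain. It is illustrative only: the 4-state transition matrix, the cluster labels, and the uniform within-cluster weighting in the hypothetical helper `aggregate` are assumptions made for this example, and the clusters are given by hand rather than found by deterministic annealing.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a row-stochastic matrix P.

    Solves pi (P - I) = 0 together with sum(pi) = 1 by least squares.
    Assumes the chain has a unique stationary distribution.
    """
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def aggregate(P, labels, k):
    """Aggregate P into a k-state chain (hypothetical helper).

    Assumption: states inside a cluster are weighted uniformly.
    This is one simple choice, not necessarily the rule used in the paper.
    """
    n = P.shape[0]
    M = np.zeros((n, k))
    M[np.arange(n), labels] = 1.0   # membership matrix, one 1 per row
    W = M / M.sum(axis=0)           # uniform weights, columns sum to 1
    return W.T @ P @ M              # k x k row-stochastic matrix

# Illustrative nearly decomposable chain: two blocks {0, 1} and {2, 3}
# with weak coupling between them (made-up numbers).
P = np.array([
    [0.60, 0.39, 0.01, 0.00],
    [0.30, 0.67, 0.00, 0.03],
    [0.00, 0.02, 0.50, 0.48],
    [0.02, 0.00, 0.28, 0.70],
])
labels = np.array([0, 0, 1, 1])
k = 2

Q = aggregate(P, labels, k)

# Aggregated stationary distribution of the original chain.
pi_agg = np.bincount(labels, weights=stationary(P), minlength=k)
# Stationary distribution of the aggregated chain.
pi_Q = stationary(Q)

error = np.abs(pi_agg - pi_Q).sum()  # l1 norm of the difference
print("aggregated chain:\n", Q)
print("l1 error:", error)
```

With weak coupling between the blocks the printed error should be small. If the within-cluster weights are instead taken proportional to the exact stationary distribution of the original chain, the two distributions coincide, which is why the uniform weighting is used here to make the error visible.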