This paper presents a novel research problem, Comparative Document Analysis (CDA): the joint discovery of commonalities and differences between two individual documents (or two sets of documents) in a large text corpus. Given any pair of documents from a (background) document collection, CDA aims to automatically identify sets of quality phrases that summarize the commonalities of both documents and highlight the distinctions of each with respect to the other, informatively and concisely. Our solution uses a general graph-based framework to derive novel measures of phrase semantic commonality and pairwise distinction, where the background corpus is used for computing phrase-document semantic relevance. We use these measures to guide the selection of sets of phrases by solving two joint optimization problems. A scalable iterative algorithm is developed to integrate the maximization of the phrase commonality or distinction measure with the learning of phrase-document semantic relevance. Experiments on large text corpora from two different domains (scientific papers and news) demonstrate the effectiveness and robustness of the proposed framework for comparing documents. Analysis on a 10GB+ text corpus demonstrates the scalability of our method, whose computation time grows linearly with corpus size. Our case study comparing news articles published on different dates shows the power of the proposed method for comparing sets of documents.
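
To make the task concrete, the following is a minimal toy sketch of phrase selection for a pair of documents. It is not the graph-based framework or the joint optimization described above. It assumes that phrase-document relevance scores in [0, 1] are already available, and it substitutes two deliberately simple stand-in measures: the minimum of the two relevance scores for commonality, and the relevance difference for distinction. All names (`Relevance`, `commonality`, `distinction`, `top_k`, `compare`) and all scores in the usage example are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Assumption: relevance scores are given, mapping phrase -> score in [0, 1].
Relevance = Dict[str, float]


def commonality(phrase: str, rel_a: Relevance, rel_b: Relevance) -> float:
    """Toy stand-in: a phrase is common only if it is relevant to both documents."""
    return min(rel_a.get(phrase, 0.0), rel_b.get(phrase, 0.0))


def distinction(phrase: str, rel_a: Relevance, rel_b: Relevance) -> float:
    """Toy stand-in: how much more relevant the phrase is to A than to B."""
    return rel_a.get(phrase, 0.0) - rel_b.get(phrase, 0.0)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return up to k phrases with the highest positive scores (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, score in ranked[:k] if score > 0]


def compare(
    rel_a: Relevance, rel_b: Relevance, k: int = 2
) -> Tuple[List[str], List[str], List[str]]:
    """Select common phrases, phrases distinct to A, and phrases distinct to B."""
    phrases = set(rel_a) | set(rel_b)
    common = top_k({p: commonality(p, rel_a, rel_b) for p in phrases}, k)
    only_a = top_k({p: distinction(p, rel_a, rel_b) for p in phrases}, k)
    only_b = top_k({p: distinction(p, rel_b, rel_a) for p in phrases}, k)
    return common, only_a, only_b


# Hypothetical relevance scores for two documents.
rel_a = {"graph mining": 0.9, "topic model": 0.8, "random walk": 0.1}
rel_b = {"topic model": 0.7, "word embedding": 0.9, "graph mining": 0.1}

common, only_a, only_b = compare(rel_a, rel_b, k=1)
print(common, only_a, only_b)
```

In this toy setting each phrase is scored independently and the relevance scores are fixed inputs, whereas the proposed method learns phrase-document relevance jointly with the selection of phrase sets.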