As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue for multi-core performance. This is exacerbated by the fact that interconnection speeds are not scaling well with technology. This paper describes mechanisms to accelerate coherence for a multi-core architecture that has multiple private L2 caches and a scalable point-to-point interconnect between cores. These techniques exploit the differences in geometry between chip multiprocessors and traditional multiprocessor architectures. Directory-based protocols have been proposed as a scalable alternative to snoop-based protocols. In this paper, we discuss implementations of coherence for CMPs and propose and evaluate a novel directory-based coherence scheme to improve the performance of parallel programs on such processors. Proximity-aware coherence accelerates read and write misses by initiating cache-to-cache transfers from the spatially closest sharer. This has the dual benefit of eliminating unnecessary accesses to off-chip memory, and minimizing the distance over which communicated data moves across the network. The proposed schemes result in speedups up to 74.9% for our workloads.