The advent of multi-core processors with a large number of cores and heterogeneous architecture poses challenges for achieving scalable cache coherence. Several recent research efforts have focused on simplifying or abandoning hardware cache coherence protocols. However, this adds a significant burden on the programmer, unless automated compiler support is developed. In this paper, we develop compiler support for parallel systems that delegate the task of maintaining cache coherence to software. Algorithms to automatically insert software cache coherence instructions into parallel applications are presented. This frees the programmer from having to manually insert coherence annotations, which can be tedious and error-prone. Experimental evaluation over a number of benchmarks demonstrates that effective compiler techniques can make software cache coherence competitive with hardware coherence schemes both in terms of energy and performance.