New architectures for extreme-scale computing need to be designed for higher energy efficiency than current systems. One recently proposed extreme-scale manycore radically simplifies the architecture and proposes a cluster-based on-chip memory hierarchy without hardware cache coherence. To program for such an environment, this paper proposes two approaches. They use shared-memory programming either inside clusters only, or both inside and across clusters. Both approaches rely on ISA support for writeback and self-invalidation operations. Our simulation results show that hardware-incoherent cache hierarchies with our support deliver reasonable performance for applications that were not written for such hierarchies. Specifically, for execution within a cluster, the average execution time of the applications is 2% higher than with hardware cache coherence; for execution across multiple clusters, it is 5% higher. This is accomplished with minimal hardware support.