While there has been extensive work on the design of software transactional memory (STM) for cache-coherent shared-memory systems, there has been no work on the design of an STM system for very large-scale platforms containing potentially thousands of nodes. In this work, we present Cluster-STM, an STM designed for high performance on large-scale commodity clusters. Our design addresses several novel issues posed by this domain, including aggregating communication, managing locality, and distributing transactional metadata across the nodes. We also re-evaluate several STM design choices previously studied for cache-coherent machines and conclude that, in some cases, different choices are appropriate on clusters. Finally, we show that our design scales well up to 512 processors. It scales because, on a cluster, the main barrier to STM scalability is the remote communication overhead imposed by the STM operations, and our design aggregates most of that communication with the communication of the underlying data.