The sparsity of natural signals in transform domains such as the DCT has been heavily exploited in various applications. Recently, we introduced the idea of learning sparsifying transforms from data and demonstrated the usefulness of learnt transforms for image representation and denoising. However, the learning formulations therein were non-convex, and the algorithms lacked strong convergence properties. In this work, we propose a novel convex formulation for square sparsifying transform learning. We also enforce a doubly sparse structure on the transform, which makes its learning, storage, and implementation efficient. Our algorithm is guaranteed to converge to a global optimum of the convex formulation, and moreover converges quickly. We also introduce a non-convex variant of the convex formulation, for which the algorithm is locally convergent. We show that our learnt transforms hold superior promise over analytical sparsifying transforms such as the DCT for image representation.