Meeting power requirements of huge exascale machines of the future will be a major challenge. Our focus in this paper is to minimize cooling power and we propose a technique that uses a combination of DVFS and temperature aware load balancing to constrain core temperatures as well as save cooling energy. Our scheme is specifically designed to suit parallel applications which are typically tightly coupled. The temperature control, comes at the cost of execution time and we try to minimize the timing penalty. We experiment with three applications (with different power utilization profiles), run on a 128-core (32-node) cluster with a dedicated air conditioning unit. We calibrate the efficacy of our scheme based on three metrics: ability to control average core temperatures thereby avoiding hot spot occurence, timing penalty minimization, and cooling energy savings. Our results show cooling energy savings of up to 57% with a timing penalty of 19%.