Processor manufacturers have recently integrated more than a hundred cores on a single die to deliver extremely high throughput for highly parallel, data-intensive applications such as physics simulations and 3D graphics. For many-core processors running such applications, however, excessive power consumption rather than silicon area will limit performance. In this paper, to optimize the total power of many-core processors, we analyze the impact of 1) the number of cores, 2) the parallelism available in applications, and 3) the limit on supply voltage scaling imposed by on-die memory failures at low supply voltage. Our analysis shows that doubling the number of cores while operating at a lower-than-nominal supply voltage offers the most cost-effective power reduction, yielding up to 65% lower power consumption for highly parallel applications even when supply voltage scaling is limited to 0.7 V. The power saved can, in turn, be used to improve throughput at a higher voltage in power-constrained many-core processors. Furthermore, we extend our analysis to consider within-die core-to-core frequency and leakage variations. When only a subset of the cores in a many-core processor is to be chosen to achieve a demanded throughput, moderately fast and leaky cores always provide optimal power consumption. In addition, frequency-island clocking, which allows an independent frequency for each core, leads to approximately 7% lower power consumption than global clocking, and it favors the fastest core (among the chosen ones) for processing the fully sequential portion of the workload.
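The trade-off between core count and supply voltage can be illustrated with a first-order sketch. It is not the model used in this paper: it assumes Amdahl's law for throughput, an alpha-power law for maximum frequency, and dynamic power proportional to N·V²·f with all cores active and leakage ignored. The nominal voltage (1.0 V), threshold voltage (0.3 V), exponent (1.3), and all function names are hypothetical choices made for illustration; only the 0.7 V scaling limit comes from the text above.

```python
"""First-order sketch: power cost of doubling cores at equal throughput.

All constants and function names are illustrative assumptions,
not values or methods taken from the paper.
"""

V_NOM = 1.0   # assumed nominal supply voltage (V)
V_MIN = 0.7   # supply voltage scaling limit quoted in the abstract (V)
V_TH = 0.3    # assumed threshold voltage (V)
ALPHA = 1.3   # assumed alpha-power-law exponent


def max_frequency(v):
    """Relative maximum clock frequency at supply voltage v (alpha-power law)."""
    return (v - V_TH) ** ALPHA / v


def speedup(n_cores, parallel_fraction):
    """Amdahl's law speedup over a single core running at the same frequency."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def lowest_voltage_for(f_needed):
    """Lowest voltage in [V_MIN, V_NOM] whose maximum frequency reaches f_needed."""
    if max_frequency(V_MIN) >= f_needed:
        return V_MIN
    lo, hi = V_MIN, V_NOM
    for _ in range(60):  # bisection; max_frequency increases with v above V_TH
        mid = 0.5 * (lo + hi)
        if max_frequency(mid) >= f_needed:
            hi = mid
        else:
            lo = mid
    return hi


def power_ratio_when_doubling(n_cores, parallel_fraction):
    """Dynamic power of 2N voltage-scaled cores relative to N cores at V_NOM,
    with both configurations delivering the same throughput."""
    f_nom = max_frequency(V_NOM)
    # Frequency the doubled configuration needs to match baseline throughput.
    f_needed = (f_nom * speedup(n_cores, parallel_fraction)
                / speedup(2 * n_cores, parallel_fraction))
    v = lowest_voltage_for(f_needed)
    baseline = n_cores * V_NOM ** 2 * f_nom
    doubled = 2 * n_cores * v ** 2 * f_needed
    return doubled / baseline


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99, 1.0):
        ratio = power_ratio_when_doubling(64, p)
        print(f"parallel fraction {p}: power ratio {ratio:.3f}")
```

Under these assumed constants, a fully parallel workload reaches the 0.7 V floor and the doubled configuration dissipates 0.49 of the baseline dynamic power, whereas a workload that is only half parallel costs more power than the baseline. The sketch is meant to show the qualitative dependence on parallelism and on the voltage floor, not to reproduce the figures reported in the paper.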