This paper addresses the problem of designing energy-efficient embedded systems by jointly optimizing the power consumption of both the DC-DC converter and the computational core. Past work has shown that there exists a minimum energy operating point (MEOP) in the subthreshold region for computational cores (C-MEOP), at which the dynamic and leakage powers are balanced. The MEOP is defined by the 3-tuple consisting of the optimum energy consumption E *, optimum voltage V* and optimum frequency f*. First, we show that the DC-DC converter losses in dynamic voltage scaling (DVS) cause the overall system MEOP (S-MEOP) to differ significantly from C-MEOP. Simulations in a 130-nm, 1.2V commercial CMOS process show that operation at S-MEOP results in a 45.5% energy savings over operating at a core voltage V*C suggested by C-MEOP. The DC-DC converter efficiency is also improved by 2.2X. Second, we show that architectural techniques such as parallelization cause the S-MEOP to approach C-MEOP. Thus, it is sufficient to track C-MEOP a much easier task on-chip in order to account for process variations. We show that DC-DC converter losses reduces in subthreshold region but increases in superthreshold region when parallelization is employed. This observation leads us to propose a reconfigurable core architecture that improves the converter efficiency by 2.3X at C-MEOP, and makes energy consumption at S-MEOP and C-MEOP to be within 4% of each other, while improving throughput in the subthreshold region by at least 8X. Finally, we show that pipelining, which has been proposed to decrease core energy at C-MEOP while improving throughput , adversely affects the S-MEOP. The pipelined-core system energy at S-MEOP is 85% lower than the pipelined-core system energy when operating at the C-MEOP voltage V* C.