This paper explores the effectiveness of applying pipelining and parallel processing simultaneously as a technique for reducing total (static plus dynamic) power in digital systems. Previous studies have been limited to either pipelining or parallel processing, but the two techniques can be combined to reduce the supply voltage at a fixed throughput. According to our first-order analyses, there exist optimal combinations of pipelining depth and parallel processing width that minimize total power consumption. We show that leakage power from both subthreshold conduction and gate-oxide tunneling plays a significant role in determining the optimal combination. Our experiments use timing information derived from a 65 nm technology and fanout-of-four (FO4) inverter chains. They show that the optimal combinations of pipelining and parallel processing (8-12×FO4 logic depth pipelining with 2-3-wide parallel processing) can reduce total power by as much as 40% compared to an optimal system using pipelining alone or parallel processing alone. We extend our study to show how process parameter variations, an increasingly important factor in nanometer technologies, affect these results. Our analyses reveal that, at a fixed yield point, the variations shift the optimal point toward shallower pipelining and narrower parallel processing: 12×FO4 logic depth with 2-wide parallel processing.
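
The trade-off described above can be illustrated with a minimal first-order sketch. This is not the paper's model. It assumes an alpha-power-law delay, dynamic power proportional to switched capacitance times the square of the supply voltage, and leakage proportional to area times supply voltage. Every constant and function name (`fo4_delay`, `min_voltage`, `total_power`, `best_combination`) is hypothetical and chosen only for illustration, so the optimum it reports should not be expected to match the figures quoted above.

```python
# First-order sketch. Every constant below is an illustrative assumption,
# not a value taken from the paper.
VDD_NOM = 1.0       # nominal supply voltage (V)
VT = 0.3            # threshold voltage (V)
V_MIN = 0.4         # assumed lowest usable supply voltage (V)
ALPHA = 1.3         # alpha-power-law exponent
TOTAL_LOGIC = 96.0  # total datapath logic depth, in FO4 delays
LATCH_FO4 = 3.0     # per-stage latch overhead, in FO4 delays
LATCH_CAP = 0.05    # latch capacitance per stage, relative to logic capacitance
MUX_CAP = 0.1       # switched-capacitance overhead per extra parallel lane
LEAK_FRAC = 0.3     # leakage share of baseline power at VDD_NOM
BASE_DEPTH = 24.0   # baseline stage depth (FO4); baseline width is 1


def fo4_delay(v):
    """Relative FO4 delay (alpha-power law), normalized to 1 at VDD_NOM."""
    nominal = VDD_NOM / (VDD_NOM - VT) ** ALPHA
    return (v / (v - VT) ** ALPHA) / nominal


def min_voltage(depth, width):
    """Lowest supply at which `width` lanes of `depth`-FO4 stages still
    match the baseline throughput, or None if that is infeasible."""
    # Each lane may use a clock period `width` times the baseline period.
    budget = width * (BASE_DEPTH + LATCH_FO4)
    stage = depth + LATCH_FO4
    if stage * fo4_delay(VDD_NOM) > budget:
        return None
    if stage * fo4_delay(V_MIN) <= budget:
        return V_MIN
    lo, hi = V_MIN, VDD_NOM
    for _ in range(60):  # bisection: delay decreases monotonically with v
        mid = 0.5 * (lo + hi)
        if stage * fo4_delay(mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi


def _raw_power(depth, width, v):
    """Unnormalized (dynamic, leakage) power at fixed throughput."""
    lane_cap = 1.0 + LATCH_CAP * (TOTAL_LOGIC / depth)
    # Each lane switches at 1/width of the rate, so lane count cancels
    # in the dynamic term except for the mux overhead.
    dynamic = (lane_cap + MUX_CAP * (width - 1)) * v ** 2
    # Leakage scales with duplicated area (assumed constant leakage current).
    leakage = width * lane_cap * v
    return dynamic, leakage


def total_power(depth, width):
    """Total power relative to the baseline (BASE_DEPTH, width 1, VDD_NOM)."""
    v = min_voltage(depth, width)
    if v is None:
        return None
    dyn0, leak0 = _raw_power(BASE_DEPTH, 1, VDD_NOM)
    dyn, leak = _raw_power(depth, width, v)
    return (1.0 - LEAK_FRAC) * dyn / dyn0 + LEAK_FRAC * leak / leak0


def best_combination(depths=(6, 8, 10, 12, 16, 24), widths=(1, 2, 3, 4)):
    """Grid search for the (power, depth, width) triple with minimum power."""
    results = [(total_power(d, w), d, w) for d in depths for w in widths]
    return min(r for r in results if r[0] is not None)


if __name__ == "__main__":
    power, depth, width = best_combination()
    print(f"depth={depth} FO4, width={width}, relative power={power:.3f}")
```

In this sketch, deeper pipelining and wider parallelism both relax the per-stage timing budget and so permit a lower supply voltage, while latch capacitance and duplicated (leaking) area grow. Raising `LEAK_FRAC` penalizes wide configurations more heavily, which is the qualitative mechanism by which leakage influences the optimal combination.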