This paper presents a low-power architecture for turbo decoding of parallel concatenated convolutional codes. The proposed architecture is derived by applying block-interleaved computation, followed by folding, retiming, and voltage scaling. Block-interleaved computation can be applied to any data processing unit that operates on data blocks and satisfies the following three properties: 1) computations on different blocks are independent; 2) a block can be segmented into computationally independent sub-blocks; and 3) computation within a sub-block is recursive. Applying block-interleaved computation, folding, and retiming reduces the critical path delay of the add-compare-select (ACS) kernel of MAP decoders by 50%-84%, at an area overhead of 14%-70%. Subsequent voltage scaling yields power savings of up to 65% for a block-interleaving depth of 6. Experimental results obtained with transistor-level timing and power analysis tools demonstrate power savings of 20%-44% for a block-interleaving depth of 2 in a 0.25 μm CMOS process.
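The three properties listed above can be illustrated with a minimal software sketch. It is not the proposed hardware architecture: the function names (`acs_step`, `recurse`, `recurse_interleaved`), the max-log form of the ACS update, and the choice to start every sub-block from the same initial state metrics are assumptions made only for illustration. The sketch shows that when sub-blocks are computationally independent, a single ACS kernel can serve them in round-robin order and still produce the same state metrics as processing each sub-block on its own.

```python
from typing import List, Sequence

Metrics = List[float]
Branch = Sequence[Sequence[float]]  # branch[i][j]: metric of transition i -> j


def acs_step(metrics: Sequence[float], branch: Branch) -> Metrics:
    """One add-compare-select step: new[j] = max_i(metrics[i] + branch[i][j])."""
    n = len(metrics)
    return [max(metrics[i] + branch[i][j] for i in range(n)) for j in range(n)]


def recurse(init: Sequence[float], branches: Sequence[Branch]) -> Metrics:
    """Plain recursion over one sub-block (property 3: each step needs the last)."""
    metrics = list(init)
    for branch in branches:
        metrics = acs_step(metrics, branch)
    return metrics


def recurse_interleaved(
    init: Sequence[float], sub_blocks: Sequence[Sequence[Branch]]
) -> List[Metrics]:
    """Serve M equal-length, independent sub-blocks with one ACS kernel.

    At each time index the kernel visits every sub-block in turn, so two
    consecutive kernel invocations never depend on each other.
    """
    states = [list(init) for _ in sub_blocks]  # one metric set per sub-block
    length = len(sub_blocks[0])
    for t in range(length):
        for k, sub_block in enumerate(sub_blocks):
            states[k] = acs_step(states[k], sub_block[t])
    return states
```

In this sketch, an interleaving depth of M means each sub-block's recursion is revisited only every M kernel invocations. In hardware, that slack corresponds to extra delay elements in the recursive loop, which is what makes retiming of the ACS kernel possible.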