TY - GEN
T1 - Energy-efficient floating-point arithmetic for digital signal processors
AU - Gilani, Syed Zohaib
AU - Kim, Nam Sung
AU - Schulte, Michael
PY - 2011
Y1 - 2011
N2 - Emerging image and signal processing applications involve several matrix-based algorithms that are extremely sensitive to round-off error in computations. Implementing these applications on fixed-point (FxP) processors can significantly increase their design time and may also result in reduced signal-to-noise ratios (SNRs). However, due to the high area and power overhead of floating-point (FP) hardware, low-power DSPs typically do not provide hardware support for FP arithmetic. Moreover, the long latency of FP operations can also reduce the performance of signal processing applications. In this paper, we propose a block-floating-point-based fused multiply-add (BFP-FMA) unit with reduced area and power overhead that is tailored to the needs of signal processing applications. Since dot-product instructions are commonly employed in matrix-based kernels, we employ our proposed BFP-FMA unit to reduce the latency of dot-product operations by a factor of two. Our proposed FMA unit can improve the performance of key DSP kernels by as much as 40%, while reducing energy consumption by 28%. Exploiting BFP arithmetic also allows us to reduce the area and power of the FMA units by 33% and 41%, respectively.
AB - Emerging image and signal processing applications involve several matrix-based algorithms that are extremely sensitive to round-off error in computations. Implementing these applications on fixed-point (FxP) processors can significantly increase their design time and may also result in reduced signal-to-noise ratios (SNRs). However, due to the high area and power overhead of floating-point (FP) hardware, low-power DSPs typically do not provide hardware support for FP arithmetic. Moreover, the long latency of FP operations can also reduce the performance of signal processing applications. In this paper, we propose a block-floating-point-based fused multiply-add (BFP-FMA) unit with reduced area and power overhead that is tailored to the needs of signal processing applications. Since dot-product instructions are commonly employed in matrix-based kernels, we employ our proposed BFP-FMA unit to reduce the latency of dot-product operations by a factor of two. Our proposed FMA unit can improve the performance of key DSP kernels by as much as 40%, while reducing energy consumption by 28%. Exploiting BFP arithmetic also allows us to reduce the area and power of the FMA units by 33% and 41%, respectively.
KW - energy efficiency
KW - floating-point arithmetic
KW - software-defined radio
UR - https://www.scopus.com/pages/publications/84861302905
UR - https://www.scopus.com/pages/publications/84861302905#tab=citedBy
U2 - 10.1109/ACSSC.2011.6190337
DO - 10.1109/ACSSC.2011.6190337
M3 - Conference contribution
AN - SCOPUS:84861302905
SN - 9781467303231
T3 - Conference Record - Asilomar Conference on Signals, Systems and Computers
SP - 1823
EP - 1827
BT - Conference Record of the 45th Asilomar Conference on Signals, Systems and Computers, ASILOMAR 2011
T2 - 45th Asilomar Conference on Signals, Systems and Computers, ASILOMAR 2011
Y2 - 6 November 2011 through 9 November 2011
ER -