The lack of hardware support for floating-point arithmetic in low-power software-defined radio architectures can significantly increase software design time, because converting floating-point code to fixed-point code is a time-consuming process. Moreover, emerging wireless communication protocols involve several matrix-based algorithms that are extremely sensitive to round-off errors. Using fixed-point arithmetic for these algorithms can significantly degrade the accuracy of their results and may incur additional energy overhead due to the extra instructions that fixed-point arithmetic requires. In this paper, we demonstrate that supporting floating-point arithmetic in hardware can deliver nearly 30% higher performance and energy efficiency than supporting only fixed-point arithmetic for key kernels of modern wireless communication protocols. These improvements can be further enhanced by our proposed high-throughput floating-point fused-multiply-add unit. Applying the proposed unit to key kernels improves performance over the baseline floating-point unit by as much as 60%, while reducing energy consumption by 30% and area by 33%. Although our approach may cause data-dependent execution stalls, we show that their performance impact is negligible. We also employ dynamic range-based dynamic voltage and frequency scaling to further reduce the energy consumption of the processor by 25% for the same worst-case performance as the baseline floating-point implementation.
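
To make the fixed-point overhead concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this paper. It assumes a 16-bit Q15 format (which the text above does not specify), and all function names are hypothetical. It shows that each fixed-point multiply-accumulate needs extra rounding, shifting, and saturation steps, and that small operands can be lost to round-off, whereas the floating-point version is a plain multiply-add.

```python
# Assumed format for illustration only: Q15 (1 sign bit, 15 fractional bits).
Q = 15
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate16(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def to_q15(x):
    """Quantize a real value in roughly [-1, 1) to Q15."""
    return saturate16(int(round(x * (1 << Q))))


def q15_mul(a, b):
    """Q15 multiply: one multiply plus extra round, shift, and saturate steps."""
    return saturate16((a * b + (1 << (Q - 1))) >> Q)


def q15_dot(xs, ys):
    """Dot product in Q15 with a saturating 16-bit accumulator."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = saturate16(acc + q15_mul(to_q15(x), to_q15(y)))
    return acc / (1 << Q)  # convert back to a real value for comparison


def float_dot(xs, ys):
    """Dot product in floating point: one multiply-add per element."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc
```

Under these assumptions, `q15_dot([1e-5], [0.5])` returns `0.0` because the small operand quantizes to zero, and `q15_dot([0.9, 0.9], [0.9, 0.9])` saturates just below 1.0, while `float_dot` returns approximately `5e-6` and `1.62` respectively. Avoiding both effects in fixed point typically requires manual scaling of the data, which is the kind of conversion effort described above.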