TY - JOUR
T1 - ReAAP
T2 - A Reconfigurable and Algorithm-Oriented Array Processor With Compiler-Architecture Co-Design
AU - Zheng, Jianwei
AU - Liu, Yu
AU - Liu, Xuejiao
AU - Liang, Luhong
AU - Chen, Deming
AU - Cheng, Kwang Ting
N1 - Funding Information:
This work was supported by ACCESS - AI Chip Center for Emerging Smart Systems, sponsored by the Innovation and Technology Fund (ITF), Hong Kong SAR, and by the AMD/Xilinx Center of Excellence at the University of Illinois at Urbana-Champaign.
Publisher Copyright:
© 2022 IEEE.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - Parallelism and data reuse are the most critical issues in designing hardware acceleration for a deep learning processor. In addition, abundant on-chip memory and precise data management are intrinsic design requirements, because most deep learning algorithms are data-driven and memory-bound. In this paper, we propose a compiler-architecture co-design scheme targeting a reconfigurable and algorithm-oriented array processor, named ReAAP. Given a specific deep neural network, the proposed co-design scheme effectively optimizes parallelism and data reuse for compute-intensive layers to guide reconfigurable computing in hardware. In particular, our proposed domain-specific compiler performs systemic optimization to resolve the intrinsic tension between parallelism and data locality, automatically mapping diverse layer-level workloads onto our proposed reconfigurable array architecture. In this architecture, abundant on-chip memories are software-controlled, and their massive data accesses are precisely handled by compiler-generated instructions. In our experiments, ReAAP is implemented on an embedded FPGA platform. Experimental results demonstrate that the proposed co-design scheme effectively integrates software flexibility with hardware parallelism to accelerate diverse deep learning workloads. As a whole system, ReAAP achieves consistently high hardware resource utilization when accelerating all the diverse compute-intensive layers in ResNet, MobileNet, and BERT.
AB - Parallelism and data reuse are the most critical issues in designing hardware acceleration for a deep learning processor. In addition, abundant on-chip memory and precise data management are intrinsic design requirements, because most deep learning algorithms are data-driven and memory-bound. In this paper, we propose a compiler-architecture co-design scheme targeting a reconfigurable and algorithm-oriented array processor, named ReAAP. Given a specific deep neural network, the proposed co-design scheme effectively optimizes parallelism and data reuse for compute-intensive layers to guide reconfigurable computing in hardware. In particular, our proposed domain-specific compiler performs systemic optimization to resolve the intrinsic tension between parallelism and data locality, automatically mapping diverse layer-level workloads onto our proposed reconfigurable array architecture. In this architecture, abundant on-chip memories are software-controlled, and their massive data accesses are precisely handled by compiler-generated instructions. In our experiments, ReAAP is implemented on an embedded FPGA platform. Experimental results demonstrate that the proposed co-design scheme effectively integrates software flexibility with hardware parallelism to accelerate diverse deep learning workloads. As a whole system, ReAAP achieves consistently high hardware resource utilization when accelerating all the diverse compute-intensive layers in ResNet, MobileNet, and BERT.
KW - Domain-specific processor
KW - compiler-architecture co-design
KW - diverse layer-level workloads
KW - polyhedral modeling
KW - reconfigurable computing
UR - http://www.scopus.com/inward/record.url?scp=85139855954&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139855954&partnerID=8YFLogxK
U2 - 10.1109/TC.2022.3213177
DO - 10.1109/TC.2022.3213177
M3 - Article
AN - SCOPUS:85139855954
SN - 0018-9340
VL - 71
SP - 3088
EP - 3100
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 12
ER -