TY - JOUR
T1 - Trireme
T2 - Exploration of Hierarchical Multi-level Parallelism for Hardware Acceleration
AU - Zacharopoulos, Georgios
AU - Ejjeh, Adel
AU - Jing, Ying
AU - Yang, En Yu
AU - Jia, Tianyu
AU - Brumar, Iulian
AU - Intan, Jeremy
AU - Huzaifa, Muhammad
AU - Adve, Sarita
AU - Adve, Vikram
AU - Wei, Gu Yeon
AU - Brooks, David
N1 - Publisher Copyright: © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023/4/20
Y1 - 2023/4/20
AB - The design of heterogeneous systems that include domain-specific accelerators is a challenging and time-consuming process. While taking area constraints into account, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop-level, task-level, and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present Trireme, a fully automated tool-chain that explores multiple levels of parallelism and produces domain-specific accelerator designs and configurations that maximize performance for a given area budget. FPGA SoCs were used as target platforms, and Catapult HLS [7] was used to synthesize RTL using a commercial 12 nm FinFET technology. Experiments on demanding benchmarks from the XR domain revealed a speedup of up to 20×, as well as a speedup of up to 37× for smaller applications, compared to software-only implementations.
KW - ASICs
KW - Accelerators
KW - compiler techniques and optimizations
KW - design tools
KW - heterogeneous systems
KW - parallelism
UR - http://www.scopus.com/inward/record.url?scp=85163771610&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163771610&partnerID=8YFLogxK
U2 - 10.1145/3580394
DO - 10.1145/3580394
M3 - Article
AN - SCOPUS:85163771610
SN - 1539-9087
VL - 22
JO - ACM Transactions on Embedded Computing Systems
JF - ACM Transactions on Embedded Computing Systems
IS - 3
M1 - 53
ER -