TY - GEN
T1 - Analysis and optimization of I/O cache coherency strategies for SoC-FPGA device
AU - Min, Seung Won
AU - Huang, Sitao
AU - El-Hadedy, Mohamed
AU - Xiong, Jinjun
AU - Chen, Deming
AU - Hwu, Wen Mei
N1 - Funding Information:
This work was supported by the Applications Driving Architectures (ADA) Research Center, a JUMP Center cosponsored by SRC and DARPA, and IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR), a research collaboration as part of the IBM AI Horizon Network.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
AB - Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA devices provide tighter integration between software running on CPUs and hardware accelerators. Modern heterogeneous SoC-FPGA platforms support multiple I/O cache coherence options between CPUs and FPGAs, but these options can have unintended effects on the achieved bandwidth depending on the application and its data access patterns. To provide the most efficient communication between CPUs and accelerators, understanding the data transaction behaviors and selecting the right I/O cache coherence method is essential. In this paper, we use Xilinx Zynq UltraScale+ as the SoC platform to show how a given I/O cache coherence method can perform better or worse in different situations, ultimately affecting overall accelerator performance as well. Based on our analysis, we further explore possible software and hardware modifications to improve I/O performance with different I/O cache coherence options. With our proposed modifications, the overall performance of the SoC design improves by 20% on average.
KW - Cache
KW - Cache coherence
KW - FPGA
KW - Heterogeneous computing
UR - http://www.scopus.com/inward/record.url?scp=85075640722&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075640722&partnerID=8YFLogxK
U2 - 10.1109/FPL.2019.00055
DO - 10.1109/FPL.2019.00055
M3 - Conference contribution
AN - SCOPUS:85075640722
T3 - Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019
SP - 301
EP - 306
BT - Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019
A2 - Sourdis, Ioannis
A2 - Bouganis, Christos-Savvas
A2 - Alvarez, Carlos
A2 - Toledo Diaz, Leonel Antonio
A2 - Valero, Pedro
A2 - Martorell, Xavier
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019
Y2 - 9 September 2019 through 13 September 2019
ER -