Analysis and optimization of I/O cache coherency strategies for SoC-FPGA device

Seung Won Min, Sitao Huang, Mohamed El-Hadedy, Jinjun Xiong, Deming Chen, Wen Mei Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA devices provide tighter integration between software running on CPUs and hardware accelerators. Modern heterogeneous SoC-FPGA platforms support multiple I/O cache coherence options between CPUs and FPGAs, but these options can have unintended effects on the achieved bandwidth, depending on the application and its data access patterns. To provide the most efficient communication between CPUs and accelerators, it is essential to understand the data transaction behavior and select the right I/O cache coherence method. In this paper, we use the Xilinx Zynq UltraScale+ as the SoC platform to show how a given I/O cache coherence method can perform better or worse in different situations, ultimately affecting overall accelerator performance as well. Based on our analysis, we further explore possible software and hardware modifications to improve I/O performance under the different I/O cache coherence options. With our proposed modifications, the overall performance of an SoC design improves by 20% on average.

Original language: English (US)
Title of host publication: Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019
Editors: Ioannis Sourdis, Christos-Savvas Bouganis, Carlos Alvarez, Leonel Antonio Toledo Diaz, Pedro Valero, Xavier Martorell
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 301-306
Number of pages: 6
ISBN (Electronic): 9781728148847
DOIs: https://doi.org/10.1109/FPL.2019.00055
State: Published - Sep 2019
Event: 29th International Conference on Field-Programmable Logic and Applications, FPL 2019 - Barcelona, Spain
Duration: Sep 9 2019 to Sep 13 2019

Publication series

Name: Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019

Conference

Conference: 29th International Conference on Field-Programmable Logic and Applications, FPL 2019
Country: Spain
City: Barcelona
Period: 9/9/19 to 9/13/19

Keywords

  • Cache
  • Cache coherence
  • FPGA
  • Heterogeneous computing

ASJC Scopus subject areas

  • Instrumentation
  • Artificial Intelligence
  • Computer Science Applications
  • Hardware and Architecture

Cite this

Min, S. W., Huang, S., El-Hadedy, M., Xiong, J., Chen, D., & Hwu, W. M. (2019). Analysis and optimization of I/O cache coherency strategies for SoC-FPGA device. In I. Sourdis, C-S. Bouganis, C. Alvarez, L. A. Toledo Diaz, P. Valero, & X. Martorell (Eds.), Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019 (pp. 301-306). [8892094] (Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/FPL.2019.00055

@inproceedings{fc85a4db1bbf4c45ad676a50f4726311,
title = "Analysis and optimization of I/O cache coherency strategies for SoC-FPGA device",
abstract = "Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA devices provide tighter integration between software running on CPUs and hardware accelerators. Modern heterogeneous SoC-FPGA platforms support multiple I/O cache coherence options between CPUs and FPGAs, but these options can have unintended effects on the achieved bandwidth, depending on the application and its data access patterns. To provide the most efficient communication between CPUs and accelerators, it is essential to understand the data transaction behavior and select the right I/O cache coherence method. In this paper, we use the Xilinx Zynq UltraScale+ as the SoC platform to show how a given I/O cache coherence method can perform better or worse in different situations, ultimately affecting overall accelerator performance as well. Based on our analysis, we further explore possible software and hardware modifications to improve I/O performance under the different I/O cache coherence options. With our proposed modifications, the overall performance of an SoC design improves by 20{\%} on average.",
keywords = "Cache, Cache coherence, FPGA, Heterogeneous computing",
author = "Min, {Seung Won} and Sitao Huang and Mohamed El-Hadedy and Jinjun Xiong and Deming Chen and Hwu, {Wen Mei}",
year = "2019",
month = "9",
doi = "10.1109/FPL.2019.00055",
language = "English (US)",
series = "Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "301--306",
editor = "Ioannis Sourdis and Christos-Savvas Bouganis and Carlos Alvarez and {Toledo Diaz}, {Leonel Antonio} and Pedro Valero and Xavier Martorell",
booktitle = "Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019",
address = "United States",

}

TY - GEN

T1 - Analysis and optimization of I/O cache coherency strategies for SoC-FPGA device

AU - Min, Seung Won

AU - Huang, Sitao

AU - El-Hadedy, Mohamed

AU - Xiong, Jinjun

AU - Chen, Deming

AU - Hwu, Wen Mei

PY - 2019/9

Y1 - 2019/9

AB - Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA devices provide tighter integration between software running on CPUs and hardware accelerators. Modern heterogeneous SoC-FPGA platforms support multiple I/O cache coherence options between CPUs and FPGAs, but these options can have unintended effects on the achieved bandwidth, depending on the application and its data access patterns. To provide the most efficient communication between CPUs and accelerators, it is essential to understand the data transaction behavior and select the right I/O cache coherence method. In this paper, we use the Xilinx Zynq UltraScale+ as the SoC platform to show how a given I/O cache coherence method can perform better or worse in different situations, ultimately affecting overall accelerator performance as well. Based on our analysis, we further explore possible software and hardware modifications to improve I/O performance under the different I/O cache coherence options. With our proposed modifications, the overall performance of an SoC design improves by 20% on average.

KW - Cache

KW - Cache coherence

KW - FPGA

KW - Heterogeneous computing

UR - http://www.scopus.com/inward/record.url?scp=85075640722&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85075640722&partnerID=8YFLogxK

U2 - 10.1109/FPL.2019.00055

DO - 10.1109/FPL.2019.00055

M3 - Conference contribution

AN - SCOPUS:85075640722

T3 - Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019

SP - 301

EP - 306

BT - Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019

A2 - Sourdis, Ioannis

A2 - Bouganis, Christos-Savvas

A2 - Alvarez, Carlos

A2 - Toledo Diaz, Leonel Antonio

A2 - Valero, Pedro

A2 - Martorell, Xavier

PB - Institute of Electrical and Electronics Engineers Inc.

ER -