TY - GEN
T1 - Skew-Oblivious Data Routing for Data Intensive Applications on FPGAs with HLS
AU - Chen, Xinyu
AU - Tan, Hongshi
AU - Chen, Yao
AU - He, Bingsheng
AU - Wong, Weng Fai
AU - Chen, Deming
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/12/5
Y1 - 2021/12/5
N2 - FPGAs have become an emerging computing infrastructure for accelerating applications in datacenters. Meanwhile, high-level synthesis (HLS) tools have been proposed to ease the programming of FPGAs. Even with HLS, irregular data-intensive applications require explicit optimizations, among which multiple processing elements (PEs), each owning a private BRAM-based buffer, are usually adopted to process multiple data items per cycle. Data routing, which dynamically dispatches multiple data items to designated PEs, avoids data replication in buffers compared to statically assigning data to PEs, hence saving BRAM usage. However, the workload imbalance among PEs vastly diminishes performance when processing skewed datasets. In this paper, we propose a skew-oblivious data routing architecture that allocates secondary PEs and schedules them to share the workload of the overloaded PEs at run-time. In addition, we integrate the proposed architecture into a framework called Ditto to minimize the development effort for applications that require skew handling. We evaluate Ditto on five commonly used applications: histogram building, data partitioning, PageRank, heavy hitter detection, and HyperLogLog. The results demonstrate that the generated implementations are robust to skewed datasets and outperform the state-of-the-art designs in both throughput and BRAM usage efficiency.
UR - http://www.scopus.com/inward/record.url?scp=85119399927&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119399927&partnerID=8YFLogxK
U2 - 10.1109/DAC18074.2021.9586184
DO - 10.1109/DAC18074.2021.9586184
M3 - Conference contribution
AN - SCOPUS:85119399927
T3 - Proceedings - Design Automation Conference
SP - 937
EP - 942
BT - 2021 58th ACM/IEEE Design Automation Conference, DAC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 58th ACM/IEEE Design Automation Conference, DAC 2021
Y2 - 5 December 2021 through 9 December 2021
ER -