TY - GEN
T1 - QEI
T2 - 27th Annual IEEE International Symposium on High Performance Computer Architecture, HPCA 2021
AU - Yuan, Yifan
AU - Wang, Yipeng
AU - Wang, Ren
AU - Chowdhury, Rangeen Basu Roy
AU - Tai, Charlie
AU - Kim, Nam Sung
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/2
Y1 - 2021/2
N2 - Data query operations of different data structures are ubiquitous and critical in today's data center infrastructures and applications. However, query operations are not always performance-optimal to be executed on general-purpose CPU cores. These operations exhibit insufficient memory-level parallelism and frontend bottlenecks due to unstructured control flow. Furthermore, the data access patterns are not cache- or prefetch-friendly. Based on our performance analysis on a commodity server, query operations can consume a large percentage of the CPU cycles in various modern cloud workloads. Existing accelerator solutions for query operations do not strike a balance between their generality, scalability, latency, and hardware complexity. In this paper, we propose QEI, a generic, integrated, and efficient acceleration solution for various data structure queries. We first abstract the query operations to a few regular steps and map them to a simple and hardware-friendly configurable finite automaton model. Based on this model, we develop the QEI architecture that allows multiple query operations to execute in parallel to maximize throughput. We also propose a novel way to integrate the accelerator into the CPU that balances performance, latency, and hardware cost. QEI keeps the main control logic near the L2 cache to leverage existing hardware resources in the core while distributing the data-intensive comparison logic to each last-level cache slice for higher parallelism. Our results with five representative data center workloads show that QEI can achieve 6.5× to 11.2× performance improvement in various scenarios with low overhead.
KW - data query
KW - near-cache processing
KW - on-chip accelerator
UR - http://www.scopus.com/inward/record.url?scp=85104932500&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104932500&partnerID=8YFLogxK
U2 - 10.1109/HPCA51647.2021.00040
DO - 10.1109/HPCA51647.2021.00040
M3 - Conference contribution
AN - SCOPUS:85104932500
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 385
EP - 398
BT - Proceeding - 27th IEEE International Symposium on High Performance Computer Architecture, HPCA 2021
PB - IEEE Computer Society
Y2 - 27 February 2021 through 1 March 2021
ER -