TY - GEN
T1 - A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors
AU - Kuper, Reese
AU - Jeong, Ipoom
AU - Yuan, Yifan
AU - Wang, Ren
AU - Ranganathan, Narayan
AU - Rao, Nikhil
AU - Hu, Jiayu
AU - Kumar, Sanjay
AU - Lantz, Philip
AU - Kim, Nam Sung
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/4/27
Y1 - 2024/4/27
N2 - Because semiconductor power density no longer stays constant as process technology scales down, we need different solutions if we are to continue scaling application performance. To this end, modern CPUs are integrating capable data accelerators on the chip, aiming to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel® Data Streaming Accelerator (DSA), introduced with Intel® 4th Generation Xeon® Scalable CPUs (Sapphire Rapids). DSA targets in-memory data movement operations that are common sources of overhead in datacenter workloads and infrastructure. In addition, it supports a wider range of operations on streaming data, such as CRC32 calculations, computation of deltas between data buffers, and data integrity field (DIF) operations. This paper introduces the latest features supported by DSA, examines its versatility in depth, and analyzes its throughput benefits through a comprehensive evaluation with both microbenchmarks and real use cases. Along with the analysis of its characteristics and the rich software ecosystem of DSA, we summarize several insights and guidelines to help programmers make the most of DSA, and use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application.
KW - accelerator
KW - data streaming accelerator (DSA)
KW - measurement
UR - http://www.scopus.com/inward/record.url?scp=85192193946&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192193946&partnerID=8YFLogxK
U2 - 10.1145/3620665.3640401
DO - 10.1145/3620665.3640401
M3 - Conference contribution
AN - SCOPUS:85192193946
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 37
EP - 54
BT - Summer Cycle
PB - Association for Computing Machinery
T2 - 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024
Y2 - 27 April 2024 through 1 May 2024
ER -