High-throughput and Flexible Host Networking for Accelerated Computing

Athinagoras Skiadopoulos, Zhiqiang Xie, Mark Zhao, Qizhe Cai, Saksham Agarwal, Jacob Adelmann, David Ahern, Carlo Contavalli, Michael Goldflam, Vitaly Mayatskikh, Raghu Raja, Daniel Walton, Rachit Agarwal, Shrijeet Mukherjee, Christos Kozyrakis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern network hardware is able to meet the stringent bandwidth demands of applications like GPU-accelerated AI. However, existing host network stacks offer a hard tradeoff between performance (in terms of sustained throughput when compared to network hardware capacity) and flexibility (in terms of the ability to select, customize, and extend different network protocols). This paper explores a clean-slate approach to simultaneously offer high performance and flexibility. We present a co-design of the NIC hardware and the software stack to achieve this. The key idea in our design is the physical separation of the data path (payload transfer between network and application buffers) and the control path (header processing and transport-layer decisions). The NIC enables a high-performance zero-copy data path, independent of the placement of the application (CPU, GPU, FPGA, or other accelerators). The software stack provides a flexible control path by enabling the integration of any network protocol, executing in any environment (in the kernel, in user space, or in an accelerator). We implement and evaluate ZeroNIC, a prototype that combines an FPGA-based NIC with a software stack that integrates the Linux TCP protocol. We demonstrate that ZeroNIC achieves RDMA-like throughput while maintaining the benefits of robust protocols like TCP under various network perturbations. For instance, ZeroNIC enables a single TCP flow to saturate a 100Gbps link while utilizing only 17% of a single CPU core. ZeroNIC improves NCCL and Redis throughput by 2.66× and 3.71×, respectively, over Linux TCP on a Mellanox ConnectX-6 NIC, without requiring application modifications.

Original language: English (US)
Title of host publication: Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2024
Publisher: USENIX Association
Pages: 405-423
Number of pages: 19
ISBN (Electronic): 9781939133403
State: Published - 2024
Externally published: Yes
Event: 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2024 - Santa Clara, United States
Duration: Jul 10, 2024 to Jul 12, 2024

Publication series

Name: Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2024

Conference

Conference: 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2024
Country/Territory: United States
City: Santa Clara
Period: 7/10/24 to 7/12/24

ASJC Scopus subject areas

  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
