Accelerating distributed reinforcement learning with in-switch computing

Youjie Li, Iou Jen Liu, Yifan Yuan, Deming Chen, Alexander Schwing, Jian Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Reinforcement learning (RL) has attracted much attention recently, as new and emerging AI-based applications demand the capability to react intelligently to environment changes. Unlike distributed deep neural network (DNN) training, distributed RL training has unique workload characteristics: it generates orders of magnitude more iterations, each with much smaller but more frequent gradient aggregations. More specifically, our study of typical RL algorithms shows that their distributed training is latency critical and that the network communication for gradient aggregation occupies up to 83.2% of the execution time of each training iteration. In this paper, we present iSwitch, an in-switch acceleration solution that moves gradient aggregation from server nodes into the network switches, thereby reducing the number of network hops required for gradient aggregation. This not only reduces the end-to-end network latency for synchronous training, but also improves convergence through faster weight updates for asynchronous training. On top of the in-switch accelerator, we further reduce synchronization overhead by conducting on-the-fly gradient aggregation at the granularity of network packets rather than gradient vectors. Moreover, we rethink the distributed RL training algorithms and propose a hierarchical aggregation mechanism to further increase the parallelism and scalability of distributed RL training at rack scale. We implement iSwitch using a real-world programmable switch NetFPGA board. We extend the control and data planes of the programmable switch to support iSwitch without affecting its regular network functions. Compared with state-of-the-art distributed training approaches, iSwitch offers a system-level speedup of up to 3.66× for synchronous distributed training and 3.71× for asynchronous distributed training, while achieving better scalability.
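To make the packet-granularity idea concrete, the sketch below shows, in plain Python, how a switch-resident aggregator might accumulate each gradient segment as worker packets arrive and release the aggregated segment the moment every worker has contributed, instead of buffering whole gradient vectors. This is an illustration only, not the paper's data-plane implementation: the names (Packet, SwitchAggregator, NUM_WORKERS, SEGMENT_SIZE) and the fixed worker count are assumptions made for the example.

# A minimal conceptual sketch of packet-granularity, on-the-fly gradient
# aggregation. This is NOT the iSwitch hardware implementation; all names
# (Packet, SwitchAggregator, NUM_WORKERS, SEGMENT_SIZE) are illustrative.
from dataclasses import dataclass
from typing import Dict, List, Optional

NUM_WORKERS = 4      # assumed number of worker nodes in the rack
SEGMENT_SIZE = 256   # assumed number of gradient values carried per packet

@dataclass
class Packet:
    worker_id: int
    segment_id: int        # which slice of the flattened gradient vector
    payload: List[float]   # SEGMENT_SIZE gradient values

class SwitchAggregator:
    """Accumulates each gradient segment as worker packets arrive and releases
    the aggregated segment as soon as every worker has contributed, instead of
    buffering entire gradient vectors before aggregating."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self.partial: Dict[int, List[float]] = {}   # segment_id -> running sum
        self.count: Dict[int, int] = {}             # segment_id -> packets seen

    def on_packet(self, pkt: Packet) -> Optional[List[float]]:
        acc = self.partial.setdefault(pkt.segment_id, [0.0] * len(pkt.payload))
        for i, v in enumerate(pkt.payload):
            acc[i] += v
        self.count[pkt.segment_id] = self.count.get(pkt.segment_id, 0) + 1
        if self.count[pkt.segment_id] == self.num_workers:
            # Segment fully aggregated: hand it back for multicast to workers.
            del self.count[pkt.segment_id]
            return self.partial.pop(pkt.segment_id)
        return None   # still waiting on other workers for this segment

# Toy usage: every worker sends segment 0; the aggregate is released by the
# last arriving packet.
if __name__ == "__main__":
    switch = SwitchAggregator(NUM_WORKERS)
    for w in range(NUM_WORKERS):
        out = switch.on_packet(Packet(w, 0, [1.0] * SEGMENT_SIZE))
    assert out == [float(NUM_WORKERS)] * SEGMENT_SIZE

Releasing each segment as soon as its last copy arrives is what lets aggregation overlap with packet transmission, which is the intuition behind the reduced synchronization overhead described in the abstract.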

Original language: English (US)
Title of host publication: ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 279-291
Number of pages: 13
ISBN (Electronic): 9781450366694
DOIs
State: Published - Jun 22 2019
Event: 46th International Symposium on Computer Architecture, ISCA 2019 - Phoenix, United States
Duration: Jun 22 2019 - Jun 26 2019

Publication series

Name: Proceedings - International Symposium on Computer Architecture
ISSN (Print): 1063-6897

Conference

Conference: 46th International Symposium on Computer Architecture, ISCA 2019
Country/Territory: United States
City: Phoenix
Period: 6/22/19 - 6/26/19

Keywords

  • Distributed machine learning
  • In-network computing
  • In-switch accelerator
  • Reinforcement learning

ASJC Scopus subject areas

  • Hardware and Architecture
