Powering Multi-Task Federated Learning with Competitive GPU Resource Sharing

Yongbo Yu, Fuxun Yu, Zirui Xu, Di Wang, Minjia Zhang, Ang Li, Shawn Bray, Chenchen Liu, Xiang Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Federated learning (FL) nowadays involves compound learning tasks as cognitive applications' complexity increases. For example, a self-driving system hosts multiple tasks simultaneously (e.g., detection and classification) and expects FL to retain life-long intelligence involvement. However, our analysis demonstrates that, when deploying compound FL models for multiple training tasks on a GPU, certain issues arise: (1) As different tasks' skewed data distributions and corresponding models cause highly imbalanced learning workloads, current GPU scheduling methods lack effective resource allocations; (2) Therefore, existing FL schemes, which focus only on heterogeneous data distribution and not on runtime computing, cannot practically achieve optimally synchronized federation. To address these issues, we propose a full-stack FL optimization scheme that covers both intra-device GPU scheduling and inter-device FL coordination for multi-task training. Specifically, our work illustrates two key insights in this research domain: (1) Competitive resource sharing is beneficial for parallel model executions, and the proposed concept of "virtual resource" could effectively characterize and guide the practical per-task resource utilization and allocation. (2) FL could be further improved by taking architectural-level coordination into consideration. Our experiments demonstrate that the FL throughput could be significantly increased.

Original language: English (US)
Title of host publication: WWW 2022 - Companion Proceedings of the Web Conference 2022
Publisher: Association for Computing Machinery
Pages: 567-571
Number of pages: 5
ISBN (Electronic): 9781450391306
State: Published - Apr 25 2022
Externally published: Yes
Event: 31st ACM Web Conference, WWW 2022 - Virtual, Online, France
Duration: Apr 25 2022 → …

Publication series

Name: WWW 2022 - Companion Proceedings of the Web Conference 2022

Conference

Conference: 31st ACM Web Conference, WWW 2022
Country/Territory: France
City: Virtual, Online
Period: 4/25/22 → …

Keywords

  • Federated Learning
  • GPU Resource Allocation
  • Multi-Task Learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
