TY - GEN
T1 - Powering Multi-Task Federated Learning with Competitive GPU Resource Sharing
AU - Yu, Yongbo
AU - Yu, Fuxun
AU - Xu, Zirui
AU - Wang, Di
AU - Zhang, Minjia
AU - Li, Ang
AU - Bray, Shawn
AU - Liu, Chenchen
AU - Chen, Xiang
N1 - Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/4/25
Y1 - 2022/4/25
N2 - Federated learning (FL) nowadays involves compound learning tasks as the complexity of cognitive applications increases. For example, a self-driving system hosts multiple tasks simultaneously (e.g., detection, classification, etc.) and expects FL to sustain life-long intelligence. However, our analysis demonstrates that, when deploying compound FL models for multiple training tasks on a GPU, certain issues arise: (1) As different tasks' skewed data distributions and corresponding models cause highly imbalanced learning workloads, current GPU scheduling methods cannot allocate resources effectively; (2) Consequently, existing FL schemes, which focus only on heterogeneous data distribution and ignore runtime computing, cannot practically achieve optimally synchronized federation. To address these issues, we propose a full-stack FL optimization scheme that covers both intra-device GPU scheduling and inter-device FL coordination for multi-task training. Specifically, our work illustrates two key insights in this research domain: (1) Competitive resource sharing is beneficial for parallel model executions, and the proposed concept of "virtual resource" can effectively characterize and guide practical per-task resource utilization and allocation. (2) FL can be further improved by taking architecture-level coordination into consideration. Our experiments demonstrate that FL throughput can be significantly increased.
AB - Federated learning (FL) nowadays involves compound learning tasks as the complexity of cognitive applications increases. For example, a self-driving system hosts multiple tasks simultaneously (e.g., detection, classification, etc.) and expects FL to sustain life-long intelligence. However, our analysis demonstrates that, when deploying compound FL models for multiple training tasks on a GPU, certain issues arise: (1) As different tasks' skewed data distributions and corresponding models cause highly imbalanced learning workloads, current GPU scheduling methods cannot allocate resources effectively; (2) Consequently, existing FL schemes, which focus only on heterogeneous data distribution and ignore runtime computing, cannot practically achieve optimally synchronized federation. To address these issues, we propose a full-stack FL optimization scheme that covers both intra-device GPU scheduling and inter-device FL coordination for multi-task training. Specifically, our work illustrates two key insights in this research domain: (1) Competitive resource sharing is beneficial for parallel model executions, and the proposed concept of "virtual resource" can effectively characterize and guide practical per-task resource utilization and allocation. (2) FL can be further improved by taking architecture-level coordination into consideration. Our experiments demonstrate that FL throughput can be significantly increased.
KW - Federated Learning
KW - GPU Resource Allocation
KW - Multi-Task Learning
UR - http://www.scopus.com/inward/record.url?scp=85137507405&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137507405&partnerID=8YFLogxK
U2 - 10.1145/3487553.3524859
DO - 10.1145/3487553.3524859
M3 - Conference contribution
AN - SCOPUS:85137507405
T3 - WWW 2022 - Companion Proceedings of the Web Conference 2022
SP - 567
EP - 571
BT - WWW 2022 - Companion Proceedings of the Web Conference 2022
PB - Association for Computing Machinery
T2 - 31st ACM Web Conference, WWW 2022
Y2 - 25 April 2022
ER -