TY - GEN
T1 - Auxo: Efficient Federated Learning via Scalable Client Clustering
T2 - 14th ACM Symposium on Cloud Computing, SoCC 2023
AU - Liu, Jiachen
AU - Lai, Fan
AU - Dai, Yinwei
AU - Akella, Aditya
AU - Madhyastha, Harsha V.
AU - Chowdhury, Mosharaf
N1 - We are grateful to the anonymous reviewers, our shepherd Matthias Boehm, and SymbioticLab members for their valuable comments and suggestions that improved the paper. We thank the CloudLab team for providing GPU servers for Auxo experiments. This work was supported in part by NSF grant CNS-2106184 and a grant from Cisco.
PY - 2023/10/30
Y1 - 2023/10/30
N2 - Federated learning (FL) is an emerging machine learning (ML) paradigm that enables heterogeneous edge devices to collaboratively train ML models without revealing their raw data to a logically centralized server. However, beyond the heterogeneous device capacity, FL participants often exhibit differences in their data distributions, which are not independent and identically distributed (Non-IID). Many existing works present point solutions to address issues like slow convergence, low final accuracy, and bias in FL, all stemming from client heterogeneity. In this paper, we explore an additional layer of complexity to mitigate such heterogeneity by grouping clients with statistically similar data distributions (cohorts). We propose Auxo to gradually identify such cohorts in large-scale, low-availability, and resource-constrained FL populations. Auxo then adaptively determines how to train cohort-specific models in order to achieve better model performance and ensure resource efficiency. Our extensive evaluations show that, by identifying cohorts with smaller heterogeneity and performing efficient cohort-based training, Auxo boosts various existing FL solutions in terms of final accuracy (2.1%–8.2%), convergence time (up to 2.2×), and model bias (4.8%–53.8%).
AB - Federated learning (FL) is an emerging machine learning (ML) paradigm that enables heterogeneous edge devices to collaboratively train ML models without revealing their raw data to a logically centralized server. However, beyond the heterogeneous device capacity, FL participants often exhibit differences in their data distributions, which are not independent and identically distributed (Non-IID). Many existing works present point solutions to address issues like slow convergence, low final accuracy, and bias in FL, all stemming from client heterogeneity. In this paper, we explore an additional layer of complexity to mitigate such heterogeneity by grouping clients with statistically similar data distributions (cohorts). We propose Auxo to gradually identify such cohorts in large-scale, low-availability, and resource-constrained FL populations. Auxo then adaptively determines how to train cohort-specific models in order to achieve better model performance and ensure resource efficiency. Our extensive evaluations show that, by identifying cohorts with smaller heterogeneity and performing efficient cohort-based training, Auxo boosts various existing FL solutions in terms of final accuracy (2.1%–8.2%), convergence time (up to 2.2×), and model bias (4.8%–53.8%).
KW - Federated Learning
KW - Unsupervised Learning
UR - https://www.scopus.com/pages/publications/85178505879
UR - https://www.scopus.com/pages/publications/85178505879#tab=citedBy
U2 - 10.1145/3620678.3624651
DO - 10.1145/3620678.3624651
M3 - Conference contribution
AN - SCOPUS:85178505879
T3 - SoCC 2023 - Proceedings of the 2023 ACM Symposium on Cloud Computing
SP - 125
EP - 141
BT - SoCC 2023 - Proceedings of the 2023 ACM Symposium on Cloud Computing
PB - Association for Computing Machinery
Y2 - 30 October 2023 through 1 November 2023
ER -