TY - JOUR
T1 - Robust Multi-Agent Bandits Over Undirected Graphs
AU - Vial, Daniel
AU - Shakkottai, Sanjay
AU - Srikant, R.
N1 - Funding Information:
This work was supported by NSF Grants CCF 22-07547, CCF 19-34986, CNS 21-06801, 2019844, 2112471, and 2107037; ONR Grant N00014-19-1-2566; the Machine Learning Lab (MLL) at UT Austin; and the Wireless Networking and Communications Group (WNCG) Industrial Affiliates Program.
Publisher Copyright:
© 2022 ACM.
PY - 2022/12/8
Y1 - 2022/12/8
N2 - We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) łog (T) / Δ) regret in this setting, where K is the number of arms and Δis the arm gap. For m łl K, this improves over the single-agent baseline regret of O(Kłog(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O(( dmal (i) + K/n) łog(T)/Δ) on any connected and undirected graph, where dmal(i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where dmal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the dmal (i) malicious agents directly connected to i affect its long-term regret).
AB - We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) łog (T) / Δ) regret in this setting, where K is the number of arms and Δis the arm gap. For m łl K, this improves over the single-agent baseline regret of O(Kłog(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O(( dmal (i) + K/n) łog(T)/Δ) on any connected and undirected graph, where dmal(i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where dmal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the dmal (i) malicious agents directly connected to i affect its long-term regret).
KW - malicious agents
KW - multi-armed bandits
UR - http://www.scopus.com/inward/record.url?scp=85146310287&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146310287&partnerID=8YFLogxK
U2 - 10.1145/3570614
DO - 10.1145/3570614
M3 - Article
AN - SCOPUS:85146310287
SN - 2476-1249
VL - 6
JO - Proceedings of the ACM on Measurement and Analysis of Computing Systems
JF - Proceedings of the ACM on Measurement and Analysis of Computing Systems
IS - 3
M1 - 53
ER -