TY - GEN
T1 - Robust Multi-Agent Bandits over Undirected Graphs
AU - Vial, Daniel
AU - Shakkottai, Sanjay
AU - Srikant, R.
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/6/19
Y1 - 2023/6/19
N2 - We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) log (T) / ") regret in this setting, where K is the number of arms and Δis the arm gap. For m << K, this improves over the single-agent baseline regret of O(K log(T)/"). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we provide an instance for which honest agents using the state-of-the-art algorithm suffer (nearly) linear regret until time is doubly exponential in n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O((dmal (i) + K/n) log(T)/") on any connected and undirected graph, where dmal (i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph and show the effect of malicious agents is entirely local.
AB - We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) log (T) / ") regret in this setting, where K is the number of arms and Δis the arm gap. For m << K, this improves over the single-agent baseline regret of O(K log(T)/"). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we provide an instance for which honest agents using the state-of-the-art algorithm suffer (nearly) linear regret until time is doubly exponential in n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O((dmal (i) + K/n) log(T)/") on any connected and undirected graph, where dmal (i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph and show the effect of malicious agents is entirely local.
KW - malicious agents
KW - multi-armed bandits
UR - http://www.scopus.com/inward/record.url?scp=85163786437&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163786437&partnerID=8YFLogxK
U2 - 10.1145/3578338.3593567
DO - 10.1145/3578338.3593567
M3 - Conference contribution
AN - SCOPUS:85163786437
T3 - SIGMETRICS 2023 - Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
SP - 67
EP - 68
BT - SIGMETRICS 2023 - Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
PB - Association for Computing Machinery
T2 - 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2023
Y2 - 19 June 2023 through 23 June 2023
ER -