TY - GEN
T1 - Robust multi-agent multi-armed bandits
AU - Vial, Daniel
AU - Shakkottai, Sanjay
AU - Srikant, R.
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/7/26
Y1 - 2021/7/26
AB - Recent works have shown that agents facing independent instances of a stochastic K-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include n honest and m malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We first show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with (i.e., "block") them. We show that collaboration indeed decreases regret for this algorithm, assuming m is small compared to K but without assumptions on malicious agents' behavior, thus ensuring that our algorithm is robust against any malicious recommendation strategy.
KW - Adversarial agents
KW - Multi-agent systems
KW - Multi-armed bandits
UR - http://www.scopus.com/inward/record.url?scp=85121659429&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121659429&partnerID=8YFLogxK
DO - 10.1145/3466772.3467045
M3 - Conference contribution
AN - SCOPUS:85121659429
T3 - Proceedings of the International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc)
SP - 161
EP - 170
BT - MobiHoc 2021 - Proceedings of the 2021 22nd International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
PB - Association for Computing Machinery
T2 - 22nd International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, MobiHoc 2021
Y2 - 26 July 2021 through 29 July 2021
ER -