TY - GEN
T1 - Meta Clustering of Neural Bandits
AU - Ban, Yikun
AU - Qi, Yunzhe
AU - Wei, Tianxin
AU - Liu, Lihui
AU - He, Jingrui
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/8/25
Y1 - 2024/8/25
AB - The contextual bandit has been identified as a powerful framework that models recommendation as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the cumulative regret over T rounds. In this paper, we study a new problem, Clustering of Neural Bandits, which extends previous work to arbitrary reward functions in order to strike a balance between user heterogeneity and user correlations in recommender systems. To solve this problem, we propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters, along with an informative Upper Confidence Bound (UCB)-based exploration strategy. We provide an instance-dependent performance guarantee for the proposed algorithm that holds under adversarial contexts, and we further prove that this guarantee is at least as good as those of state-of-the-art (SOTA) approaches under the same assumptions. In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines, demonstrating the effectiveness of the proposed approach in improving online recommendation and online classification performance.
KW - meta learning
KW - neural contextual bandits
KW - recommendation
KW - user modeling
UR - http://www.scopus.com/inward/record.url?scp=85203707875&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203707875&partnerID=8YFLogxK
U2 - 10.1145/3637528.3671691
DO - 10.1145/3637528.3671691
M3 - Conference contribution
AN - SCOPUS:85203707875
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 95
EP - 106
BT - KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024
Y2 - 25 August 2024 through 29 August 2024
ER -