TY - GEN
T1 - Meta Clustering of Neural Bandits
AU - Ban, Yikun
AU - Qi, Yunzhe
AU - Wei, Tianxin
AU - Liu, Lihui
AU - He, Jingrui
N1 - This work is supported by National Science Foundation under Award No. IIS-2002540, and Agriculture and Food Research Initiative (AFRI) grant no. 2020-67021-32799/project accession no. 1024178 from the USDA National Institute of Food and Agriculture. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government.
PY - 2024/8/24
Y1 - 2024/8/24
N2 - The contextual bandit has been identified as a powerful framework to formulate the recommendation process as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret of T rounds. In this paper, we study a new problem, Clustering of Neural Bandits, by extending previous work to the arbitrary reward function, to strike a balance between user heterogeneity and user correlations in the recommender system. To solve this problem, we propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters, along with an informative Upper Confidence Bound (UCB)-based exploration strategy. We provide an instance-dependent performance guarantee for the proposed algorithm that withstands the adversarial context, and we further prove the guarantee is at least as good as state-of-the-art (SOTA) approaches under the same assumptions. In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines. This shows the effectiveness of the proposed approach in improving online recommendation and online classification performance.
AB - The contextual bandit has been identified as a powerful framework to formulate the recommendation process as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret of T rounds. In this paper, we study a new problem, Clustering of Neural Bandits, by extending previous work to the arbitrary reward function, to strike a balance between user heterogeneity and user correlations in the recommender system. To solve this problem, we propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters, along with an informative Upper Confidence Bound (UCB)-based exploration strategy. We provide an instance-dependent performance guarantee for the proposed algorithm that withstands the adversarial context, and we further prove the guarantee is at least as good as state-of-the-art (SOTA) approaches under the same assumptions. In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines. This shows the effectiveness of the proposed approach in improving online recommendation and online classification performance.
KW - meta learning
KW - neural contextual bandits
KW - recommendation
KW - user modeling
UR - http://www.scopus.com/inward/record.url?scp=85203707875&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203707875&partnerID=8YFLogxK
U2 - 10.1145/3637528.3671691
DO - 10.1145/3637528.3671691
M3 - Conference contribution
AN - SCOPUS:85203707875
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 95
EP - 106
BT - KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024
Y2 - 25 August 2024 through 29 August 2024
ER -