TY - JOUR
T1 - Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
AU - Li, Ke
AU - Yang, Yun
AU - Narisetty, Naveen N.
N1 - Publisher Copyright: © 2021, Institute of Mathematical Statistics. All rights reserved.
PY - 2021
Y1 - 2021
N2 - In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, $O\big((\log d)^{\frac{\alpha+1}{2}} T^{\frac{1-\alpha}{2}} + \log T\big)$, for the cumulative regret, in terms of horizon T, dimension d and a margin parameter α ∈ [0, 1], which controls the separation between the optimal and the sub-optimal arms. This new lower bound unifies existing regret bound results that have different dependencies on T due to the use of different values of margin parameter α explicitly implied by their assumptions. Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy that achieves a regret upper bound matching the lower bound. The proposed algorithm uses a properly centered ℓ1-ball as the confidence set in contrast to the commonly used ellipsoid confidence set. In addition, the algorithm does not require any forced sampling step and is thereby adaptive to the practically unknown margin parameter. Simulations and a real data analysis are conducted to compare the proposed method with existing ones in the literature.
AB - In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, $O\big((\log d)^{\frac{\alpha+1}{2}} T^{\frac{1-\alpha}{2}} + \log T\big)$, for the cumulative regret, in terms of horizon T, dimension d and a margin parameter α ∈ [0, 1], which controls the separation between the optimal and the sub-optimal arms. This new lower bound unifies existing regret bound results that have different dependencies on T due to the use of different values of margin parameter α explicitly implied by their assumptions. Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy that achieves a regret upper bound matching the lower bound. The proposed algorithm uses a properly centered ℓ1-ball as the confidence set in contrast to the commonly used ellipsoid confidence set. In addition, the algorithm does not require any forced sampling step and is thereby adaptive to the practically unknown margin parameter. Simulations and a real data analysis are conducted to compare the proposed method with existing ones in the literature.
KW - Contextual linear bandit
KW - high-dimension
KW - minimax regret
KW - sparsity
KW - upper confidence bound
UR - http://www.scopus.com/inward/record.url?scp=85142845975&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142845975&partnerID=8YFLogxK
U2 - 10.1214/21-EJS1909
DO - 10.1214/21-EJS1909
M3 - Article
AN - SCOPUS:85142845975
SN - 1935-7524
VL - 15
SP - 5652
EP - 5695
JO - Electronic Journal of Statistics
JF - Electronic Journal of Statistics
IS - 2
ER -