Best arm identification in linear bandits with linear dimension dependency

Chao Tao, Saul A. Blanco, Yuan Zhou

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown d-dimensional parameter vector 0, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized 9 estimator based on the solution to the convex relaxation of an optimal G-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension d, as well as an algorithm with sample complexity dependent on the reward gaps of the best d arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.

Original languageEnglish (US)
Title of host publication35th International Conference on Machine Learning, ICML 2018
EditorsAndreas Krause, Jennifer Dy
PublisherInternational Machine Learning Society (IMLS)
Pages7773-7786
Number of pages14
ISBN (Electronic)9781510867963
StatePublished - Jan 1 2018
Event35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018Jul 15 2018

Publication series

Name35th International Conference on Machine Learning, ICML 2018
Volume11

Other

Other35th International Conference on Machine Learning, ICML 2018
CountrySweden
CityStockholm
Period7/10/187/15/18

Fingerprint

Experiments

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Cite this

Tao, C., Blanco, S. A., & Zhou, Y. (2018). Best arm identification in linear bandits with linear dimension dependency. In A. Krause, & J. Dy (Eds.), 35th International Conference on Machine Learning, ICML 2018 (pp. 7773-7786). (35th International Conference on Machine Learning, ICML 2018; Vol. 11). International Machine Learning Society (IMLS).

Best arm identification in linear bandits with linear dimension dependency. / Tao, Chao; Blanco, Saul A.; Zhou, Yuan.

35th International Conference on Machine Learning, ICML 2018. ed. / Andreas Krause; Jennifer Dy. International Machine Learning Society (IMLS), 2018. p. 7773-7786 (35th International Conference on Machine Learning, ICML 2018; Vol. 11).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Tao, C, Blanco, SA & Zhou, Y 2018, Best arm identification in linear bandits with linear dimension dependency. in A Krause & J Dy (eds), 35th International Conference on Machine Learning, ICML 2018. 35th International Conference on Machine Learning, ICML 2018, vol. 11, International Machine Learning Society (IMLS), pp. 7773-7786, 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 7/10/18.
Tao C, Blanco SA, Zhou Y. Best arm identification in linear bandits with linear dimension dependency. In Krause A, Dy J, editors, 35th International Conference on Machine Learning, ICML 2018. International Machine Learning Society (IMLS). 2018. p. 7773-7786. (35th International Conference on Machine Learning, ICML 2018).
Tao, Chao ; Blanco, Saul A. ; Zhou, Yuan. / Best arm identification in linear bandits with linear dimension dependency. 35th International Conference on Machine Learning, ICML 2018. editor / Andreas Krause ; Jennifer Dy. International Machine Learning Society (IMLS), 2018. pp. 7773-7786 (35th International Conference on Machine Learning, ICML 2018).
@inproceedings{ebac4bac67fe42788a923e9e9584ca09,
title = "Best arm identification in linear bandits with linear dimension dependency",
abstract = "We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown d-dimensional parameter vector 0, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized 9 estimator based on the solution to the convex relaxation of an optimal G-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension d, as well as an algorithm with sample complexity dependent on the reward gaps of the best d arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.",
author = "Chao Tao and Blanco, {Saul A.} and Yuan Zhou",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
series = "35th International Conference on Machine Learning, ICML 2018",
publisher = "International Machine Learning Society (IMLS)",
pages = "7773--7786",
editor = "Andreas Krause and Jennifer Dy",
booktitle = "35th International Conference on Machine Learning, ICML 2018",

}

TY - GEN

T1 - Best arm identification in linear bandits with linear dimension dependency

AU - Tao, Chao

AU - Blanco, Saul A.

AU - Zhou, Yuan

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown d-dimensional parameter vector 0, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized 9 estimator based on the solution to the convex relaxation of an optimal G-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension d, as well as an algorithm with sample complexity dependent on the reward gaps of the best d arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.

AB - We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown d-dimensional parameter vector 0, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized 9 estimator based on the solution to the convex relaxation of an optimal G-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension d, as well as an algorithm with sample complexity dependent on the reward gaps of the best d arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.

UR - http://www.scopus.com/inward/record.url?scp=85057329886&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057329886&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85057329886

T3 - 35th International Conference on Machine Learning, ICML 2018

SP - 7773

EP - 7786

BT - 35th International Conference on Machine Learning, ICML 2018

A2 - Krause, Andreas

A2 - Dy, Jennifer

PB - International Machine Learning Society (IMLS)

ER -