TY - GEN
T1 - Structured linear contextual bandits
T2 - 37th International Conference on Machine Learning, ICML 2020
AU - Sivakumar, Vidyashankar
AU - Wu, Zhiwei Steven
AU - Banerjee, Arindam
N1 - Publisher Copyright:
© 2020 37th International Conference on Machine Learning, ICML 2020. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Bandit learning algorithms typically involve the balance of exploration and exploitation. However, in many practical applications, worst-case scenarios needing systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise and the unknown parameter θ has structure, e.g., sparsity, group sparsity, low rank, etc. We propose simple greedy algorithms for both the single- and multi-parameter (i.e., different parameter for each context) settings and provide a unified regret analysis for θ with any assumed structure. The regret bounds are expressed in terms of geometric quantities such as Gaussian widths associated with the structure of θ. We also obtain sharper regret bounds compared to earlier work for the unstructured setting as a consequence of our improved analysis. We show there is implicit exploration in the smoothed setting where a simple greedy algorithm works.
AB - Bandit learning algorithms typically involve the balance of exploration and exploitation. However, in many practical applications, worst-case scenarios needing systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise and the unknown parameter θ has structure, e.g., sparsity, group sparsity, low rank, etc. We propose simple greedy algorithms for both the single- and multi-parameter (i.e., different parameter for each context) settings and provide a unified regret analysis for θ with any assumed structure. The regret bounds are expressed in terms of geometric quantities such as Gaussian widths associated with the structure of θ. We also obtain sharper regret bounds compared to earlier work for the unstructured setting as a consequence of our improved analysis. We show there is implicit exploration in the smoothed setting where a simple greedy algorithm works.
UR - http://www.scopus.com/inward/record.url?scp=85105301171&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105301171&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105301171
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 8973
EP - 8982
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daume, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
Y2 - 13 July 2020 through 18 July 2020
ER -