Learning to control renewal processes with bandit feedback

Semih Cayci, Atilla Eryilmaz, Rayadurgam Srikant

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We consider a bandit problem with K task types from which the controller activates one task at a time. Each task takes a random and possibly heavy-tailed completion time, and a reward is obtained only after the task is completed. The task types are independent of each other, and have distinct and unknown distributions for completion time and reward. For a given time horizon τ, the goal of the controller is to schedule tasks adaptively so as to maximize the reward collected until τ expires. In addition, we allow the controller to interrupt a task and initiate a new one. Besides the traditional exploration-exploitation dilemma, this interrupt mechanism introduces a new one: should the controller complete the task and collect the reward, or interrupt it for a possibly shorter and more rewarding alternative? We show that for all heavy-tailed and some light-tailed completion time distributions, this interruption mechanism improves the reward linearly over time. From a learning perspective, the interrupt mechanism necessitates implicitly learning statistics beyond the mean from truncated observations. For this purpose, we propose a robust learning algorithm named UCB-BwI based on the median-of-means estimator for possibly heavy-tailed reward and completion time distributions. We show that, in a K-armed bandit setting with an arbitrary set of L possible interrupt times, UCB-BwI achieves O(K log(τ) + KL) regret. We also prove that the regret under any admissible policy is Ω(K log(τ)), which implies that UCB-BwI is order optimal.
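The abstract names the median-of-means estimator as the robust building block of UCB-BwI. As a rough illustration of that estimator (not the paper's exact construction; the function name and grouping scheme here are illustrative), one splits the samples into groups, averages each group, and returns the median of the group means, which tolerates heavy-tailed outliers far better than the plain sample mean:

```python
import statistics


def median_of_means(samples, num_groups):
    """Illustrative median-of-means estimator.

    Splits `samples` into `num_groups` equal-size groups (discarding any
    remainder), computes each group's mean, and returns the median of
    those means. A few extreme outliers can corrupt at most a few group
    means, so the median of the means stays close to the true mean even
    for heavy-tailed data.
    """
    n = len(samples)
    if num_groups < 1 or num_groups > n:
        raise ValueError("num_groups must be between 1 and len(samples)")
    size = n // num_groups
    group_means = [
        sum(samples[i * size:(i + 1) * size]) / size
        for i in range(num_groups)
    ]
    return statistics.median(group_means)
```

For example, with samples `[1, 1, 1, 1, 1000, 1]` and 3 groups, the group means are 1.0, 1.0, and 500.5, so the estimate is 1.0, whereas the plain sample mean would be pulled to about 167.5 by the single outlier.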

Original language: English (US)
Title of host publication: SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems
Publisher: Association for Computing Machinery, Inc
Pages: 41-42
Number of pages: 2
ISBN (Electronic): 9781450366786
DOI: 10.1145/3309697.3331515
State: Published - Jun 20 2019
Event: 14th Joint Conference of International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2019 and IFIP Performance Conference 2019, SIGMETRICS/Performance 2019 - Phoenix, United States
Duration: Jun 24 2019 - Jun 28 2019

Publication series

Name: SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems

Conference

Conference: 14th Joint Conference of International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2019 and IFIP Performance Conference 2019, SIGMETRICS/Performance 2019
Country: United States
City: Phoenix
Period: 6/24/19 - 6/28/19

Keywords

  • Heavy-tailed distributions
  • Multi-armed bandits
  • Online learning
  • Online scheduling
  • Renewal theory
  • Stochastic knapsack

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture

Cite this

Cayci, S., Eryilmaz, A., & Srikant, R. (2019). Learning to control renewal processes with bandit feedback. In SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems (pp. 41-42). (SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems). Association for Computing Machinery, Inc. https://doi.org/10.1145/3309697.3331515

Learning to control renewal processes with bandit feedback. / Cayci, Semih; Eryilmaz, Atilla; Srikant, Rayadurgam.

SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems. Association for Computing Machinery, Inc, 2019. p. 41-42 (SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Cayci, S, Eryilmaz, A & Srikant, R 2019, Learning to control renewal processes with bandit feedback. in SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems. SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems, Association for Computing Machinery, Inc, pp. 41-42, 14th Joint Conference of International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2019 and IFIP Performance Conference 2019, SIGMETRICS/Performance 2019, Phoenix, United States, 6/24/19. https://doi.org/10.1145/3309697.3331515
Cayci S, Eryilmaz A, Srikant R. Learning to control renewal processes with bandit feedback. In SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems. Association for Computing Machinery, Inc. 2019. p. 41-42. (SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems). https://doi.org/10.1145/3309697.3331515
Cayci, Semih ; Eryilmaz, Atilla ; Srikant, Rayadurgam. / Learning to control renewal processes with bandit feedback. SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems. Association for Computing Machinery, Inc, 2019. pp. 41-42 (SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems).
@inproceedings{d3d3c7fb27864ac0a30cbb225e78a830,
title = "Learning to control renewal processes with bandit feedback",
abstract = "We consider a bandit problem with K task types from which the controller activates one task at a time. Each task takes a random and possibly heavy-tailed completion time, and a reward is obtained only after the task is completed. The task types are independent of each other, and have distinct and unknown distributions for completion time and reward. For a given time horizon τ, the goal of the controller is to schedule tasks adaptively so as to maximize the reward collected until τ expires. In addition, we allow the controller to interrupt a task and initiate a new one. Besides the traditional exploration-exploitation dilemma, this interrupt mechanism introduces a new one: should the controller complete the task and collect the reward, or interrupt it for a possibly shorter and more rewarding alternative? We show that for all heavy-tailed and some light-tailed completion time distributions, this interruption mechanism improves the reward linearly over time. From a learning perspective, the interrupt mechanism necessitates implicitly learning statistics beyond the mean from truncated observations. For this purpose, we propose a robust learning algorithm named UCB-BwI based on the median-of-means estimator for possibly heavy-tailed reward and completion time distributions. We show that, in a K-armed bandit setting with an arbitrary set of L possible interrupt times, UCB-BwI achieves O(K log(τ) + KL) regret. We also prove that the regret under any admissible policy is Ω(K log(τ)), which implies that UCB-BwI is order optimal.",
keywords = "Heavy-tailed distributions, Multi-armed bandits, Online learning, Online scheduling, Renewal theory, Stochastic knapsack",
author = "Semih Cayci and Atilla Eryilmaz and Rayadurgam Srikant",
year = "2019",
month = "6",
day = "20",
doi = "10.1145/3309697.3331515",
language = "English (US)",
series = "SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems",
publisher = "Association for Computing Machinery, Inc",
pages = "41--42",
booktitle = "SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems",

}

TY - GEN

T1 - Learning to control renewal processes with bandit feedback

AU - Cayci, Semih

AU - Eryilmaz, Atilla

AU - Srikant, Rayadurgam

PY - 2019/6/20

Y1 - 2019/6/20

N2 - We consider a bandit problem with K task types from which the controller activates one task at a time. Each task takes a random and possibly heavy-tailed completion time, and a reward is obtained only after the task is completed. The task types are independent of each other, and have distinct and unknown distributions for completion time and reward. For a given time horizon τ, the goal of the controller is to schedule tasks adaptively so as to maximize the reward collected until τ expires. In addition, we allow the controller to interrupt a task and initiate a new one. Besides the traditional exploration-exploitation dilemma, this interrupt mechanism introduces a new one: should the controller complete the task and collect the reward, or interrupt it for a possibly shorter and more rewarding alternative? We show that for all heavy-tailed and some light-tailed completion time distributions, this interruption mechanism improves the reward linearly over time. From a learning perspective, the interrupt mechanism necessitates implicitly learning statistics beyond the mean from truncated observations. For this purpose, we propose a robust learning algorithm named UCB-BwI based on the median-of-means estimator for possibly heavy-tailed reward and completion time distributions. We show that, in a K-armed bandit setting with an arbitrary set of L possible interrupt times, UCB-BwI achieves O(K log(τ) + KL) regret. We also prove that the regret under any admissible policy is Ω(K log(τ)), which implies that UCB-BwI is order optimal.

AB - We consider a bandit problem with K task types from which the controller activates one task at a time. Each task takes a random and possibly heavy-tailed completion time, and a reward is obtained only after the task is completed. The task types are independent of each other, and have distinct and unknown distributions for completion time and reward. For a given time horizon τ, the goal of the controller is to schedule tasks adaptively so as to maximize the reward collected until τ expires. In addition, we allow the controller to interrupt a task and initiate a new one. Besides the traditional exploration-exploitation dilemma, this interrupt mechanism introduces a new one: should the controller complete the task and collect the reward, or interrupt it for a possibly shorter and more rewarding alternative? We show that for all heavy-tailed and some light-tailed completion time distributions, this interruption mechanism improves the reward linearly over time. From a learning perspective, the interrupt mechanism necessitates implicitly learning statistics beyond the mean from truncated observations. For this purpose, we propose a robust learning algorithm named UCB-BwI based on the median-of-means estimator for possibly heavy-tailed reward and completion time distributions. We show that, in a K-armed bandit setting with an arbitrary set of L possible interrupt times, UCB-BwI achieves O(K log(τ) + KL) regret. We also prove that the regret under any admissible policy is Ω(K log(τ)), which implies that UCB-BwI is order optimal.

KW - Heavy-tailed distributions

KW - Multi-armed bandits

KW - Online learning

KW - Online scheduling

KW - Renewal theory

KW - Stochastic knapsack

UR - http://www.scopus.com/inward/record.url?scp=85069189480&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85069189480&partnerID=8YFLogxK

U2 - 10.1145/3309697.3331515

DO - 10.1145/3309697.3331515

M3 - Conference contribution

AN - SCOPUS:85069189480

T3 - SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems

SP - 41

EP - 42

BT - SIGMETRICS Performance 2019 - Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems

PB - Association for Computing Machinery, Inc

ER -