Online decision making with history-average dependent costs

Vijeth Hebbar, Cédric Langbort

Research output: Contribution to journal › Conference article › peer-review

Abstract

In many online sequential decision-making scenarios, a learner's choices affect not just their current costs but also future ones. In this work, we study the particular case in which the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history-dependent costs as a problem of decision making under stage-wise constraints. To tackle it, we propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously establishing tight regret bounds. We also discuss the implications of the length of the history horizon for the design of no-regret algorithms for our problem, and present impossibility results when it spans the full learning horizon.
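To make the general idea concrete, the following is a minimal sketch of a Follow-The-Regularized-Leader update in which the regularizer depends on past decisions through their running average, here a quadratic regularizer centred at the history average, on the probability simplex with linear losses. The regularizer choice, the step-size schedule, and the function names are illustrative assumptions for this sketch only; they are not the paper's FTARL construction or its constraint-handling scheme.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def ftrl_history_regularized(loss_vectors, eta0=1.0):
    """Illustrative FTRL with a history-dependent regularizer.

    At round t the learner plays
        x_t = argmin_x  <sum_{s<t} g_s, x> + (1 / (2*eta_t)) * ||x - xbar_t||^2
    over the simplex, where xbar_t is the time average of past plays.
    With linear losses this is the projection of xbar_t - eta_t * cum_loss.
    """
    T, d = loss_vectors.shape
    cum_loss = np.zeros(d)
    history = [np.full(d, 1.0 / d)]      # start the history at the uniform point
    plays = []
    for t in range(T):
        x_bar = np.mean(history, axis=0)       # time average of past decisions
        eta_t = eta0 / np.sqrt(t + 1.0)        # illustrative decaying step size
        x = project_to_simplex(x_bar - eta_t * cum_loss)
        plays.append(x)
        history.append(x)
        cum_loss += loss_vectors[t]            # observe the round's linear loss
    return np.array(plays)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    losses = rng.uniform(size=(100, 3))
    X = ftrl_history_regularized(losses)
    print("final decision:", X[-1])
```

Because the regularizer is centred at the history average, each update is pulled toward past plays, which is one simple way a regularizer can encode dependence on the decision history; the paper's actual regularizers and their regret analysis are developed in the full text.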

Original language: English (US)
Pages (from-to): 480-491
Number of pages: 12
Journal: Proceedings of Machine Learning Research
Volume: 242
State: Published - 2024
Externally published: Yes
Event: 6th Annual Learning for Dynamics and Control Conference, L4DC 2024 - Oxford, United Kingdom
Duration: Jul 15 2024 - Jul 17 2024

Keywords

  • Online Learning with Constraints
  • Online Optimization with Memory
  • Sequential Decision Making

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
