Practical Online Reinforcement Learning for Microprocessors with Micro-Armed Bandit

Gerasimos Gerogiannis, Josep Torrellas

Research output: Contribution to journal › Article › peer-review

Abstract

Although online reinforcement learning (RL) has shown promise for microarchitecture decision making, processor vendors are still reluctant to adopt it. There are two main reasons that make RL-based solutions unattractive. First, they have high complexity and storage overhead. Second, many RL agents are engineered for a specific problem and are not reusable. In this work, we propose a way to tackle these shortcomings. We find that, in diverse microarchitecture problems, only a few actions are useful in a given time window. Motivated by this property, we design Micro-Armed Bandit (or Bandit for short), an RL agent that is based on the low-complexity Multi-Armed Bandit algorithms. We show that Bandit can match or exceed the performance of more complex RL and non-RL alternatives in two different problems: data prefetching and instruction fetch thread selection in simultaneous multithreaded processors. We believe that Bandit's simplicity, reusability, and small storage overhead make online RL more practical for microarchitecture.
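To make the core idea concrete, below is a minimal, generic multi-armed bandit sketch in Python using an epsilon-greedy policy with incremental mean updates. This is purely illustrative of the class of low-complexity algorithms the abstract refers to; the paper's actual Micro-Armed Bandit agent, its action set, and its reward signals may differ.

```python
import random

class EpsilonGreedyBandit:
    """Generic epsilon-greedy multi-armed bandit.

    Illustrative sketch only: the actual Micro-Armed Bandit agent in the
    paper may use a different bandit variant, reward definition, and
    hardware-friendly arithmetic.
    """

    def __init__(self, num_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * num_actions    # times each action was chosen
        self.values = [0.0] * num_actions  # running mean reward per action

    def select_action(self):
        # Explore with probability epsilon; otherwise exploit the
        # action with the highest estimated reward.
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental mean update: Q <- Q + (r - Q) / n
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n
```

Because the state is just two small arrays (a count and a running mean per action), the storage and logic cost stays low, which is the property that makes bandit-style agents attractive for on-chip decision making such as prefetcher selection or fetch-thread arbitration.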

Original language: English (US)
Pages (from-to): 80-87
Number of pages: 8
Journal: IEEE Micro
Volume: 44
Issue number: 4
DOIs
State: Published - 2024

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Electrical and Electronic Engineering
