TY - JOUR
T1 - Practical Online Reinforcement Learning for Microprocessors with Micro-Armed Bandit
AU - Gerogiannis, Gerasimos
AU - Torrellas, Josep
N1 - Publisher Copyright: © 1981-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Although online reinforcement learning (RL) has shown promise for microarchitecture decision making, processor vendors are still reluctant to adopt it. There are two main reasons that make RL-based solutions unattractive. First, they have high complexity and storage overhead. Second, many RL agents are engineered for a specific problem and are not reusable. In this work, we propose a way to tackle these shortcomings. We find that, in diverse microarchitecture problems, only a few actions are useful in a given time window. Motivated by this property, we design Micro-Armed Bandit (or Bandit for short), an RL agent that is based on the low-complexity Multi-Armed Bandit algorithms. We show that Bandit can match or exceed the performance of more complex RL and non-RL alternatives in two different problems: data prefetching and instruction fetch thread selection in simultaneous multithreaded processors. We believe that Bandit's simplicity, reusability, and small storage overhead make online RL more practical for microarchitecture.
UR - http://www.scopus.com/inward/record.url?scp=85195415633&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85195415633&partnerID=8YFLogxK
U2 - 10.1109/MM.2024.3408719
DO - 10.1109/MM.2024.3408719
M3 - Article
AN - SCOPUS:85195415633
SN - 0272-1732
VL - 44
SP - 80
EP - 87
JO - IEEE Micro
JF - IEEE Micro
IS - 4
ER -