Learning and Selecting the Right Customers for Reliability: A Multi-Armed Bandit Approach

Yingying Li, Qinran Hu, Na Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we consider residential demand response (DR) programs in which an aggregator calls upon some residential customers to change their demand so that the total load adjustment is as close to a target value as possible. The major challenges are the uncertainty and randomness of customer behavior in response to DR signals and the aggregator's limited knowledge of the customers. To learn and select the right customers, we formulate the DR problem as a combinatorial multi-armed bandit (CMAB) problem with a reliability goal. We propose a learning algorithm, CUCB-Avg (Combinatorial Upper Confidence Bound-Average), which uses both upper confidence bounds and sample averages to balance the tradeoff between exploration (learning) and exploitation (selecting). We prove that CUCB-Avg achieves O(log T) regret given a time-invariant target. Simulation results demonstrate that CUCB-Avg performs significantly better than the classic algorithm CUCB (Combinatorial Upper Confidence Bound).
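
The abstract's idea of combining upper confidence bounds with sample averages can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the abstract: customers are modeled as responding with unknown Bernoulli probabilities, customers are ranked by a UCB index, and the selection stops once the summed sample averages exceed the target minus 1/2. The function names, the `alpha` constant, and the stopping rule are hypothetical and should not be read as the authors' exact algorithm.

```python
import math
import random


def select_customers(means, counts, t, target, alpha=2.0):
    """Rank customers by a UCB index (exploration), then choose the
    shortest prefix whose summed sample averages exceeds target - 1/2
    (exploitation). The stopping rule is an assumption of this sketch."""
    n = len(means)
    ucb = []
    for i in range(n):
        if counts[i] == 0:
            ucb.append(float("inf"))  # never-called customers rank first
        else:
            radius = math.sqrt(alpha * math.log(t) / (2 * counts[i]))
            ucb.append(min(1.0, means[i] + radius))
    order = sorted(range(n), key=lambda i: ucb[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        if total > target - 0.5:
            break
        chosen.append(i)
        total += means[i]  # sample average, not the UCB index
    return chosen


def run(probs, target, horizon, seed=0):
    """Simulate the loop with assumed Bernoulli responses (hypothetical model)."""
    rng = random.Random(seed)
    n = len(probs)
    means, counts = [0.0] * n, [0] * n
    history = []
    for t in range(1, horizon + 1):
        chosen = select_customers(means, counts, t, target)
        reduction = 0
        for i in chosen:
            x = 1 if rng.random() < probs[i] else 0
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]  # running sample average
            reduction += x
        history.append((chosen, reduction))
    return means, counts, history
```

Calling `run([0.9, 0.8, 0.3, 0.2], target=2, horizon=500)` returns the learned sample averages, the number of times each customer was called, and the per-round selections. Using sample averages in the stopping rule, rather than the optimistic UCB values, keeps the sketch from systematically selecting too few customers because their response rates are overestimated.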

Original language: English (US)
Title of host publication: 2018 IEEE Conference on Decision and Control, CDC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4869-4874
Number of pages: 6
ISBN (Electronic): 9781538613955
State: Published - Jul 2 2018
Externally published: Yes
Event: 57th IEEE Conference on Decision and Control, CDC 2018 - Miami, United States
Duration: Dec 17 2018 → Dec 19 2018

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2018-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 57th IEEE Conference on Decision and Control, CDC 2018
Country/Territory: United States
City: Miami
Period: 12/17/18 → 12/19/18

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
