TY - GEN
T1 - Learning and Selecting the Right Customers for Reliability
T2 - 57th IEEE Conference on Decision and Control, CDC 2018
AU - Li, Yingying
AU - Hu, Qinran
AU - Li, Na
N1 - Funding Information:
The work was supported by NSF 1608509, NSF CAREER 1553407, AFOSR YIP, and ARPA-E through the NODES program. Y. Li, Q. Hu, and N. Li are with the School of Engineering and Applied Sciences, Harvard University, 33 Oxford Street, Cambridge, MA 02138, USA (email: [email protected], [email protected], [email protected]).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - In this paper, we consider residential demand response (DR) programs where an aggregator calls upon some residential customers to change their demand so that the total load adjustment is as close to a target value as possible. Major challenges lie in the uncertainty and randomness of customer behaviors in response to DR signals, and the limited knowledge the aggregator has about the customers. To learn and select the right customers, we formulate the DR problem as a combinatorial multi-armed bandit (CMAB) problem with a reliability goal. We propose a learning algorithm, CUCB-Avg (Combinatorial Upper Confidence Bound-Average), which utilizes both upper confidence bounds and sample averages to balance the tradeoff between exploration (learning) and exploitation (selecting). We prove that CUCB-Avg achieves O(log T) regret given a time-invariant target. Simulation results demonstrate that CUCB-Avg performs significantly better than the classic CUCB (Combinatorial Upper Confidence Bound) algorithm.
AB - In this paper, we consider residential demand response (DR) programs where an aggregator calls upon some residential customers to change their demand so that the total load adjustment is as close to a target value as possible. Major challenges lie in the uncertainty and randomness of customer behaviors in response to DR signals, and the limited knowledge the aggregator has about the customers. To learn and select the right customers, we formulate the DR problem as a combinatorial multi-armed bandit (CMAB) problem with a reliability goal. We propose a learning algorithm, CUCB-Avg (Combinatorial Upper Confidence Bound-Average), which utilizes both upper confidence bounds and sample averages to balance the tradeoff between exploration (learning) and exploitation (selecting). We prove that CUCB-Avg achieves O(log T) regret given a time-invariant target. Simulation results demonstrate that CUCB-Avg performs significantly better than the classic CUCB (Combinatorial Upper Confidence Bound) algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85062190915&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062190915&partnerID=8YFLogxK
U2 - 10.1109/CDC.2018.8619481
DO - 10.1109/CDC.2018.8619481
M3 - Conference contribution
AN - SCOPUS:85062190915
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 4869
EP - 4874
BT - 2018 IEEE Conference on Decision and Control, CDC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 December 2018 through 19 December 2018
ER -