TY - GEN
T1 - Performance Bounds for Policy-Based Reinforcement Learning Methods in Zero-Sum Markov Games with Linear Function Approximation
AU - Winnicki, Anna
AU - Srikant, R.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Until recently, efficient policy iteration algorithms for zero-sum Markov games that converge were unknown. Therefore, model-based RL algorithms for such problems could not use policy iteration in the planning modules of the algorithms. In an earlier paper, we showed that a convergent policy iteration algorithm can be obtained by using a commonly used technique in RL called lookahead. However, the algorithm could be applied to the function approximation setting only in the special case of linear MDPs (Markov Decision Processes). In this paper, we obtain performance bounds for policy-based RL algorithms for general settings, including one where policy evaluation is performed using noisy samples of (state, action, reward) triplets from a single sample path of a given policy.
AB - Until recently, efficient policy iteration algorithms for zero-sum Markov games that converge were unknown. Therefore, model-based RL algorithms for such problems could not use policy iteration in the planning modules of the algorithms. In an earlier paper, we showed that a convergent policy iteration algorithm can be obtained by using a commonly used technique in RL called lookahead. However, the algorithm could be applied to the function approximation setting only in the special case of linear MDPs (Markov Decision Processes). In this paper, we obtain performance bounds for policy-based RL algorithms for general settings, including one where policy evaluation is performed using noisy samples of (state, action, reward) triplets from a single sample path of a given policy.
UR - http://www.scopus.com/inward/record.url?scp=85184822893&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184822893&partnerID=8YFLogxK
U2 - 10.1109/CDC49753.2023.10384061
DO - 10.1109/CDC49753.2023.10384061
M3 - Conference contribution
AN - SCOPUS:85184822893
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 7144
EP - 7149
BT - 2023 62nd IEEE Conference on Decision and Control, CDC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 62nd IEEE Conference on Decision and Control, CDC 2023
Y2 - 13 December 2023 through 15 December 2023
ER -