TY - GEN
T1 - Adaptive optimal control of partially-unknown constrained-input systems using policy iteration with experience replay
AU - Modares, Hamidreza
AU - Lewis, Frank L.
AU - Naghibi-Sistani, Mohammad Bagher
AU - Chowdhary, Girish
AU - Yucelen, Tansel
PY - 2013
Y1 - 2013
AB - This paper develops an online learning algorithm to find optimal control solutions for partially-unknown continuous-time systems subject to input constraints. The input constraints are encoded into the optimal control problem through a nonquadratic performance functional. An online policy iteration algorithm that uses integral reinforcement knowledge is developed to learn the solution to the optimal control problem online without knowing the full dynamics model. The policy iteration algorithm is implemented on an actor-critic structure, where two neural network approximators are tuned online and simultaneously to generate the optimal control law. A novel technique based on experience replay is introduced to retain past data in updating the neural network weights. This uses the recorded data concurrently with current data for adaptation of the critic neural network weights. Concurrent learning provides an easy-to-check real-time condition for persistence of excitation that is sufficient to guarantee convergence to a near-optimal control law. Stability of the proposed feedback control law is shown and its performance is evaluated through simulations.
UR - http://www.scopus.com/inward/record.url?scp=84883680649&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84883680649&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84883680649
SN - 9781624102240
T3 - AIAA Guidance, Navigation, and Control (GNC) Conference
BT - AIAA Guidance, Navigation, and Control (GNC) Conference
T2 - AIAA Guidance, Navigation, and Control (GNC) Conference
Y2 - 19 August 2013 through 22 August 2013
ER -