TY - GEN
T1 - Adaptive Trajectory Database Learning for Nonlinear Control with Hybrid Gradient Optimization
AU - Tseng, Kuan Yu
AU - Zhang, Mengchao
AU - Hauser, Kris
AU - Dullerud, Geir E.
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This paper presents a novel experience-based technique, called EHGO, for sample-efficient adaptive control of nonlinear systems in the presence of dynamical modeling errors. The starting point for EHGO is a database seeded with many trajectories optimized under a reference estimate of real system dynamics. When executed on the real system, these trajectories will be suboptimal due to errors in the reference dynamics. The approach then leverages a hybrid gradient optimization technique, GRILC, which observes executed trajectories and computes gradients from the reference model to refine the control policy without requiring an explicit model of the real system. In past work, GRILC was applied in a restrictive setting in which a robot executes multiple rollouts from identical start states. In this paper, we show how to leverage a database to enable GRILC to operate across a wide envelope of possible start states in different iterations. The database is used to balance between start state proximity and recentness-of-experience via a learned distance metric to generate good initial guesses. Experiments on three dynamical systems (pendulum, car, drone) show that the proposed approach adapts quickly to online experience even when the reference model has significant errors. In these examples EHGO generates near-optimal solutions within hundreds of epochs of real execution, which can be orders of magnitude more sample efficient than reinforcement learning techniques.
AB - This paper presents a novel experience-based technique, called EHGO, for sample-efficient adaptive control of nonlinear systems in the presence of dynamical modeling errors. The starting point for EHGO is a database seeded with many trajectories optimized under a reference estimate of real system dynamics. When executed on the real system, these trajectories will be suboptimal due to errors in the reference dynamics. The approach then leverages a hybrid gradient optimization technique, GRILC, which observes executed trajectories and computes gradients from the reference model to refine the control policy without requiring an explicit model of the real system. In past work, GRILC was applied in a restrictive setting in which a robot executes multiple rollouts from identical start states. In this paper, we show how to leverage a database to enable GRILC to operate across a wide envelope of possible start states in different iterations. The database is used to balance between start state proximity and recentness-of-experience via a learned distance metric to generate good initial guesses. Experiments on three dynamical systems (pendulum, car, drone) show that the proposed approach adapts quickly to online experience even when the reference model has significant errors. In these examples EHGO generates near-optimal solutions within hundreds of epochs of real execution, which can be orders of magnitude more sample efficient than reinforcement learning techniques.
UR - http://www.scopus.com/inward/record.url?scp=85216462699&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85216462699&partnerID=8YFLogxK
U2 - 10.1109/IROS58592.2024.10801717
DO - 10.1109/IROS58592.2024.10801717
M3 - Conference contribution
AN - SCOPUS:85216462699
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 11969
EP - 11976
BT - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
Y2 - 14 October 2024 through 18 October 2024
ER -