TY - GEN
T1 - EnsembleDAgger
T2 - 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
AU - Menda, Kunal
AU - Driggs-Campbell, Katherine
AU - Kochenderfer, Mykel J.
N1 - Funding Information:
This material is based upon work supported by SAIC Innovation Center, a subsidiary of SAIC Motors, as well as by AFRL and DARPA under contract FA8750-18-C-0099. The authors would like to acknowledge the useful feedback of Apoorva Sharma and Michael Kelly.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Although imitation learning is often used in robotics, the approach frequently suffers from data mismatch and compounding errors. DAgger is an iterative algorithm that addresses these issues by aggregating training data from both the expert and novice policies, but does not consider the impact of safety. We present a probabilistic extension to DAgger, which attempts to quantify the confidence of the novice policy as a proxy for safety. Our method, EnsembleDAgger, approximates a Gaussian Process using an ensemble of neural networks. Using the variance as a measure of confidence, we compute a decision rule that captures how much we doubt the novice, thus determining when it is safe to allow the novice to act. With this approach, we aim to maximize the novice's share of actions, while constraining the probability of failure. We demonstrate improved safety and learning performance compared to other DAgger variants and classic imitation learning on an inverted pendulum and in the MuJoCo HalfCheetah environment.
AB - Although imitation learning is often used in robotics, the approach frequently suffers from data mismatch and compounding errors. DAgger is an iterative algorithm that addresses these issues by aggregating training data from both the expert and novice policies, but does not consider the impact of safety. We present a probabilistic extension to DAgger, which attempts to quantify the confidence of the novice policy as a proxy for safety. Our method, EnsembleDAgger, approximates a Gaussian Process using an ensemble of neural networks. Using the variance as a measure of confidence, we compute a decision rule that captures how much we doubt the novice, thus determining when it is safe to allow the novice to act. With this approach, we aim to maximize the novice's share of actions, while constraining the probability of failure. We demonstrate improved safety and learning performance compared to other DAgger variants and classic imitation learning on an inverted pendulum and in the MuJoCo HalfCheetah environment.
UR - http://www.scopus.com/inward/record.url?scp=85081168231&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081168231&partnerID=8YFLogxK
U2 - 10.1109/IROS40897.2019.8968287
DO - 10.1109/IROS40897.2019.8968287
M3 - Conference contribution
AN - SCOPUS:85081168231
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 5041
EP - 5048
BT - 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 November 2019 through 8 November 2019
ER -