Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Tanmay Gangwani, Jian Peng, Yuan Zhou

Research output: Contribution to journal › Conference article › peer-review

Abstract

Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task-returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on f-divergence between the stationary distributions of policies, we convert the problem to that of efficient estimation of the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation and re-purpose them to compute the gradients for policies in an ensemble such that the resultant population is diverse and of high quality.
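
The step from an f-divergence kernel to ratio estimation rests on a standard identity: D_f(P || Q) = E_{x ~ Q}[f(p(x) / q(x))], so the divergence between two stationary distributions needs only their ratio and samples from one of them. The sketch below illustrates that identity on a toy discrete example. It is not the paper's implementation: the abstract does not say which f-divergence or kernel form is used, so the Jensen-Shannon generator, the exp(-D / temperature) kernel, the `temperature` parameter, and all function names are assumptions made for illustration. The ratios here are computed from known toy densities, whereas in practice they would come from a learned ratio estimator.

```python
import math


def kl_generator(w):
    # f(w) = w log w gives KL(P || Q); the convention 0 log 0 = 0 is applied.
    return w * math.log(w) if w > 0 else 0.0


def js_generator(w):
    # f(w) = 0.5 * [w log w - (w + 1) log((w + 1) / 2)] gives Jensen-Shannon.
    return 0.5 * (kl_generator(w) - (w + 1.0) * math.log((w + 1.0) / 2.0))


def f_divergence_from_ratio(ratios, q, f):
    # D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)); only the ratio p/q is needed.
    return sum(q_x * f(w_x) for w_x, q_x in zip(ratios, q))


def divergence_kernel(ratios, q, temperature=1.0):
    # Hypothetical similarity kernel: close to 1 when the two distributions
    # nearly coincide, smaller as they move apart.
    d_js = f_divergence_from_ratio(ratios, q, js_generator)
    return math.exp(-d_js / temperature)


# Toy "stationary distributions" of two policies over two state-action pairs.
d_i = [1.0, 0.0]
d_j = [0.5, 0.5]
ratios = [p / q for p, q in zip(d_i, d_j)]  # stands in for an estimated ratio

kl_value = f_divergence_from_ratio(ratios, d_j, kl_generator)
js_value = f_divergence_from_ratio(ratios, d_j, js_generator)
kernel_value = divergence_kernel(ratios, d_j)
```

With a kernel of this kind, the repulsive term in a Stein variational gradient descent update depends on the policies only through the estimated ratio, which is why the quality of the ratio estimator matters.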

Original language: English (US)
Pages (from-to): 2206-2215
Number of pages: 10
Journal: Proceedings of Machine Learning Research
Volume: 155
State: Published - 2020
Event: 4th Conference on Robot Learning, CoRL 2020 - Virtual, Online, United States
Duration: Nov 16 2020 to Nov 18 2020

Keywords

  • Exploration-Exploitation
  • Quality-Diversity
  • Reinforcement Learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
