Pessimistic Off-Policy Multi-Objective Optimization

Shima Alizadeh, Aniruddha Bhargava, Karthick Gopalswamy, Lalit Jain, Branislav Kveton, Ge Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Multi-objective optimization is a class of optimization problems with multiple conflicting objectives. We study offline optimization of multi-objective policies from data collected by a previously deployed policy. We propose a pessimistic estimator for policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them.
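As a rough illustration of the ingredients named in the abstract, the sketch below computes a naive IPS estimate of a policy's value for each objective, a pessimistic (lower-bound) variant, and the hypervolume of a set of two-objective value vectors. The abstract does not specify the paper's estimator, so the lower bound here (mean minus `z` standard errors) is a generic stand-in, and all function names, parameters, and the two-objective restriction are assumptions made for illustration only.

```python
from math import sqrt

def ips_terms(rewards, target_probs, logging_probs):
    """Importance-weighted reward vectors, one per logged sample.

    rewards: list of per-sample reward vectors (one entry per objective).
    target_probs / logging_probs: probability of the logged action under
    the policy being evaluated and under the logging policy.
    """
    return [[(p / q) * r for r in rs]
            for rs, p, q in zip(rewards, target_probs, logging_probs)]

def naive_ips(rewards, target_probs, logging_probs):
    """Naive IPS value estimate: the per-objective mean of the weighted rewards."""
    terms = ips_terms(rewards, target_probs, logging_probs)
    n = len(terms)
    return [sum(col) / n for col in zip(*terms)]

def pessimistic_ips(rewards, target_probs, logging_probs, z=1.96):
    """Hypothetical lower bound: IPS mean minus z standard errors.

    This is NOT the paper's estimator, only a generic pessimistic variant.
    Requires at least two samples.
    """
    terms = ips_terms(rewards, target_probs, logging_probs)
    n = len(terms)
    bounds = []
    for col in zip(*terms):
        mean = sum(col) / n
        var = sum((t - mean) ** 2 for t in col) / (n - 1)
        bounds.append(mean - z * sqrt(var / n))
    return bounds

def hypervolume_2d(points, ref):
    """Area dominated by `points` (both objectives maximized) above `ref`."""
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    pts.sort(key=lambda p: -p[0])  # sweep from the largest first objective
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:  # point adds a new strip above the current front
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area
```

Under these assumptions, the pessimistic value vectors of several candidate policies could be passed to `hypervolume_2d` in place of the naive ones, which is the plug-in use the abstract describes.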
Original language: English (US)
Title of host publication: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
Editors: Sanjoy Dasgupta, Stephan Mandt, Yingzhen Li
Publisher: PMLR
Pages: 2980-2988
Number of pages: 9
Volume: 238
State: Published - Feb 1 2024
Externally published: Yes

Publication series

Name: Proceedings of Machine Learning Research
